123 points by jcodes 6 months ago | 26 comments
finance_whiz123 6 months ago next
Incredible work! Could you share more about how you achieved such high accuracy?
ml_modeler456 6 months ago next
Sure! I used a combination of historical stock data and news articles, and trained the model on a 70% training split with 10-fold cross-validation. Here's a link to my GitHub for those interested.
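The thread does not spell out how the 70% figure and the 10 folds fit together. One possible reading, sketched below under that assumption, is to hold out the last 30% of samples for testing and run 10-fold cross-validation on the first 70%. The function names `split_train_test` and `k_fold_indices` are hypothetical, not taken from the linked repository.

```python
def split_train_test(n_samples, train_fraction=0.7):
    """Chronological split: the first train_fraction of samples is for training."""
    cut = int(round(n_samples * train_fraction))
    return list(range(cut)), list(range(cut, n_samples))


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists over k contiguous folds."""
    base, extra = divmod(len(indices), k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


# Assumption: 100 samples, purely for illustration.
train_idx, test_idx = split_train_test(100, train_fraction=0.7)
folds = list(k_fold_indices(train_idx, k=10))
```

Note that with time-ordered data, folds that train on later days and validate on earlier ones can leak information, so a forward-chaining split is a common alternative.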
ml_modeler456 6 months ago next
@skeptic_tech789 - Yes, I mitigated overfitting by tuning hyperparameters, using regularization techniques, and taking advantage of LSTM's properties. I also accounted for random fluctuations in stock prices by utilizing a sliding window and forecasting 5 days at a time.
skeptic_tech789 6 months ago next
Thanks for the detailed response. How do you envision your model being used in practice and by whom? Traders, or perhaps large institutional investors?
ml_modeler456 6 months ago next
@skeptic_tech789 - Both individuals and organizations could potentially benefit. I believe it would be most useful for short-term traders, such as day traders and others who trade frequently for a living.
finance_whiz123 6 months ago next
@ml_modeler456 While I understand the appeal to day traders, are you worried about the potential consequences of your model being used by inexperienced traders who may not understand the underlying assumptions and risks of this approach?
skeptic_tech789 6 months ago next
^agreed! It's crucial to address the responsibility that comes with such work. How do you plan to proceed?
ml_modeler456 6 months ago next
@finance_whiz123 @skeptic_tech789 - Thank you for raising an important issue. I'm considering publishing documentation and guidelines covering the model's assumptions, limitations, and potential for misuse. It's crucial to prevent misconceptions and potential harm.
trader_1000 6 months ago next
@ml_modeler456 - I'm curious about the time frames for the input features and the output. Could you elaborate on the window size of historical stock prices and associated news articles for training and predicting?
ml_modeler456 6 months ago next
@trader_1000 - I trained the model on 90 days of historical data and used the most recent 10 days for the sliding window. The input features consist of the historical stock prices, trading volumes, and related news articles while the output is the closing stock price for the next day.
skeptic_tech789 6 months ago prev next
95% accuracy seems too good to be true. Did you consider the impact of overfitting, and how did you account for random fluctuations in stock prices?
finance_whiz123 6 months ago next
Great explanation, @ml_modeler456. To clarify, does your model predict only the closing price, or are you also forecasting intraday time points?
ml_modeler456 6 months ago next
@finance_whiz123 - I'm currently forecasting only the closing price, based on historical data and news articles from previous days.
quant_analyst01 6 months ago next
@ml_modeler456 - What preprocessing techniques did you use for your historical data and which news source did you choose? LSTM is sensitive to data presentation. I'm curious about your data preparation approach.
ml_modeler456 6 months ago next
@quant_analyst01 - I used financial data provided by Yahoo Finance and I normalized the inputs using min-max scaling. The news articles came from a combination of sources such as Yahoo Finance, Reuters, Bloomberg, and Financial Times. I cleaned the text using regular expressions to remove any unnecessary formatting, then tokenized the text and performed padding.
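A rough sketch of the steps described above (min-max scaling, regex cleaning, tokenizing, padding), written in plain Python so it runs without extra libraries. The helper names, the sample prices and headlines, and the choice of `max_len=6` are all made up for illustration. The actual pipeline may well differ.

```python
import re


def min_max_scale(values):
    """Scale a list of numbers into [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def clean_text(text):
    """Drop HTML-like tags and punctuation, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def build_vocab(texts):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode_and_pad(text, vocab, max_len):
    """Convert known tokens to ids, truncate to max_len, right-pad with zeros."""
    ids = [vocab[t] for t in text.split() if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical sample data, not from the original project.
prices = [100.0, 105.0, 110.0]
headlines = ["<b>Stocks rally!</b>", "Stocks dip on rate fears"]

scaled = min_max_scale(prices)
cleaned = [clean_text(h) for h in headlines]
vocab = build_vocab(cleaned)
padded = [encode_and_pad(c, vocab, max_len=6) for c in cleaned]
```

In practice the scaling bounds would normally be computed from the training portion only and then reused on later data, so that the test period does not influence the inputs.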
newbie_trader 6 months ago prev next
I'm new to the quantitative aspect of trading. Can someone explain the concept of sliding window in simpler terms and its significance for this model?
helpful_neighbor 6 months ago next
Of course! In this case, a sliding window is a technique where the most recent 10 days of data are used as input to predict the closing price for the next day. This 10-day window then slides forward one day at a time to generate predictions for subsequent days.
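If it helps to see it concretely, here is a minimal sketch of that windowing, assuming a plain list of daily closing prices. The function name `make_windows` and the 15 made-up prices are only for illustration.

```python
def make_windows(series, window=10):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# 15 made-up closing prices: 100.0, 101.0, ..., 114.0
closes = [float(100 + day) for day in range(15)]
X, y = make_windows(closes, window=10)

# The first sample uses days 0-9 to predict day 10, the second uses days 1-10
# to predict day 11, and so on until the data runs out.
```

With 15 days of prices and a 10-day window, this yields 5 input/target pairs.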
curious_newbie 6 months ago next
@helpful_neighbor - What would be the output for the first 9 days, since there isn't enough historical data to match the sliding window?
helpful_neighbor 6 months ago next
@curious_newbie - Good question! For those first 9 days, you would use a smaller window (e.g. 1 day or 2 days) with a corresponding output or use synthetic data generation to create more training data.
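A small sketch of the "smaller window" option, assuming the early days simply use whatever history exists so far. The name `make_expanding_windows` is hypothetical, and a model that needs fixed-length inputs would still have to pad these shorter windows.

```python
def make_expanding_windows(series, window=10):
    """Use up to `window` previous values as input; early days get fewer."""
    inputs, targets = [], []
    for t in range(1, len(series)):
        start = max(0, t - window)
        inputs.append(series[start:t])
        targets.append(series[t])
    return inputs, targets


# 12 made-up closing prices: 100.0, 101.0, ..., 111.0
early_closes = [float(100 + day) for day in range(12)]
X_early, y_early = make_expanding_windows(early_closes, window=10)
window_lengths = [len(w) for w in X_early]
```

Here the window grows by one day at a time until it reaches 10 days, after which it slides as usual.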
newbie_trader 6 months ago next
@helpful_neighbor - Thanks for the explanation. I think I have a clearer picture now!
market_watcher 6 months ago prev next
Have you benchmarked your model's performance against other commonly used stock prediction models like linear regression, decision trees, and random forests? If so, what were the results?
ml_modeler456 6 months ago next
@market_watcher - Yes, I reviewed performance against linear regression, decision trees, and random forests. The LSTM outperformed those models by a significant margin in terms of accuracy and generalization.
model_comparer7 6 months ago next
Intriguing to see LSTM outperform the other models. Can you share details about the evaluation metrics used and the precise percentages for accuracy and generalization?
ml_modeler456 6 months ago next
@model_comparer7 - I primarily used MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and the coefficient of determination (R^2). The LSTM model had a MAE of 2.5%, an RMSE of 2.9%, and an R^2 of 0.972, outperforming the other models in all categories.
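For anyone unfamiliar with these metrics, a minimal sketch of how they are usually computed, using made-up actual and predicted prices. The thread does not say how the percentage figures above were derived, so this shows only the standard definitions in price units.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE does."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, not results from the model discussed here.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]

scores = {
    "mae": mae(actual, predicted),
    "rmse": rmse(actual, predicted),
    "r2": r_squared(actual, predicted),
}
```

The same three functions can be applied to each baseline's predictions on an identical test period to make the comparison like-for-like.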
startup_founder 6 months ago prev next
Congratulations on the achievement! Have you thought about starting a company that focuses on this technology for the finance industry?
ml_modeler456 6 months ago next
@startup_founder - I appreciate your kind words. I've considered the idea but haven't taken any significant steps so far. I will contemplate it further and perhaps engage in some conversations with the finance community.