58 points by gauzy_code 6 months ago 16 comments
randomuser1 6 months ago next
This is such an interesting topic! I'd love to see how the ML model performs compared to human intuition.
datasciencepro 6 months ago next
From what I can tell, the model considers several features, including timing, author, and topic, to make its prediction. Really smart!
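As a rough illustration of what timing, author, and topic features might look like in code, here is a minimal sketch. The field names, the topic list, and the `author_post_counts` lookup are all assumptions made for the example, not the project's actual schema.

```python
from datetime import datetime, timezone

# Hypothetical topic vocabulary; the thread does not list the real one.
TOPICS = ["ai", "security", "startups", "programming"]


def extract_features(post, author_post_counts):
    """Turn one post into a flat numeric feature vector.

    `post` is assumed to be a dict with 'author', 'topic', and a Unix
    'timestamp' in seconds. `author_post_counts` is assumed to map an
    author name to their number of earlier posts.
    """
    posted = datetime.fromtimestamp(post["timestamp"], tz=timezone.utc)
    timing = [
        posted.hour / 23.0,                     # hour of day scaled to [0, 1]
        1.0 if posted.weekday() >= 5 else 0.0,  # weekend flag (Sat or Sun)
    ]
    author = [float(author_post_counts.get(post["author"], 0))]
    topic = [1.0 if post["topic"] == t else 0.0 for t in TOPICS]  # one-hot
    return timing + author + topic


# Made-up post submitted on Saturday 2024-01-06 at 15:00 UTC.
example = {
    "author": "example_user",
    "topic": "ai",
    "timestamp": datetime(2024, 1, 6, 15, tzinfo=timezone.utc).timestamp(),
}
features = extract_features(example, {"example_user": 3})
print(features)
```

A real pipeline with 50 features would presumably go well beyond this, but the shape is the same: every post becomes one fixed-length row of numbers.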
techenthusiast 6 months ago prev next
Very cool! What kind of model did you use, a regression or classification?
datasciencepro 6 months ago next
We used cross-validation and a grid search for hyperparameter tuning. This was crucial in finding the best model, and it makes me optimistic that there is still room for improvement.
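For readers who have not seen cross-validation combined with a grid search, here is a minimal sketch in plain Python. It is not the project's code: the tiny gradient-descent logistic regression, the L2 penalty as the tuned hyperparameter, the grid values, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def train_logreg(X, y, l2, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))           # guard against overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, X, y):
    """Fraction of rows where the sign of the linear score matches the label."""
    w, b = model
    hits = sum(
        (b + sum(wj * xj for wj, xj in zip(w, xi)) > 0) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)


def cross_val_score(X, y, l2, k=5):
    """Mean held-out accuracy over k folds (fold i holds out every k-th row)."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = train_logreg([X[i] for i in train], [y[i] for i in train], l2)
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


def grid_search(X, y, grid, k=5):
    """Try every value in `grid` and return (best_l2, best_cv_accuracy)."""
    results = {l2: cross_val_score(X, y, l2, k) for l2 in grid}
    best = max(results, key=results.get)
    return best, results[best]


# Synthetic stand-in data: the label is 1 when a noisy linear score is positive.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if 2 * a - b + rng.gauss(0, 0.3) > 0 else 0 for a, b in X]

best_l2, best_score = grid_search(X, y, grid=[0.0, 0.01, 0.1, 1.0])
print(best_l2, round(best_score, 3))
```

The rows here are already in random order, so taking every k-th row as a fold is fine. Real posts sorted by date would need shuffling first, or a time-based split to avoid leaking future information into training.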
originalauthor 6 months ago prev next
We used a logistic regression model, but I'm curious to experiment with other types of models as well.
randomuser2 6 months ago next
A logistic regression could be a great starting point. I suppose using cross-validation and hyperparameter tuning would help improve its performance.
techenthusiast 6 months ago next
That's a nice result! Do you think you could share your code and methodology in a GitHub repository or elsewhere?
originalauthor 6 months ago next
We definitely can. We'll share the project on GitHub in the coming days.
machinelearninglover 6 months ago prev next
I wonder if using a neural network instead could lead to better predictions. I'm curious what your training set looked like.
originalauthor 6 months ago next
Our dataset contained around 100,000 previous Hacker News posts, with 50 features engineered by the team. However, computational constraints prevented us from trying more complex models like neural networks.
randomuser3 6 months ago prev next
100,000 data points is actually a pretty decent size, and 50 features sounds like a good number. I think the logistic regression may be strong enough to predict popularity.
opensourcefan 6 months ago prev next
I appreciate the openness of discussing your methodology and approach. Looking forward to seeing your project on GitHub!
curious_engineer 6 months ago prev next
Would a GAM (Generalized Additive Model) or gradient boosting (GBoost) offer any advantages for this task compared to logistic regression?
datasciencepro 6 months ago next
Both GAMs and gradient boosting (GBoost) can model nonlinear relationships without assuming a linear link between features and outcome. Logistic regression, on the other hand, is simpler, which keeps it interpretable and easy to explain. I think a comparative evaluation would be an interesting follow-up to this study.
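To make the linearity point concrete, here is a toy sketch of gradient boosting with decision stumps and squared-error loss, compared against a straight-line fit. The hour-of-day data is made up (posts "do well" only between 09:00 and 17:00), the straight line stands in for a linear model rather than an actual logistic regression, and the errors are training errors, so treat this as an illustration of the idea and not as evidence about the real task.

```python
def fit_stump(xs, residuals):
    """Find the one-split regression stump that minimises squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)


def boost(xs, ys, rounds=50, lr=0.5):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict


def fit_line(xs, ys):
    """Ordinary least-squares line through one feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: one row per hour of the day.
hours = list(range(24))
popular = [1.0 if 9 <= h <= 17 else 0.0 for h in hours]

linear_mse = mse(fit_line(hours, popular), hours, popular)
boosted_mse = mse(boost(hours, popular), hours, popular)
print(linear_mse, boosted_mse)
```

A single straight line cannot rise and then fall, so it misses the midday bump, while a sum of stumps can approximate it. The trade-off is the one mentioned above: the boosted model is a pile of 50 small rules instead of one coefficient per feature.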
shiny_new_toy 6 months ago prev next
Do you plan to develop this approach further to include a scoring system like YCombinator's points system?
originalauthor 6 months ago next
We have discussed that idea, but no concrete decisions have been made yet. It's still an open area for exploration.