89 points by codewithease 1 year ago | 25 comments
gnawhoy 1 year ago
Great work! I've been looking for a newsfeed aggregator that can adapt to my interests. Looking forward to trying this out!
carefulthinker 1 year ago
Do you have any tutorial for setting up the machine learning algorithms? I'm interested in learning more about it.
gnawhoy 1 year ago
Not at the moment, but I'll add it to my to-do list. The algorithms are based on supervised learning, if that helps as a starting point for your own search.
ada_lovelace 1 year ago
Interesting concept! What type of news do you aggregate - mainly tech or broader categories?
gnawhoy 1 year ago
It includes a wide range of news sources (over 100), so the news can be quite diverse - but it is still tech-focused. Categories include web development, CS research, data science, VR, AI, and more.
pam_developer 1 year ago
Thank you for sharing! Would love to see how it selects articles specific to my interests. Could you elaborate on the input you provide to the algorithm for personalization?
gnawhoy 1 year ago
Sure! It looks at a user's reading habits (title and content reads, time spent on each article), as well as pre-selected interests. It uses this information to assign a score to each article, with higher scores attributed to articles more likely to be in line with the user's interests.
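A minimal sketch of what this kind of habit-based scoring might look like. The field names, weights, and categories below are illustrative assumptions, not gnawhoy's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class ReadEvent:
    """One reading interaction; fields are hypothetical."""
    category: str
    seconds_spent: float
    read_full_content: bool  # full read vs. title-only view

def interest_profile(history: list[ReadEvent]) -> dict[str, float]:
    """Aggregate reading habits into per-category affinity scores."""
    profile: dict[str, float] = {}
    for event in history:
        weight = event.seconds_spent / 60.0  # minutes spent reading
        if event.read_full_content:
            weight *= 2.0  # assumption: full reads count double
        profile[event.category] = profile.get(event.category, 0.0) + weight
    return profile

def score_article(category: str, profile: dict[str, float],
                  preselected: set[str]) -> float:
    """Higher scores for articles matching habits and chosen interests."""
    score = profile.get(category, 0.0)
    if category in preselected:
        score += 1.0  # bonus for explicitly pre-selected interests
    return score
```

Articles would then be ranked by `score_article` before being shown in the feed.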
alice_the_encoder 1 year ago
Very curious as to what pre-selected interests would help you match better. Is there a list of available options or perhaps a means of inputting a custom keyword?
gnawhoy 1 year ago
Currently, we offer predefined options, but we are considering expanding to include custom keywords. The current interests include several subtopics within AI (deep learning, NLP, computer vision, etc.), web development paradigms (JavaScript, React, Vue, etc.), and more.
crypt_cat 1 year ago
Love the project, any plans on open-sourcing it?
gnawhoy 1 year ago
It's something we might consider in the future, but for the time being, it's a closed-source project.
grace_coder19 1 year ago
I'm impressed - do you have any evaluations or documentation of the algorithm's performance?
gnawhoy 1 year ago
Thanks for the compliment! Yes, we evaluated the system on a small random sample of users (<100), achieving a precision of 0.82 and a recall of ~0.75. Documentation will be released along with a demo video soon.
for_looper3000 1 year ago
How long did it take to build?
gnawhoy 1 year ago
The development took around three months, split between the news aggregator interface and the underlying machine learning algorithms that power the personalization.
future_ml_guru 1 year ago
I've seen similar projects but not with the same level of sophistication. Great innovation! Do you have an estimate of the server costs for such an application?
gnawhoy 1 year ago
Our estimate ranges from $100 to $150 per month, depending on usage spikes - this covers running costs for the cloud hosting (servers, storage, and bandwidth), as well as periodic ML model retraining.
bob_webdev 1 year ago
Have you thought about implementing further aspects of AI like a chatbot to recommend new topics based on users' conversations?
gnawhoy 1 year ago
That's definitely an interesting concept we'll keep in mind as we iterate on this project. Thank you for the input!
dr_algo 1 year ago
Great idea! I'd be curious to learn more about the application architecture and data flow - particularly the features that capture user interests.
gnawhoy 1 year ago
We take a two-fold approach: user profiling based on reading habits, and article-category selection based on user preferences. The implementation combines Python (for ML), Django (web framework), and PostgreSQL (database).
rob_quant 1 year ago
How do you label the data to train the algorithm - are you using data annotation techniques?
gnawhoy 1 year ago
That's an excellent question. To train the model initially, we utilized semi-supervised learning, employing distant supervision and rule-based heuristics to generate weak labels. These labels were then manually corrected to create high-quality training data.
rand_user123 1 year ago
Very interested in this - good luck with your project and I expect to see more from you in the future!
gnawhoy 1 year ago
Thank you, we're eager to see how this project resonates with the community and continuously improve upon it. Excited to be sharing our work with everyone!