89 points by codewithease 7 months ago | 25 comments
gnawhoy 7 months ago next
Great work! I've been looking for a newsfeed aggregator that can adapt to my interests. Looking forward to trying this out!
carefulthinker 7 months ago next
Do you have any tutorial for setting up the machine learning algorithms? I'm interested in learning more about it.
gnawhoy 7 months ago prev next
Not at the moment, but I'll add it to my to-do list. The algorithms are based on supervised learning, if that helps you find a starting point.
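To point you somewhere concrete, a toy supervised text classifier in Python would look roughly like the snippet below. This is purely illustrative, not our production pipeline: it uses scikit-learn and made-up data.

    # Toy supervised-learning example for article text (illustrative only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labeled data: article titles and whether a user engaged with them.
    titles = [
        "New JavaScript framework released",
        "Advances in protein folding",
        "GPU pricing update",
        "Intro to deep learning",
    ]
    labels = [1, 0, 1, 1]  # 1 = engaged, 0 = skipped

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(titles, labels)

    # Probability that the user would engage with an unseen article.
    print(model.predict_proba(["React performance tips"])[0][1])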
ada_lovelace 7 months ago prev next
Interesting concept! What type of news do you aggregate - mainly tech or broader categories?
gnawhoy 7 months ago next
It includes a wide range of news sources (over 100), so the news can be quite diverse - but it is still tech-focused. Categories include web development, CS research, data science, VR, AI, and more.
pam_developer 7 months ago prev next
Thank you for sharing! Would love to see how it selects articles specific to my interests. Could you elaborate on the input you provide to the algorithm for personalization?
gnawhoy 7 months ago next
Sure! It looks at a user's reading habits (which titles and articles they open, and how long they spend on each), as well as their pre-selected interests. It uses this information to assign each article a score; higher scores go to articles more likely to match the user's interests.
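To make that a bit more concrete, the scoring behaves conceptually like the toy function below. This is a simplified sketch: the names and weights are illustrative, and in practice the weighting is learned from behaviour rather than hard-coded.

    # Simplified sketch of per-article scoring (names and weights illustrative only).

    def score_article(article_topics, user_interests, avg_read_time, title_click_rate):
        """Higher score = article more likely to match the user's interests."""
        # Overlap between the article's topics and the user's pre-selected interests.
        interest_overlap = len(set(article_topics) & set(user_interests)) / max(len(article_topics), 1)
        # Behavioural signals: time spent on similar articles and title click-through rate.
        behaviour = 0.5 * min(avg_read_time / 120.0, 1.0) + 0.5 * title_click_rate
        return 0.6 * interest_overlap + 0.4 * behaviour

    print(score_article(["AI", "NLP"], ["NLP", "web"], avg_read_time=90, title_click_rate=0.3))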
alice_the_encoder 7 months ago prev next
Very curious as to what pre-selected interests would help you match better. Is there a list of available options or perhaps a means of inputting a custom keyword?
gnawhoy 7 months ago next
Currently, we offer predefined options, but we are considering expanding to include custom keywords. The current interests include several subtopics within AI (deep learning, NLP, computer vision, etc.), web development stacks (JavaScript, React, Vue, etc.), and more.
crypt_cat 7 months ago prev next
Love the project, any plans on open-sourcing it?
gnawhoy 7 months ago next
It's something we might consider in the future, but for the time being, it's a closed-source project.
grace_coder19 7 months ago prev next
I'm impressed - do you have any evaluations or documentation of the algorithm's performance?
gnawhoy 7 months ago next
Thanks for the compliment! Yes, we evaluated the solution on a small random sample of users (fewer than 100), reaching a precision of 0.82 and a recall of roughly 0.75. Documentation will be released along with a demo video soon.
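For context, those are the standard precision/recall definitions, computed along the lines of the snippet below (the arrays here are made-up examples, not our evaluation data):

    # Standard precision/recall computation (example arrays, not real results).
    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 1]   # did the user actually engage with the article?
    y_pred = [1, 0, 1, 0, 0, 1]   # did the system recommend it?

    print(precision_score(y_true, y_pred))  # share of recommended articles that were engaged with
    print(recall_score(y_true, y_pred))     # share of engaged-with articles that were recommended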
for_looper3000 7 months ago prev next
How long did it take to build?
gnawhoy 7 months ago next
The development took around three months, splitting time between the news aggregator interface and the underlying machine learning algorithms that fuel the personalization.
future_ml_guru 7 months ago prev next
I've seen similar projects but not with the same level of sophistication. Great innovation! Do you have an estimate of the server costs for such an application?
gnawhoy 7 months ago next
Our estimate ranges from $100 to $150 per month, depending on usage spikes - this covers running costs for the cloud hosting (servers, storage, and bandwidth), as well as periodic ML model retraining.
bob_webdev 7 months ago prev next
Have you thought about implementing further aspects of AI like a chatbot to recommend new topics based on users' conversations?
gnawhoy 7 months ago next
That is definitely an interesting concept we'll keep in mind as we iterate on and develop this project. Thank you for the input!
dr_algo 7 months ago prev next
Great idea! I'd be curious to learn more about the application architecture and data flow - particularly the features that capture user interests.
gnawhoy 7 months ago next
We take a two-fold approach: user profiling based on reading habits, and selection of article categories based on user preferences. The implementation combines Python (for ML), Django (web framework), and PostgreSQL (database).
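At a very high level, the data model looks something like the sketch below (model and field names are illustrative, not the actual schema):

    # Rough sketch of the Django data model (names are illustrative).
    from django.db import models
    from django.contrib.auth.models import User

    class Article(models.Model):
        title = models.CharField(max_length=300)
        url = models.URLField()
        category = models.CharField(max_length=100)    # e.g. "AI", "web development"
        published_at = models.DateTimeField()

    class ReadingEvent(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        article = models.ForeignKey(Article, on_delete=models.CASCADE)
        seconds_spent = models.PositiveIntegerField()   # time spent on the article
        opened_from_title = models.BooleanField()       # clicked through from the title

    class InterestProfile(models.Model):
        user = models.OneToOneField(User, on_delete=models.CASCADE)
        interests = models.JSONField(default=list)      # pre-selected interest tags

Roughly speaking, the reading events and interest profiles feed the Python scoring job, and the resulting scores are stored back in PostgreSQL so Django can serve the personalized feed.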
rob_quant 7 months ago prev next
How do you label the data to train the algorithm - are you using data annotation techniques?
gnawhoy 7 months ago next
That's an excellent question. To train the model initially, we utilized semi-supervised learning, employing distant supervision and rule-based heuristics to generate weak labels. These labels were then manually corrected to create high-quality training data.
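As a rough illustration of the rule-based side, a weak labeller looks something like this (the keywords are examples only; the real heuristics and distant-supervision sources are more involved):

    # Sketch of a rule-based weak labeller (example keywords, not the real rule set).

    TOPIC_KEYWORDS = {
        "AI": ["neural network", "transformer", "machine learning"],
        "web development": ["javascript", "react", "css"],
    }

    def weak_label(title, body):
        """Return a coarse topic label, or None to abstain when no rule fires."""
        text = (title + " " + body).lower()
        for topic, keywords in TOPIC_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                return topic
        return None  # abstentions get resolved (or dropped) during manual correction

    print(weak_label("New transformer model", "A neural network approach to summarization"))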
rand_user123 7 months ago prev next
Very interested in this - good luck with your project and I expect to see more from you in the future!
gnawhoy 7 months ago prev next
Thank you! We're eager to see how this project resonates with the community, and we'll keep improving it. Excited to share our work with everyone!