20 points by movierecsbot 5 months ago | 12 comments
john_tech 5 months ago
Great job on building the movie rec bot! How did you approach the problem of data collection? Did you scrape the data or use an existing dataset?
code_wiz 5 months ago
Hi @john_tech, I started by scraping data from IMDb and a few other movie databases. I then cleaned and processed the data before feeding it into my ML model. I'm happy to provide more details; PM me if you're interested.
serial_builder 5 months ago
What are you using for the ML model? TensorFlow or PyTorch maybe?
code_wiz 5 months ago
@serial_builder, I went with TensorFlow. My model is based on a hybrid recommendation system that combines content-based and collaborative filtering. I also threw in some deep learning techniques for good measure.
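For anyone wondering what "hybrid" means here, this is a minimal sketch of one common way to blend the two signals. It is not the bot's actual TensorFlow model: the toy data, the `hybrid_scores` function, the `alpha` weight, and the 5-star rating scale are all assumptions made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_scores(user, ratings, genres, alpha=0.5):
    """Blend a content-based and a user-based collaborative score for unseen movies.

    alpha is a hypothetical blending weight: 1.0 = content only, 0.0 = collaborative only.
    """
    seen = ratings[user]
    dims = len(next(iter(genres.values())))
    # Content-based part: the user's profile is the rating-weighted sum of genre vectors.
    profile = [sum(seen[m] * genres[m][i] for m in seen) for i in range(dims)]
    scores = {}
    for movie in genres:
        if movie in seen:
            continue
        content = cosine(profile, genres[movie])
        # Collaborative part: similarity-weighted average of other users' ratings.
        num = den = 0.0
        for other, other_ratings in ratings.items():
            if other == user or movie not in other_ratings:
                continue
            common = sorted(set(seen) & set(other_ratings))
            if not common:
                continue
            sim = cosine([seen[m] for m in common], [other_ratings[m] for m in common])
            num += sim * other_ratings[movie]
            den += sim
        # Divide by 5 to map the assumed 1-5 star scale onto 0..1.
        collab = (num / den) / 5.0 if den else 0.0
        scores[movie] = alpha * content + (1 - alpha) * collab
    return scores

# Toy data (invented). Genre vector order: [sci-fi, comedy, horror].
genres = {
    "Alien": [1, 0, 1],
    "The Thing": [1, 0, 1],
    "Airplane!": [0, 1, 0],
    "Spaceballs": [1, 1, 0],
}
ratings = {
    "ann": {"Alien": 5, "Airplane!": 1},
    "bob": {"Alien": 5, "The Thing": 5, "Airplane!": 2, "Spaceballs": 2},
    "cat": {"Airplane!": 5, "Spaceballs": 5, "The Thing": 1},
}

scores = hybrid_scores("ann", ratings, genres)
best = max(scores, key=scores.get)
print(best, scores)
```

A fixed `alpha` is the simplest possible blend. A learned model could replace the weighted sum, which may be where the deep learning part comes in, but that is a guess.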
data_queen 5 months ago
This is fantastic! Can't wait to give it a spin this weekend. Do you have plans to release the code as open source?
code_wiz 5 months ago
@data_queen, I'm currently cleaning up the codebase to make it more readable and easily reproducible. I aim to open source it in the next few weeks, once that's done. Keep an eye out for updates!
techie_45 5 months ago
Any plans on adding TV shows into the mix?
code_wiz 5 months ago
@techie_45, I actually started off by including TV shows in my dataset, but the recommendations were worse than when I kept movies and TV shows separate. I may revisit the idea once I have more data and a better model architecture. Thanks for the suggestion!
bob_coder 5 months ago
Do you have any benchmarks on your model? It would be useful to have a rough idea of how many recommendation requests it can handle per second, as well as the model size, etc.
code_wiz 5 months ago
@bob_coder, my current setup is an 8-core CPU with 16 GB of RAM, and the model handles about 50 recommendation requests per second. I'm considering moving to a more powerful machine to scale further. As for size, the final model is approximately 90 MB.
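For context, 50 requests per second works out to roughly 20 ms per request if requests are served one at a time. Below is a rough sketch of how a number like that could be measured. The `measure_throughput` helper and the `fake_recommend` stand-in are hypothetical; the real handler would be a call into the model.

```python
import time

def measure_throughput(handler, requests):
    """Call handler on each request sequentially.

    Returns (requests per second, mean latency in milliseconds).
    """
    start = time.perf_counter()
    for request in requests:
        handler(request)
    elapsed = time.perf_counter() - start
    n = len(requests)
    return n / elapsed, 1000 * elapsed / n

def fake_recommend(user_id):
    # Hypothetical stand-in for the real model call: rank 100 movie ids
    # by a dummy score and return the top 10.
    return sorted(range(100), key=lambda m: (m * 31 + user_id) % 97)[:10]

rps, latency_ms = measure_throughput(fake_recommend, list(range(1000)))
print(f"{rps:.0f} req/s, {latency_ms:.3f} ms mean latency")
```

This only measures single-threaded, in-process throughput. Numbers from a real deployment would also include network and serialization overhead, and concurrent workers would change the picture.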
vari_code 5 months ago
I'm a fan of this approach! Have you considered using a distributed system to improve your bot's performance?
code_wiz 5 months ago
@vari_code, I have looked into a few distributed frameworks such as TensorFlow Serving and Django Serving. However, for the time being, I want to focus on getting my model up and running. Distributing the system is on my roadmap, but it's a matter of priorities for now.