86 points by ml_engineer 6 months ago | 9 comments
johnsmith 6 months ago
[HN story title] Serverless Machine Learning Pipeline with Kubeless and Knative: I've been experimenting with serverless architectures for ML pipelines and found this to be a great solution. Kubeless and Knative made it easy to deploy and manage my ML models. Anyone else playing around with this tech stack?
mlfan 6 months ago
I've been using this setup for a few projects as well and have been very impressed. I use it for training models, but how do you handle online learning and updating models?
johnsmith 6 months ago
I've been testing several methods for updating models while serving real-time inference. One option is to store the ML models in AWS Lambda layers and have a Lambda function handle online learning and updates. Another option is to use Knative Serving with CDS (Custom Development Suite) to handle online inference and model updates. I'm open to other suggestions as well.
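To make the online-learning part a bit more concrete, here is a minimal sketch of an incremental update that a function handler could run per labeled example. It assumes a plain linear model trained with SGD on squared error; `OnlineLinearModel`, `predict`, and `update` are hypothetical names, not part of Lambda or Knative, and persisting the weights between invocations (a layer, an object store, etc.) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OnlineLinearModel:
    """Hypothetical linear regressor updated one example at a time (SGD)."""

    n_features: int
    lr: float = 0.1
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0
    version: int = 0  # bumped on every update so callers can tell models apart

    def __post_init__(self) -> None:
        # Start from zero weights unless weights were supplied explicitly.
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def update(self, x: List[float], y: float) -> float:
        """Apply one SGD step on squared error; return the pre-update error."""
        error = self.predict(x) - y
        self.weights = [w - self.lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * error
        self.version += 1
        return error
```

In a real deployment the handler would load the latest weights, call `update` when a label arrives, and write the new weights back under the new `version`; how that storage works depends entirely on the platform.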
gcpdude 6 months ago
I definitely agree that this tech stack is quite powerful. I've used GCP's AI Platform and Cloud Run for my serverless ML pipelines with great success.
anothercoder 6 months ago
@gcpdude, how do you handle rollouts, versioning and scaling of those pipelines?
gcpdude 6 months ago
Great questions! Rollouts and scaling are handled automatically, but versioning is not trivial. I use Cloud Build to handle versioning of my models, and I keep track of the different versions of my ML pipelines in a YAML or JSON file. I store these artifacts in Cloud Storage. It's essential to have some sort of CI/CD pipeline and a DevOps process for serverless ML pipelines.
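For anyone wondering what a JSON version file like that might contain, here is a rough sketch. The field names, the `gs://example-bucket/...` path, and the `build_manifest` / `verify_artifact` helpers are all assumptions made up for illustration; this is not a Cloud Build format or the actual layout described above.

```python
import hashlib
import json


def build_manifest(model_name: str, version: str,
                   artifact_bytes: bytes, artifact_uri: str) -> str:
    """Return a JSON manifest entry tying a version to an artifact checksum."""
    entry = {
        "model": model_name,
        "version": version,
        "artifact_uri": artifact_uri,  # hypothetical bucket path
        "sha256": hashlib.sha256(artifact_bytes).hexdigest(),
    }
    return json.dumps(entry, indent=2, sort_keys=True)


def verify_artifact(manifest_json: str, artifact_bytes: bytes) -> bool:
    """Check that the bytes we downloaded match the recorded checksum."""
    expected = json.loads(manifest_json)["sha256"]
    return hashlib.sha256(artifact_bytes).hexdigest() == expected
```

Recording a checksum next to the version means a deploy step can refuse to serve an artifact that doesn't match what the build produced.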
noprotocol 6 months ago
I've been skeptical about putting ML pipelines into a serverless architecture. I understand the value of having easily-scalable services, but I am concerned about data security, monitoring and logging, and testing variations of my pipeline. Any thoughts on this?
mlopslover 6 months ago
I totally understand your concerns. Serverless has some limitations, especially around monitoring, but I think its scalability and ease of deployment compensate for them. For monitoring, I would recommend open-source tooling: Prometheus for collecting metrics and Grafana for visualization. Together they let you monitor resources at a fine-grained level. For data security, I would recommend looking into VPCs to tighten the security of your ML pipelines.
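As a rough illustration of the Prometheus side, the sketch below renders a request counter in Prometheus's text exposition format using only the standard library. The metric name `inference_requests_total`, the `model_version` label, and the `RequestMetrics` class are made up for the example; in practice you would more likely use an official client library and serve this text from a `/metrics` endpoint.

```python
from collections import Counter


class RequestMetrics:
    """Hypothetical in-process counter of inference requests per model version."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, model_version: str) -> None:
        self._counts[model_version] += 1

    def render(self) -> str:
        """Render the counter in Prometheus text exposition format."""
        lines = [
            "# HELP inference_requests_total Inference requests served.",
            "# TYPE inference_requests_total counter",
        ]
        for version in sorted(self._counts):
            count = self._counts[version]
            lines.append(
                f'inference_requests_total{{model_version="{version}"}} {count}'
            )
        return "\n".join(lines) + "\n"
```

One caveat for serverless: instances that scale to zero lose in-memory counters, so a push-based setup or platform-level metrics may fit better than scraping short-lived functions.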
dnastacks 6 months ago
For testing, I would suggest hooking your CI/CD pipeline into your serverless services and testing different variations of your pipeline in a fully automated way. It would be worth evaluating tools like OpenFaaS, AWS Serverless Application Model, and other open-source tools that support testing in serverless environments.
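One way to picture automated testing of pipeline variations is a parametrized test that pushes every configuration through the same checks. The sketch below uses Python's `unittest` with `subTest`; `run_pipeline` and the entries in `VARIANTS` are hypothetical stand-ins for a real pipeline entry point, and nothing here is specific to OpenFaaS or AWS SAM.

```python
import unittest
from typing import Dict, List, Optional

# Hypothetical pipeline variants; a real setup might load these from config.
VARIANTS: List[Dict[str, object]] = [
    {"name": "baseline", "scale": 1.0, "clip": None},
    {"name": "scaled", "scale": 0.5, "clip": None},
    {"name": "clipped", "scale": 1.0, "clip": 2.0},
]


def run_pipeline(values: List[float], scale: float,
                 clip: Optional[float]) -> List[float]:
    """Toy preprocessing step standing in for a real pipeline stage."""
    out = [v * scale for v in values]
    if clip is not None:
        out = [min(v, clip) for v in out]
    return out


class PipelineVariantTest(unittest.TestCase):
    def test_all_variants(self) -> None:
        data = [1.0, 2.0, 3.0]
        for variant in VARIANTS:
            # subTest reports each variant separately if one of them fails.
            with self.subTest(variant=variant["name"]):
                result = run_pipeline(data, variant["scale"], variant["clip"])
                self.assertEqual(len(result), len(data))
                if variant["clip"] is not None:
                    self.assertLessEqual(max(result), variant["clip"])
```

A CI job could run this with `python -m unittest` on every commit, then repeat the same checks against a deployed staging function.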