650 points by quantum_miner 5 months ago | 10 comments
johndoe123 5 months ago
Great article! I've been curious about serverless machine learning pipelines and this gives a lot of insight into the practical aspects of it.
johndoe123 5 months ago
Thanks! To handle training on large datasets, we used data chunking and parallel processing. We made sure to truncate the data and feed it in smaller chunks during each instance invocation. This way we were able to level out the cost and eliminate unnecessary expenses.
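For anyone wondering what chunking plus parallel processing can look like, here is a minimal sketch, not our actual pipeline. It assumes the records fit in a list, uses a thread pool as a stand-in for separate instance invocations, and `process_chunk` is a hypothetical placeholder for the real per-chunk work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunked(records: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most chunk_size records."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def process_chunk(chunk: Sequence[float]) -> float:
    """Hypothetical stand-in for one invocation's work: a partial sum."""
    return sum(chunk)


def run_in_parallel(records: Sequence[float], chunk_size: int,
                    max_workers: int = 4) -> List[float]:
    """Process every chunk concurrently; results keep the chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_chunk, chunked(records, chunk_size)))


if __name__ == "__main__":
    partials = run_in_parallel(list(range(10)), chunk_size=4)
    print(partials)  # [6, 22, 17]
```

Smaller chunks keep each invocation short and cheap, at the price of more invocations overall, so the chunk size is something to tune against your own workload.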
illtellyouwhy 5 months ago
Although the data truncation technique can work in some situations, I personally recommend AWS Batch, which has excellent support for array jobs and can work with large datasets efficiently.
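To make the array job idea concrete, here is a rough sketch of how each child job might pick its own slice of the input. AWS Batch sets the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable on every child of an array job; the file names, the `ARRAY_SIZE` variable, and the round-robin split are assumptions made for illustration only.

```python
import os
from typing import List, Sequence


def shard_for_child(items: Sequence[str], array_index: int, array_size: int) -> List[str]:
    """Return the items one array child should process (round-robin split)."""
    if array_size <= 0:
        raise ValueError("array_size must be positive")
    if not 0 <= array_index < array_size:
        raise ValueError("array_index out of range")
    return list(items[array_index::array_size])


def current_array_index(default: int = 0) -> int:
    """Read the child index that AWS Batch exposes to array job children."""
    return int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", default))


if __name__ == "__main__":
    # Hypothetical input files; a real job would list them from storage.
    files = [f"part-{i:03d}.csv" for i in range(7)]
    # ARRAY_SIZE is an assumed variable you would pass in yourself.
    array_size = int(os.environ.get("ARRAY_SIZE", "3"))
    print(shard_for_child(files, current_array_index(), array_size))
```

Because every child computes its shard from the index alone, no coordination between children is needed, and together the shards cover each input exactly once.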
prog123 5 months ago
I've got a similar question. I'm running into memory leaks when using AWS Batch. Any recommendations for managing memory effectively?
techgeek234 5 months ago
Really helpful, thanks. I have a question though: how did you handle training on large datasets in a serverless environment? Is there a way to tackle the fixed costs associated with initiating instances?
curiouslee 5 months ago
I recently started using AWS SageMaker for similar setups and I can't believe how much money that saved my team. I'd recommend checking it out!
newbie121 5 months ago
Is it possible to feed the data in batches instead of the complete dataset? How would you go about it?
noadvicehere 5 months ago
We use the same approach for our infrastructure. However, we found that choosing the best cloud vendor for our workload brought the operational costs down even further. Did you consider comparing vendors during your design process?
knowitall 5 months ago
Unfortunately, comparing vendors for machine learning workloads is not always effective. Some tools are inherently better suited than others, and it is usually best to stick with what you know until a real problem surfaces.
johndoe123 5 months ago
Author here. We explored all cloud vendor options before committing to this solution to ensure it was the right one for us. However, I'm curious: how does your organization go about choosing which tools to settle on?