1 point by datascience_newbie 7 months ago 13 comments
data_startup_founder 7 months ago
I'm struggling to find the perfect data science stack for my startup. We need to handle huge amounts of data, and I don't know whether to focus on specialized tools like Spark or invest in cloud services like AWS.
datascience_veteran 7 months ago
Have you considered using a cloud-based solution with managed Kubernetes like Google Kubernetes Engine (GKE)? It could give you the flexibility to easily orchestrate workloads and allocate resources as needed.
data_startup_founder 7 months ago
Thanks for the suggestion. We will look into that! Amazon EKS also seems like a viable option for us. I worry about the maintenance of Kubernetes, though. Do you have any advice or resources that could help with this?
big_data_buff 7 months ago
Can your startup benefit from an end-to-end data platform like Databricks? It already has Spark integrated and runs on cloud services like AWS and Azure.
data_startup_founder 7 months ago
We will definitely check Databricks out. I like that it comes with many features integrated, but I always worry about vendor lock-in with solutions like this. I wonder whether it could become a bottleneck in the future.
efficient_thinker 7 months ago
Before building your own, I'd recommend taking a close look at existing BI solutions. They come with a wide range of features and integrations, and a solid foundation to build upon.
data_startup_founder 7 months ago
Thanks for the recommendation. My concern is that most existing solutions may be too rigid. We are a highly innovative company and would need the flexibility to customize a lot of things for our specific needs.
specialized_dev 7 months ago
I have had success with Tableau Server and Redshift. They can handle very large data volumes, and the performance is impressive. The customizations are also manageable. But I understand your concerns about rigidity.
data_startup_founder 7 months ago
I want to explore this a bit further. I think it's possible to have our cake and eat it too! Thank you for sharing your experience.
mrneverwrong 7 months ago
Isn't the perfect stack different for every use case? Consider a simpler, divide-and-conquer approach instead of throwing complex tools at the problem.
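To make the divide-and-conquer idea concrete, here is a minimal sketch in plain Python, with no Spark or cluster involved. It assumes a CSV with hypothetical `user_id` and `amount` columns (both names are made up for illustration): the file is read in fixed-size chunks, each chunk is summarized on its own, and the partial results are merged, so memory use stays bounded by the chunk size rather than the file size.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def summarize_chunk(chunk):
    """Divide step: per-user [total, count] for a single chunk."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in chunk:
        entry = partial[row["user_id"]]  # hypothetical column name
        entry[0] += float(row["amount"])  # hypothetical column name
        entry[1] += 1
    return partial


def merge(totals, partial):
    """Conquer step: fold one partial result into the running totals."""
    for key, (amount, count) in partial.items():
        entry = totals[key]
        entry[0] += amount
        entry[1] += count
    return totals


def mean_amount_per_user(csv_file, chunk_size=10_000):
    """Compute the mean amount per user without loading the whole file."""
    totals = defaultdict(lambda: [0.0, 0])
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, chunk_size):
        merge(totals, summarize_chunk(chunk))
    return {key: amount / count for key, (amount, count) in totals.items()}


# Tiny in-memory stand-in for a large file on disk.
sample = io.StringIO("user_id,amount\na,10\nb,4\na,20\nb,6\nc,5\n")
result = mean_amount_per_user(sample, chunk_size=2)
print(result)
```

Because the summarize and merge steps are independent, the same shape could later be spread across processes or machines if a single box stops being enough. That is a design assumption of this sketch, not a claim about any particular tool.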
contender 7 months ago
But that's only true to a certain extent. If we need to analyze real-time streaming data, would you recommend using Kafka on-premises? I don't want to go entirely cloud-based, since we want to maintain some critical infrastructure internally.
mrneverwrong 7 months ago
Sure, I'd recommend looking at Kafka. For a non-cloud approach, consider Proxima or other Kafka-based solutions built for in-house deployment.
data_evangelist 7 months ago
The perfect stack is continuously evolving. Continuous research and prototyping are essential. Every so often, you'll need to rip things apart and rebuild. This is how innovation usually happens.