110 points by scalerz 6 months ago | 18 comments
johnsmith 6 months ago
Great post! I've been working on a similar problem and scaling real-time analytics is no easy feat.
programmer12 6 months ago
Totally agree. We used Kafka as our message broker and Flask for our web server; it worked well for handling billions of events.
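For anyone curious what that pattern looks like, the batching half can be sketched without a live broker. `EventBatcher` and its `send` callback are hypothetical stand-ins for a real Kafka producer here, not kafka-python's actual API:

```python
import json

class EventBatcher:
    """Buffers events and flushes them in batches, mimicking how a Kafka
    producer batches sends. Hypothetical sketch, not the kafka-python API."""

    def __init__(self, flush_size, send):
        self.flush_size = flush_size
        self.send = send      # callable that would ship a batch to the broker
        self.buffer = []

    def record(self, event):
        self.buffer.append(json.dumps(event))
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()

# stand-in "broker": just collect flushed batches in a list
batches = []
producer = EventBatcher(flush_size=2, send=batches.append)
producer.record({"user": 1, "action": "click"})
producer.record({"user": 2, "action": "view"})
```

At billions of events, batching like this (plus compression on the wire) is most of what keeps the broker from being the bottleneck.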
codeboss 6 months ago
Kafka is solid, but we had better luck with RabbitMQ for passing messages between services. It also depends on the team's expertise.
meshnet 6 months ago
Impressive work. Can you share more details about your monitoring and debugging process? It's crucial as the system scales.
anonuser 6 months ago
Between ads and analytics, it seems like data is the new oil. Great article, I look forward to hearing more about your solution.
gallium 6 months ago
Excellent job getting these systems to communicate efficiently. Mind sharing how you resolved issues with network latency?
silicon 6 months ago
I think reducing the number of hops will help. We did this by sending messages directly from producers to consumers.
techgnome 6 months ago
Bypassing brokers did improve our latency, but then load balancing became tricky. Would love to hear your solutions.
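One way to keep producer-to-consumer assignment sane without a broker in the middle is rendezvous (highest-random-weight) hashing: every producer computes the same winner per key, and removing a consumer only reshuffles the keys that lived on it. A sketch (the function name is mine, not anything from the article):

```python
import hashlib

def pick_consumer(key: str, consumers: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: every producer computes
    the same score per (key, consumer) pair and picks the highest, so all
    producers agree on the assignment without coordinating through a broker."""
    def score(consumer: str) -> int:
        digest = hashlib.sha256(f"{key}:{consumer}".encode()).digest()
        return int.from_bytes(digest, "big")
    return max(consumers, key=score)
```

The nice property: when a consumer dies, only its keys move; everything else stays put, which is most of what made our broker-less load balancing workable.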
codingknight 6 months ago
We found that going with a heavier message broker made it much easier to manage delivery in more demanding scenarios.
fossfor12 6 months ago
A well-executed system. Can you comment on how you handle the volatility of big data during real-time ingest and processing?
microbee 6 months ago
Do you have any docs or case studies on your system? It would be great to see some hard numbers on your solutions.
zer0cool 6 months ago
I'm assuming you needed to reduce traffic with sampling or compression. I'm curious what methods you found most useful.
signal_v 6 months ago
Compression was very helpful, but we also used sampling to keep the data manageable. Worked like a charm.
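The combination is simple to wire up with nothing but the stdlib; a sketch of the sample-then-compress idea (the seeded RNG is just to keep the example deterministic, and the function name is made up):

```python
import json
import random
import zlib

def sample_and_compress(events, rate, seed=0):
    """Keep roughly `rate` of the events, then zlib-compress the JSON batch.
    Sketch of sample-then-compress; in production the sample decision would
    usually hash a stable key so all events for one user stay together."""
    rng = random.Random(seed)
    kept = [e for e in events if rng.random() < rate]
    payload = json.dumps(kept).encode()
    return kept, zlib.compress(payload)

events = [{"id": i, "type": "pageview"} for i in range(1000)]
kept, blob = sample_and_compress(events, rate=0.1)
```

Repetitive analytics JSON compresses extremely well, so the two techniques stack: you drop ~90% of events and then shrink what's left again on the wire.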
mu6k 6 months ago
Sampling does introduce uncertainty, but it reduces the cost of analyzing huge data streams. Have you considered uploading the data to S3?
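The uncertainty is quantifiable, too: sample at rate p, scale the observed count back up by 1/p, and the estimate concentrates around the true total as the stream grows. A quick sanity check (the numbers are made up for the demo):

```python
import random

def estimated_total(sample_count: int, rate: float) -> float:
    """Scale a sampled count back up by the sampling rate: the standard
    inverse-probability estimate of the true event count."""
    return sample_count / rate

rng = random.Random(42)        # seeded so the demo is reproducible
true_total = 100_000
rate = 0.05
sampled = sum(1 for _ in range(true_total) if rng.random() < rate)
estimate = estimated_total(sampled, rate)
```

With ~5,000 sampled events the relative error is on the order of 1-2%, which is usually fine for dashboards and way cheaper than processing everything.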
starchip 6 months ago
Pretty impressive. Which libraries or tools can we use to build a system like this for smaller-scale operations?
digialdude 6 months ago
Apache Flink is a good tool for distributed stream processing, especially when the load is too much to handle in real time on a single machine.
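Flink's bread and butter is windowed aggregation over streams. To make the idea concrete at a small scale, here's a pure-Python sketch of a tumbling-window count, the kind of thing Flink's window operators run distributed (this is not Flink's API):

```python
from collections import defaultdict

def tumbling_counts(events, window_secs):
    """Bucket (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences per key within each window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = ts - (ts % window_secs)
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(0, "click"), (3, "view"), (5, "click"), (12, "click")]
```

For genuinely small operations, something like this in a single process (or plain Kafka consumer groups) often beats standing up a Flink cluster.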
tech_dynamo 6 months ago
Any advice on the security side of storing/processing real-time analytics data? Would be appreciated.
bitsurfer 6 months ago
Implement strong encryption, manage user access, and conduct regular audits. Monitor systems for threats, too.
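On the integrity side, the Python stdlib's `hmac` is already enough to sign event payloads so tampering in transit or at rest is detectable. The key handling below is a placeholder; in production it would come from a KMS or secret store:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # placeholder: load from a KMS/secret store

def sign(payload: bytes) -> str:
    """HMAC-SHA256 signature over an event payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    """Constant-time check that the payload matches its signature."""
    return hmac.compare_digest(sign(payload), signature)
```

`compare_digest` matters here: a naive `==` on hex strings can leak timing information to an attacker probing signatures.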