789 points by data_streamer 6 months ago | 10 comments
john_doe 6 months ago
Great question! Our team has found that separating real-time and batch processing helps maintain system stability and allows for better scalability. Any specific challenges you're facing with real-time data processing?
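To make the split concrete, here is a minimal sketch (not anyone's production design) of one ingest point feeding a cheap real-time view and a deferred batch path. The class name `SplitPipeline`, the `batch_size` parameter, and the per-key totals are all hypothetical, chosen only for illustration.

```python
from collections import deque


class SplitPipeline:
    """Route each event to a low-latency path and buffer it for a batch path."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = deque()    # events waiting for the batch path
        self.latest = {}         # real-time view: most recent value per key
        self.batch_totals = {}   # batch view: per-key totals, updated on flush

    def ingest(self, key, value):
        # Real-time path: a cheap update that is visible immediately.
        self.latest[key] = value
        # Batch path: only enqueue here; the heavier work is deferred.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Drain everything buffered so far in one pass.
        while self.buffer:
            key, value = self.buffer.popleft()
            self.batch_totals[key] = self.batch_totals.get(key, 0) + value
```

The point of the sketch is that a slow `flush` never delays the update to `latest`; in a real system the two paths would usually run in separate processes or services.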
jane_doe 6 months ago
We're seeing high latency when querying our real-time data stream. We use Kafka as our message broker, and its throughput doesn't seem sufficient for our workload. Which message broker would you recommend?
mj_programmer 6 months ago
If Kafka isn't working for you, consider an alternative such as Apache Pulsar. It supports multi-tenancy and geo-replication while delivering excellent performance for real-time data streaming.
bigdatafan 6 months ago
Pulsar has a more modern architecture, with serving brokers separated from storage in Apache BookKeeper, and it integrates better with cloud environments than Kafka. We saw a considerable performance improvement after making the switch.
someuser 6 months ago
We use a microservices architecture and have found that a dedicated real-time data processing service like Amazon Kinesis can take that load off your own services. In our experience it gives lower latency and copes better with the throughput your team needs. Avoiding polling is key.
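As a rough illustration of the "avoid polling" point, the sketch below uses only the Python standard library rather than any Kinesis API. The consumer blocks on a queue and wakes up as soon as an event arrives, instead of sleeping and re-checking on a timer. The `None` shutdown sentinel and the doubling step are assumptions made for the example.

```python
import queue
import threading


def consume(events, results):
    """Block until an event arrives instead of polling on a timer."""
    while True:
        item = events.get()       # blocks; returns as soon as a producer puts
        if item is None:          # hypothetical sentinel meaning "shut down"
            break
        results.append(item * 2)  # stand-in for real processing


def run_demo():
    events = queue.Queue()
    results = []
    worker = threading.Thread(target=consume, args=(events, results))
    worker.start()
    for value in (1, 2, 3):
        events.put(value)
    events.put(None)              # ask the consumer to stop
    worker.join()
    return results
```

With a polling loop, worst-case latency is bounded below by the poll interval; with a blocking read it is bounded only by how fast the consumer is scheduled.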
fastdata 6 months ago
We've switched to a function-as-a-service (FaaS) model, which allows us to scale quickly and decrease our latency. Event-driven architectures with FaaS can be a great solution when working with real-time data.
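For anyone unfamiliar with the pattern, here is a small, provider-agnostic sketch of event-driven handlers. It does not use any real FaaS platform's API: the `on` decorator, the `HANDLERS` registry, and the `order_created` event shape are all hypothetical.

```python
HANDLERS = {}  # event type -> list of handler functions


def on(event_type):
    """Register a stateless handler for one event type."""
    def register(func):
        HANDLERS.setdefault(event_type, []).append(func)
        return func
    return register


@on("order_created")
def total_order(event):
    # Each handler receives a single event and keeps no state between calls.
    return sum(item["price"] * item["qty"] for item in event["items"])


def dispatch(event):
    """Invoke every handler registered for the event's type."""
    return [handler(event) for handler in HANDLERS.get(event["type"], [])]
```

Because each handler is stateless and triggered per event, a platform can run as many copies in parallel as the event rate requires, which is where the scaling benefit comes from.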
streamingguy 6 months ago
Don't forget to look into using real-time data processing frameworks, like Apache Flink and Apache Beam. They can make managing your data streams much easier.
infraengineer 6 months ago
Absolutely! My team has been consistently impressed with the transformations in Flink's DataStream API and the variety of stream processing use cases it supports.
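To give a feel for what these frameworks handle for you, here is a plain-Python sketch of one common streaming transformation, a tumbling-window count. This is not Flink or Beam code; the function name and the `(timestamp_seconds, key)` event shape are assumptions for illustration. It also ignores late or out-of-order data, which the real frameworks manage with watermarks.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs
    returns: dict mapping (window_start, key) -> count
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

A framework adds distribution across workers, fault-tolerant state, and event-time handling on top of this basic idea.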
databasenerd 6 months ago
A distributed cache like Redis or Memcached can help reduce latency as well. Data is held in memory, so it can be read and updated much faster than in a disk-backed RDBMS or NoSQL store.
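A minimal cache-aside sketch of that idea follows, with an in-process dictionary standing in for Redis or Memcached. The `TTLCache` class, `read_through` helper, and `load_from_db` callback are hypothetical names; with a real cache you would swap the dictionary operations for the client's get and set calls.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache server, with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def read_through(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the slow store."""
    value = cache.get(key)
    if value is None:  # miss or expired (note: None values are never cached)
        value = load_from_db(key)
        cache.set(key, value)
    return value
```

The TTL is the knob that trades freshness for latency: a longer TTL means fewer trips to the database but staler reads.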
machinelearningpro 6 months ago
When applying ML models to real-time data, consider pairing your streaming framework with tools like TensorFlow Serving or Flink ML. They can help you serve predictions with low latency.