187 points by thesecretscale 6 months ago | 15 comments
justinjackel 6 months ago next
Fascinating read! Real-time chat apps are always technically interesting. I'd love to hear more about the infrastructure choices and the real-time side: WebSockets, gRPC, etc.
deeptutors 6 months ago next
@justinjackel Hey! We use SockJS for our WebSocket layer, with a load balancer distributing traffic across servers; each server maintains many long-lived WebSocket connections. Getting that right was key to the real-time experience, and it's been great for engagement. :)
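For anyone curious what the fan-out over those long-lived connections looks like, here's a minimal sketch. The names (`ConnectionRegistry`, `broadcast`) are illustrative, not deeptutors' actual API; connections are modeled as plain callables so the logic stands alone without SockJS.

```python
# In-memory registry of long-lived connections per chat room.
# In production this sits behind SockJS/WebSockets; here a "connection"
# is just a send callable, so the fan-out logic is testable on its own.
from collections import defaultdict

class ConnectionRegistry:
    def __init__(self):
        self._rooms = defaultdict(set)  # room_id -> set of send callables

    def join(self, room_id, send):
        self._rooms[room_id].add(send)

    def leave(self, room_id, send):
        self._rooms[room_id].discard(send)

    def broadcast(self, room_id, message):
        """Fan a message out to every live connection in the room."""
        delivered = 0
        for send in list(self._rooms[room_id]):
            send(message)
            delivered += 1
        return delivered
```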
softwaresam 6 months ago prev next
@justinjackel We went with a REST API plus long-polling instead; it offered more flexibility with caching servers and load balancers, and it was easier for our team to implement.
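The server side of long-polling is simple enough to sketch: hold the request open until a message arrives or a timeout expires, then return. This is a generic illustration (`MessageQueue`, `poll`, and the timeout values are made up), not softwaresam's actual code.

```python
import time
from collections import deque

class MessageQueue:
    def __init__(self):
        self._messages = deque()

    def push(self, msg):
        self._messages.append(msg)

    def poll(self, timeout_s=25.0, interval_s=0.05):
        """Block up to timeout_s waiting for messages; [] means 'poll again'."""
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if self._messages:
                batch = list(self._messages)
                self._messages.clear()
                return batch
            time.sleep(interval_s)
        return []  # timed out: the client simply issues the next poll request
```

The empty-list-on-timeout convention is what makes this cache- and load-balancer-friendly: every response completes normally, so intermediaries never see a hung connection.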
amandahook 6 months ago prev next
How did you guys handle the AI side of the chat? How do you manage real-time translation and make sure queries are answered quickly?
deeptutors 6 months ago next
@amandahook We use TensorFlow.js with a dedicated edge inference server where the models live, optimized to respond within 20ms for model-based queries.
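One way to enforce a budget like that is to wrap the inference call and serve a fallback whenever it runs long, so the chat UI never stalls. A minimal sketch, with `edge_model` and `fallback` as stand-ins for the real calls (this is our guess at the pattern, not their implementation):

```python
import time

LATENCY_BUDGET_S = 0.020  # the 20ms target mentioned above

def answer(query, edge_model, fallback):
    """Run edge inference; if it blows the budget, serve the fallback."""
    start = time.perf_counter()
    result = edge_model(query)
    elapsed = time.perf_counter() - start
    if elapsed > LATENCY_BUDGET_S:
        # Too slow for the real-time path: return the fallback now and let
        # the full answer arrive asynchronously.
        return fallback(query), elapsed
    return result, elapsed
```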
softwaresam 6 months ago prev next
@amandahook I'd like to jump in and add that we stream messages to an NLU microservice via HTTP/2 for real-time translation. We are working on a subsequent blog post detailing the specifics.
aliceisincode 6 months ago prev next
Interesting. How do you ensure with confidence that your AI service provides correct answers to user queries?
deeptutors 6 months ago next
@aliceisincode Responses are gated on the model's confidence score first; if several retries still fall below the threshold, the query escalates to a human-in-the-loop reviewer. A separate validation process then records whether each response was accurate.
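That gating logic is easy to sketch. The threshold and retry count here are illustrative placeholders, not deeptutors' real values:

```python
CONFIDENCE_THRESHOLD = 0.85  # illustrative, not the production value
MAX_RETRIES = 3

def respond(query, model, escalate_to_human):
    """model(query) -> (answer, confidence); returns the final answer."""
    for _ in range(MAX_RETRIES):
        answer, confidence = model(query)
        if confidence >= CONFIDENCE_THRESHOLD:
            return answer
    # Every retry came back low-confidence: human-in-the-loop takes over.
    return escalate_to_human(query)
```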
softwaresam 6 months ago prev next
@aliceisincode To add, we track model confidence and error rates over time and use them to drive ongoing optimization. Machine learning is a never-ending pursuit of refinement.
hugocodez 6 months ago prev next
How did you handle storing this vast amount of real-time data in a way that scales easily and stays cost-effective?
deeptutors 6 months ago next
@hugocodez We use Google Cloud Bigtable for highly scalable, low-latency storage; its flexible data model handles sparse data patterns well.
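For context on why Bigtable suits chat data: a common pattern is a row key like `<room_id>#<reversed_timestamp>`, so a room's newest messages sort first and unused columns cost nothing. This scheme is a generic illustration, not necessarily deeptutors' actual schema:

```python
MAX_TS = 10**13  # illustrative epoch-ms ceiling used for timestamp reversal

def row_key(room_id, timestamp_ms):
    """Build a Bigtable-style row key where newer messages sort first."""
    reversed_ts = MAX_TS - timestamp_ms
    return f"{room_id}#{reversed_ts:013d}"
```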
softwaresam 6 months ago prev next
@hugocodez I'd like to chime in: we're also evaluating Amazon DynamoDB for real-time data storage; its auto-scaling options can make it more budget-friendly.
zenmaster14 6 months ago prev next
Any major roadblocks or unforeseen challenges you faced in the process of scaling?
deeptutors 6 months ago next
@zenmaster14 Absolutely! We initially underestimated network latency for globally distributed users and had to build a delivery optimization system to bring it down and improve the user experience.
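One common ingredient in that kind of delivery optimization is routing each user to the lowest-latency region. A trivial sketch (region names and RTT numbers are made up; the post doesn't say how their system actually works):

```python
def pick_region(rtt_ms_by_region):
    """Return the region with the lowest measured round-trip time."""
    return min(rtt_ms_by_region, key=rtt_ms_by_region.get)
```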
softwaresam 6 months ago prev next
@zenmaster14 I'd like to note three major challenges: 1) Sudden traffic spikes, 2) Ensuring cross-platform compatibility, and 3) Balancing strict latency vs. data privacy constraints.