111 points by k8smastery 6 months ago | 10 comments
user1 6 months ago
Fascinating read about hyperscaling Kubernetes! I'm curious about the infrastructure that powers a 5000-node cluster.
k8sexpert 6 months ago
Great question! We used a combination of on-prem servers, VMs and cloud instances from different providers. Balancing resource availability and costs was a challenge! "Infrastructure: A 5000-node Kubernetes orchestra" would be a great follow-up article!
the_architect 6 months ago
This is amazing. What about networking? Were there any limitations you hit with layer-2 networking configurations?
k8snetwork 6 months ago
@the_architect, we hit quite a few limits with layer 2. We ended up building custom networking around Kubernetes network policies, with some help from the Cilium project. Layer-3 Calico networking turned out to be the most reliable option at scale. "Custom Kubernetes CNI Plugins for Mega-Clusters" would be another exciting article!
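For readers who haven't worked with network policies, here is a minimal Python sketch that builds two standard networking.k8s.io/v1 NetworkPolicy manifests: a default-deny rule and a narrow allow rule. It is not the configuration described above. The namespace, app label, and port are hypothetical, and the allow rule assumes the kubernetes.io/metadata.name label that recent Kubernetes versions set on every namespace. CNI plugins such as Cilium and Calico enforce policies of this shape.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress to pods in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Ingress is listed but no ingress rules follow, so nothing is allowed in.
            "policyTypes": ["Ingress"],
        },
    }


def allow_from_namespace(namespace: str, app: str, source_ns: str, port: int) -> dict:
    """Allow TCP traffic on `port` to pods labeled app=<app> from one source namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source_ns}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {"kubernetes.io/metadata.name": source_ns}
                            }
                        }
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(default_deny_ingress("payments"), indent=2))
    print(json.dumps(allow_from_namespace("payments", "api", "frontend", 8080), indent=2))
```

Starting from default-deny and adding explicit allow rules keeps the policy set auditable, which matters more as the number of namespaces grows.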
cloud_explorer 6 months ago
What about load balancing and service discovery? We all know it's crucial, especially with such a big deployment.
lb_ninja 6 months ago
Absolutely critical, @cloud_explorer! An Istio service mesh handled most of our load balancing and service discovery, but we also relied on external tools like HAProxy and NGINX for more fine-grained control.
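As a rough illustration of mesh-level load balancing, the Python sketch below builds an Istio DestinationRule that sets the load-balancing algorithm for one service. This is an assumption about what such a setup might look like, not the configuration used in the cluster discussed here. The service and namespace names are hypothetical.

```python
import json

# Simple load-balancing algorithms accepted by Istio's DestinationRule.
ALLOWED_ALGORITHMS = {"ROUND_ROBIN", "LEAST_REQUEST", "RANDOM"}


def destination_rule(service: str, namespace: str, algorithm: str = "LEAST_REQUEST") -> dict:
    """Build a DestinationRule that picks a load-balancing algorithm for one service."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # Fully qualified in-cluster DNS name of the target service.
    host = f"{service}.{namespace}.svc.cluster.local"
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": f"{service}-lb", "namespace": namespace},
        "spec": {
            "host": host,
            "trafficPolicy": {"loadBalancer": {"simple": algorithm}},
        },
    }


if __name__ == "__main__":
    # Hypothetical service "checkout" in namespace "shop".
    print(json.dumps(destination_rule("checkout", "shop"), indent=2))
```

Service discovery itself needs no extra configuration in this sketch: the mesh resolves the host through the regular Kubernetes service registry.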
configs_r_us 6 months ago
Configuring 5000 nodes sounds incredibly daunting. I'm almost frightened... What was your strategy?
k8s_guru 6 months ago
@configs_r_us, we used a combination of Kustomize and Helm charts for configuration management. This let us store, version, and apply configuration templates at scale. By the way, "Kicking K Customization Challenges: kustomize and Helm?" would be a helpful article!
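The core idea behind this kind of setup is a shared base configuration plus small per-environment overlays. The Python sketch below illustrates that idea with a plain recursive dictionary merge. It is a deliberate simplification and not Kustomize's actual patching algorithm (for instance, lists are replaced wholesale here rather than merged by key). The pool names and settings are hypothetical.

```python
import copy


def merge(base: dict, overlay: dict) -> dict:
    """Return a new dict: `base` with `overlay` applied, recursing into nested dicts."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dicts: merge them key by key.
            result[key] = merge(result[key], value)
        else:
            # Scalars, lists, and type mismatches: the overlay value wins.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base settings shared by every node pool.
BASE = {
    "replicas": 2,
    "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}},
    "labels": {"app": "node-agent"},
}

# Hypothetical per-pool overlays; each lists only what differs from BASE.
OVERLAYS = {
    "gpu-pool": {
        "resources": {"requests": {"memory": "1Gi"}},
        "labels": {"pool": "gpu"},
    },
    "edge-pool": {"replicas": 1},
}


def render(pool: str) -> dict:
    """Produce the effective configuration for one pool."""
    return merge(BASE, OVERLAYS[pool])


if __name__ == "__main__":
    for pool in OVERLAYS:
        print(pool, render(pool))
```

Because each overlay records only its differences, the base can be versioned once and the per-pool files stay small enough to review by hand.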
observability_fan 6 months ago
Prometheus, Grafana, and the ELK stack: did they suffice for monitoring such a huge setup?
monitoring_master 6 months ago
Great question! We also added a self-hosted Jaeger deployment for tracing, with Zipkin support. Of course, there's always room for improvement, but these tools definitely helped!