Kubernetes Default Settings Impact on Performance

Summary
Kubernetes default settings are pre-configured values designed to provide broad compatibility and easy setup, but they can significantly affect cluster performance in areas such as DNS resolution, CPU usage, and service load balancing. Understanding how these defaults work helps you avoid pitfalls like unnecessary DNS lookups, resource throttling, and inefficient request distribution.
- Review DNS defaults: Change the ndots setting in your pod configuration to reduce extra DNS lookups and speed up responses for external calls.
- Audit resource limits: Check and update default CPU and memory limits in deployment files or Helm charts to prevent unexpected throttling during production workloads.
- Evaluate load balancing mode: Consider switching from Kubernetes’ default iptables mode to IPVS or Cilium for better scalability and smarter traffic distribution, especially on newer Linux kernels.
A Kubernetes pod calls api.stripe.com. Before it gets an answer, it makes 4 DNS lookups. 3 of them fail. This happens on every external DNS call in every pod in your cluster. And most teams don't know it's happening.

The culprit: ndots:5 in /etc/resolv.conf. Kubernetes sets this by default. It means any name with fewer than 5 dots gets the search domains appended first. api.stripe.com has 2 dots, so your pod tries:

→ api.stripe.com.default.svc.cluster.local (fail)
→ api.stripe.com.svc.cluster.local (fail)
→ api.stripe.com.cluster.local (fail)
→ api.stripe.com (success)

3 wasted queries. Every time.

The fix takes 30 seconds: set ndots:2 in your pod's dnsConfig, or add a trailing dot to external FQDNs.

Wrote a full breakdown of how DNS resolves inside a cluster, from resolv.conf to CoreDNS to upstream. Link: https://lnkd.in/grFg9AT4

#Kubernetes #DevOps #DNS #CloudNative #SRE
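A minimal sketch of the dnsConfig fix the post describes, as plain pod YAML (the pod name and image are placeholders for illustration):

    apiVersion: v1
    kind: Pod
    metadata:
      name: payments-worker            # hypothetical name, for illustration only
    spec:
      containers:
        - name: app
          image: example/app:1.0       # placeholder image
      dnsConfig:
        options:
          - name: ndots
            value: "2"                 # default is 5; the value must be a quoted string

With ndots:2, a name with at least 2 dots (like api.stripe.com) is tried as an absolute name first and skips the search-domain expansion, while bare service names like my-svc (0 dots) still resolve through the cluster search path.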
🕒 2:13 AM Alert: "CPU throttling detected on all nodes."

We were under attack — or so we thought. Pods were crashing randomly. Services were flaky. CPU metrics were through the roof. Traffic was normal. No spike. No malicious activity. Yet our production cluster was choking.

🔍 SREs jumped in. "Autoscaler isn't working," one said. "Node CPU is 95%," another pointed out. "Pods are hitting resource limits," DevOps chimed in. Everyone was looking at the symptoms, not the root cause.

We added more nodes. Same issue. Added bigger nodes. Still throttling. Something didn't add up.

😓 Teams started questioning each other. Infra blamed the app. The app team blamed resource limits. The platform team blamed the kubelet.

I paused. Opened one of the crashing pods' YAML files. Then another. Then another. Same pattern:

    resources:
      limits:
        cpu: "200m"

💡 That's when it hit me. The Helm chart's default values had overridden resource settings for ALL production pods. Our app containers were running with dev limits in prod. Pods weren't allowed to use more than 200m CPU — even when they needed 2 cores. The node had plenty of CPU, but the containers were choking themselves.

🔧 We patched the Helm release with correct values. Restarted the pods. Boom — stable in 2 minutes.

Lessons learned:
✅ Always audit Helm defaults before promoting to production
✅ CPU throttling ≠ actual CPU usage issues
✅ Blame doesn't solve anything — YAML does

🚀 Kubernetes teaches you one thing: production isn't just about running pods. It's about understanding what's running — and why it's failing.

💬 Have you seen a weird production issue like this? 👇 Let me know in the comments.

#kubernetes #production #issues #sre #devops #alerts #pods #services #helm
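The fix itself is a one-file change. A hedged sketch, assuming the chart exposes a standard resources block in its values (the release name, chart path, and numbers below are all illustrative):

    # values-prod.yaml: production overrides for the chart's dev defaults
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "2"            # was 200m from the chart defaults
        memory: "1Gi"

    # apply it to the running release
    helm upgrade my-app ./my-app-chart -n production -f values-prod.yaml

helm upgrade re-renders the templates with the new values and rolls the pods, which is why the cluster stabilized within minutes.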
Did you know that by default Kubernetes Services use legacy iptables rules? The reason is backwards compatibility out of the box. The problem is that these rules really are legacy: the iptables subsystem has largely been replaced by nftables on newer kernels, with an iptables-legacy layer providing translations. iptables rule evaluation also scales O(n) (linearly) with the number of backend endpoints, and its load-balancing rules are pretty limited and not very intelligent, usually leading to a massively disproportionate distribution of requests between backends.

We can do a lot better. If you have an application where load balancing across many replicas matters (not just HA), I would replace the default with one of the following options:

IPVS mode - This is the balance between legacy compatibility and performance. It lets kube-proxy use the kernel's IPVS functionality built into Netfilter. The nodes' kernel needs to be at least 4.19 (released in 2018) and a few kernel modules must be loaded (ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh, nf_conntrack). It might be backported in enterprise kernels; check with your OS vendor.

Cilium kube-proxy replacement - This is the new kid on the block and requires Cilium. It is hands down more performant overall, since it leverages the eBPF kernel subsystem to bypass the kernel packet filter entirely. It also allows advanced load-balancing modes just like IPVS, but implemented by Cilium instead of Linux kernel modules (it still runs in kernel space via eBPF, if you're unfamiliar). It requires kernel 5.1 or above, I believe (keep in mind the latest is 6.12).

A vast majority of use cases don't require "compatibility" as much as they require logic and performance. The Kubernetes defaults are sane and will work, but both IPVS and Cilium are much more traffic aware and scale at O(1) (constant time), which at scale is a tremendous difference.
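For IPVS mode, the switch lives in kube-proxy's configuration. A minimal sketch, assuming a kubeadm-managed cluster where kube-proxy reads a KubeProxyConfiguration from the kube-proxy ConfigMap in kube-system (adjust for your distro):

    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"            # empty/default falls back to iptables
    ipvs:
      scheduler: "rr"       # round robin; wrr/sh need the matching ip_vs_* modules

After editing the ConfigMap, restart kube-proxy and verify that IPVS virtual servers were actually programmed:

    kubectl -n kube-system rollout restart daemonset kube-proxy

    # on a node:
    ipvsadm -Ln

If the required ip_vs modules aren't loaded, kube-proxy will log an error at startup, so check its logs before trusting the switch.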