When it’s time for node maintenance, you don't just pull the plug. You use Cordon and Drain. Think of Cordon as a "No Vacancy" sign: it marks the node unschedulable, so no new Pods land on it. Drain is the polite eviction notice: it evicts the existing Pods so their controllers (Deployments, StatefulSets, and so on) recreate them on other healthy nodes, and then you can start your work. Once maintenance is finished, Uncordon flips the sign back to "Open," allowing the node to host workloads again. It’s the simplest way to do node maintenance while keeping your workloads running! #Kubernetes #DevOps #CloudNative #K8sTips #SoftwareEngineering
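The flow above maps onto three kubectl subcommands. A minimal sketch, assuming a live cluster (the node name `worker-1` is a placeholder; `--ignore-daemonsets` and `--delete-emptydir-data` are commonly needed because drain refuses by default to evict DaemonSet Pods and Pods using emptyDir volumes):

```shell
# Mark the node unschedulable ("No Vacancy"): existing Pods keep running,
# but the scheduler will not place new Pods here.
kubectl cordon worker-1

# Evict the existing Pods so their controllers recreate them elsewhere.
# DaemonSet Pods cannot be rescheduled, so they must be explicitly ignored.
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

# ... perform maintenance (kernel upgrade, reboot, etc.) ...

# Flip the sign back to "Open": the node can receive new Pods again.
kubectl uncordon worker-1
```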
Cordon and Drain Node Maintenance in Kubernetes
-
Bun turned JavaScript runtimes into a speed advantage

At Bun, performance isn’t an afterthought. It’s the starting point. That changes how modern apps are built and shipped.

Without high-performance runtimes:
• apps take longer to start
• builds slow down delivery
• infrastructure costs increase

With Bun, teams get faster installs, faster builds, and faster runtime performance.

The DevOps lesson: speed is a feature. From build to runtime, performance compounds across your entire pipeline.

At ServerScribe, we help teams optimize systems for speed, not just stability. Is your stack optimized for performance, or just working “well enough”? 👇

#DevOps #ServerScribe #Bun #Performance #DeveloperExperience #SRE
-
Kubernetes is powerful. But these common issues will humble you fast. 🧵 Here's what every DevOps engineer should know:

**1. CrashLoopBackOff**
Your container keeps crashing and restarting.
Root cause → app error, missing env var, wrong config
Fix: kubectl logs <pod-name> --previous

**2. ImagePullBackOff**
Kubernetes can't pull your Docker image.
Root cause → wrong tag, private registry, missing imagePullSecret
Fix: kubectl describe pod <pod-name>

**3. Pending Pod**
Pod is stuck and never starts.
Root cause → no node has enough CPU/memory, PVC not bound, taint mismatch
Fix: kubectl describe pod <pod-name>

**4. OOMKilled**
Pod is killed silently.
Root cause → container exceeded its memory limit
Fix: kubectl describe pod <pod-name> | grep -i oom
→ Increase the memory limit in your deployment

**5. Node NotReady**
Worker node drops out of the cluster.
Root cause → disk full, kubelet crashed, network issue
Fix: kubectl describe node <node-name>
kubectl get events --sort-by='.lastTimestamp'

**6. Service Not Reachable**
App is running but users can't connect.
Root cause → wrong port, label mismatch, empty endpoints
Fix: kubectl get endpoints <service-name>
kubectl describe svc <service-name>

**7. Readiness Probe Failing**
Pod runs but never receives traffic.
Root cause → app starts slowly, wrong path or port in the probe
Fix: kubectl describe pod <pod-name>
→ Check the Events section at the bottom

**The golden rule in Kubernetes:** always start here →
kubectl describe
kubectl logs
kubectl get events

95% of issues are solved with these 3 commands. Save this. Share this.

#Kubernetes #DevOps #AWS #EKS #CloudComputing #K8s #DevOpsEngineer #SRE
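The "golden rule" triage loop above can be sketched as one script, assuming a live cluster (the pod name and namespace are placeholders):

```shell
#!/bin/sh
# Generic first-pass triage for a misbehaving Pod.
# POD and NS are placeholders; substitute your own.
POD=my-app-7d4f8b9c6-x2k9p
NS=default

# 1. Describe: surfaces probe failures, OOMKilled status, image pull
#    errors, and scheduling problems in the Events section.
kubectl describe pod "$POD" -n "$NS"

# 2. Logs: --previous matters for CrashLoopBackOff, because the current
#    container instance may not be the one that crashed.
kubectl logs "$POD" -n "$NS" --previous

# 3. Events, newest last: catches node pressure, failed mounts, evictions.
kubectl get events -n "$NS" --sort-by='.lastTimestamp'
```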
-
Kubernetes v1.36: ハル (Haru) https://lnkd.in/gKEyS-Yf

🚀 Kubernetes v1.36 is here! The latest release of Kubernetes continues to push the boundaries of scalability, security, and workload efficiency in cloud-native environments.

Key highlights from v1.36:
• Improved workload scheduling and resource management
• Enhancements in security and policy controls
• Better observability and debugging capabilities
• Continued focus on stability with multiple features graduating to GA

#Kubernetes #CloudNative #DevOps #Containers #PlatformEngineering #SRE
-
💡 Not every deploy is zero-downtime.

⚙️ Our backend deploys take the old pod down before bringing the new one up. The service is unreachable for 30 seconds to 2 minutes during each deploy.

Why? Because the alternative (rolling updates) was causing rollout timeouts and unpredictable failures. We chose a brief, predictable gap over an unpredictable rolling deploy.

The trade-off:
✅ Cost: schedule deploys outside peak traffic.
✅ Benefit: every deploy finishes cleanly, no flaky half-states.

Engineering is often less about "what's the best pattern" and more about "what works for this system."

#SoftwareEngineering #DevOps #Engineering #Kubernetes #K8s #CloudNative #SRE #PlatformEngineering
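In Kubernetes terms, the pattern described here (all old Pods down, then new Pods up) is the built-in `Recreate` deployment strategy. A minimal sketch, assuming a standard Deployment (names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend              # placeholder name
spec:
  replicas: 1
  # Recreate terminates all old Pods before creating new ones,
  # trading a brief, predictable outage for a clean cutover
  # (the default is RollingUpdate).
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: registry.example.com/backend:latest   # placeholder image
```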
-
☸️ Kubernetes v1.36.0 is out, and this is a major release with real changes you should pay attention to. This isn’t just "new features"; it’s also about what’s changing or going away.

⚠️ One important change: the long-deprecated gitRepo volume has finally been removed. 👉 If you still rely on it, workloads will break after upgrading, and you’ll need to migrate to alternatives like init containers or external sync tools.

✨ On the feature side:
• Mutating Admission Policies → now stable (less reliance on webhooks)
• User Namespaces → improved isolation for containers
• Dynamic Resource Allocation (DRA) → continues evolving for advanced workloads

These are the kinds of changes that impact:
• security posture
• workload isolation
• cluster extensibility

Kubernetes v1.36 is a good reminder: 👉 major releases are not just about what’s new, they’re about what might break and what needs migration.

At Relnx, we track these changes so you can quickly understand:
✅ breaking changes
✅ new capabilities
✅ upgrade impact

🔎 Full release breakdown: https://lnkd.in/g3PEecwm

For platform teams: what’s your first step when a new Kubernetes version drops?
👉 Check features
👉 Or check breaking changes first?

#Kubernetes #CloudNative #SRE #DevOps #PlatformEngineering #Relnx
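For teams migrating off gitRepo, the usual replacement is an init container that clones into a shared emptyDir volume. A minimal sketch (the repo URL, names, and images are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: git-sync-demo            # placeholder name
spec:
  # A shared scratch volume replaces the removed gitRepo volume type.
  volumes:
    - name: repo
      emptyDir: {}
  initContainers:
    - name: clone
      image: alpine/git          # any image with a git binary works
      args: ["clone", "--depth=1", "https://example.com/org/app.git", "/repo"]  # placeholder URL
      volumeMounts:
        - name: repo
          mountPath: /repo
  containers:
    - name: app
      image: busybox             # placeholder workload
      command: ["sh", "-c", "ls /repo && sleep 3600"]
      volumeMounts:
        - name: repo
          mountPath: /repo
```

For repositories that must stay up to date while the Pod runs, a sidecar sync tool is the other common option; the init-container approach clones once at startup.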
-
Platform Engineering 102: Source code by itself does not make a company $$. It's only when code is deployed and accessed by customers that a company starts making $$. So the whole point of a technology platform is to optimize this process every step of the way: make it as seamless as possible to move code from the repo -> compute. The faster this process is, the more $$ the company is poised to make. #flowoptimization #platformengineering #devops
-
👉 Ingress is not dead… but it’s no longer enough.

In many Kubernetes environments I’ve worked on, Ingress was the default choice for exposing applications. Simple. Effective. Limited.

Now, we’re seeing a clear shift toward the Gateway API. Why? Because real-world platforms need more than basic routing.

Here’s what changes with Gateway API:
✅ Clear separation between platform team and app team responsibilities
✅ Advanced traffic control (not just HTTP, but TCP/UDP as well)
✅ Better multi-tenant support in enterprise clusters
✅ More extensibility with modern gateways (Envoy, Istio, Kong)

In OpenShift and large-scale environments, this becomes critical.

💡 Takeaway: If you’re still designing your platform only around Ingress, you’re limiting how far your architecture can evolve.

#Kubernetes #OpenShift #Networking #GatewayAPI #DevOps #CloudNative #PlatformEngineering #SRE
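The role separation mentioned above shows up directly in the API objects: the platform team owns the Gateway, and each app team owns an HTTPRoute that attaches to it. A minimal sketch, assuming a cluster with the Gateway API installed (all names, the GatewayClass, and the hostname are placeholders):

```yaml
# Platform team: the shared entry point, in an infra namespace.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: infra
spec:
  gatewayClassName: example-gc     # placeholder class (Envoy, Istio, ...)
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All                # let app namespaces attach routes
---
# App team: routing for one application, in its own namespace.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route
  namespace: team-a
spec:
  parentRefs:
    - name: shared-gateway
      namespace: infra
  hostnames: ["app.example.com"]   # placeholder hostname
  rules:
    - backendRefs:
        - name: app-svc            # placeholder Service
          port: 8080
```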
-
Completely agree — Ingress isn’t dead, but it’s definitely not enough for modern platforms. As Kubernetes environments scale, Gateway API brings the flexibility, control, and separation teams actually need. If you're still relying only on Ingress, you might be limiting your architecture’s future. Great insight 👏 #Kubernetes #GatewayAPI #OpenShift #DevOps
-
Kubernetes started making a lot more sense to me when I realised it is not the beginning of the story. This image sums that up well.

Applications first ran directly on top of hardware and an operating system. Then virtual machines improved isolation, but each VM needed its own operating system, which added more overhead. Containers changed that by being much lighter and more efficient, while still keeping applications separate.

But that created the next challenge. Once you have lots of containers running across different machines:
- how do you scale them?
- how do you roll out updates without downtime?
- how do you restart things when they fail?
- how do you use resources properly?

That is where Kubernetes comes in. Kubernetes is not the start of the story. It is the response to the problems that came after containers.

CoderCo #Kubernetes #DevOps #Containers #CloudComputing #LearningInPublic #PlatformEngineering
-
Kubernetes didn’t make our system faster. It made our mistakes less dangerous.

Before orchestration, a small configuration issue could mean:
-> Downtime
-> Manual restarts
-> Panic debugging
-> Emergency calls

With Kubernetes, failures still happen. Containers crash. Nodes go down. Deployments misbehave. But the system doesn’t freeze. It reacts. It replaces. It reroutes. It retries.

That shift changed how I build software. Now I don’t just ask: "Does this work?" I ask: "What happens when it breaks?"

Because in distributed systems, things will break. The goal isn’t perfection. It’s controlled recovery. That’s what modern infrastructure taught me.

#Kubernetes #CloudNative #Resilience #SoftwareEngineering #Microservices #DevOps #EngineeringMindset #ScalableSystems