The Future of Kubernetes Node Lifecycle

Recording is now available: https://www.youtube.com/watch?v=-TlFdB7E-Bw

This post is a summary of "The Future of Kubernetes Node Lifecycle", which Xiaohui "Dawn" Chen and I presented at Kubecon EU 2026 in Amsterdam.

Last year at Kubecon EU in London, Dawn and I stood on stage and showed three features we believed would change how people manage resources in Kubernetes: Pod Level Resources, In-Place Pod Resizing and Node Swap memory support. We said they were coming, and they came. In-Place Pod Resizing is now GA in Kubernetes 1.35, Pod Level Resources is enabled by default, and node swap is in beta and usable in production.

This year, we came back to talk about what's next, and what's next is bigger.

The Problem

Kubernetes was designed for a world where workloads were disposable. You run a stateless web frontend, something goes wrong with the node, you drain it, kill the pods, start new ones somewhere else. Nobody notices. That model was elegant and it worked for a long time. It doesn't work anymore.

The workloads people are running on Kubernetes today are fundamentally different. AI training jobs that run for days or weeks. Stateful databases with terabytes of locally attached storage. GPU accelerated inference serving real-time traffic. These workloads are long-lived, more brittle, have stricter SLOs, and have real data gravity. You can't just drain a node and hope for the best.

But that's still what Kubernetes does. Maintenance equals drain. There's no nuanced middle ground, and the signals Kubernetes gives you about node state are too crude: a node is either Ready or NotReady, with nothing in between. There's no way to say "this node is ready for these workloads but not those" or "this node needs maintenance, but we should let the current job finish first".

What we mean by Node Lifecycle

When we talk about Node Lifecycle in the working group, we mean the full arc: Provision, Join, Serve, Maintain, Recover/Replace, Retire. The challenges we're facing through that arc fall into three categories:

  • Signals. How nodes tell the system what they are and what state they're in. Right now, node readiness is binary and feature discovery is fragmented. Different vendors label the same hardware differently. The scheduler doesn't know what a node can actually do.
  • Actions. What happens when you need to maintain, update, or replace a node. Today, the primary action is kubectl drain, which is a blunt instrument that treats planned maintenance the same way it treats a node failure.
  • Coordination. How maintenance actions interact with workload availability at scale. Modern maintenance happens across zones, pools and entire fleets. There's no unified mechanism to coordinate between node maintenance intent, workload disruption tolerance, and global capacity.

What we're building

Eviction Request

This is the change I'm most excited about. Today, eviction in Kubernetes is a point-in-time decision. You create an eviction object against a pod, and either the eviction is accepted right now (the pod gets killed) or it's rejected right now (because of PodDisruptionBudget). That's it. Binary, instantaneous, no negotiation.
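For reference, this is roughly what today's mechanism looks like: a policy/v1 Eviction object POSTed to the pod's eviction subresource, answered immediately with success or a rejection. The pod name and namespace below are just placeholders.

    # Today's model: an imperative, one-shot eviction request against a single pod.
    # POST this to /api/v1/namespaces/default/pods/web-0/eviction and the API server
    # answers immediately: the pod is evicted, or the call is rejected (HTTP 429)
    # because a PodDisruptionBudget forbids it right now.
    apiVersion: policy/v1
    kind: Eviction
    metadata:
      name: web-0            # must match the pod being evicted
      namespace: default
    deleteOptions:
      gracePeriodSeconds: 30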

The Eviction Request API replaces this with an intent-based system. This matters because Kubernetes has always been declarative at the workload level: you describe what you want, and the system converges towards it. But eviction has remained stubbornly imperative: you tell the system to do something, and it either does it right now or refuses. There's no way to express the intent to disrupt a workload without actually disrupting it. No way to say "I will need this pod gone eventually, please start preparing" rather than "kill this pod now".

The Eviction Request API introduces that missing concept of intent. A controller declares that it would like to evict a pod, not now, but at some point. Responders (referred to as interceptors in the KEP), controllers that have registered interest in that pod, are then notified and given time to act. Each responder does whatever business logic it needs: checkpoint the workload, migrate data to another replica, close connections gracefully, notify dependent services. Each reports its progress and expected completion time. Only when all responders have completed does the actual eviction proceed.
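The exact schema is still being settled in the KEP linked below, so treat the following as a conceptual sketch rather than the final API; the group, version and field names are illustrative only.

    # Illustrative sketch only. The real schema lives in KEP-4563 and may differ;
    # the point is the shape of the interaction, not the field names.
    apiVersion: policy.k8s.io/v1alpha1   # hypothetical group/version
    kind: EvictionRequest
    metadata:
      name: training-worker-3
      namespace: ml-jobs
    spec:
      podRef:
        name: training-worker-3          # the pod we intend to evict, eventually
      reason: "node maintenance: kernel upgrade"
    status:
      # Each responder that registered interest reports its progress. Only when
      # all of them have completed does the actual eviction go ahead.
      responders:
        - name: checkpoint-controller
          phase: InProgress              # e.g. writing a model checkpoint
          expectedCompletion: "2026-04-01T12:30:00Z"
        - name: traffic-drainer
          phase: Completed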

This shift from imperative to declared intent changes what's possible. For stateful workloads with locally attached storage, it means copying data to a new replica before the pod goes away. For AI training jobs, it means checkpointing the model so days of computation aren't lost. For inference workloads, it means gracefully draining traffic before termination. For platform operators, it means you can finally see and reason about disruption as a first-class object in your cluster, not something that just happens, but something that's planned, tracked and coordinated.

We're targeting the 1.37 release for this as an alpha feature, and Uber is already running a reference implementation.

https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/4563-eviction-request-api/README.md

Node Readiness Gates

Node readiness in Kubernetes is currently a single boolean: the node is Ready or it isn't. This is based on whether the kubelet considers itself ready. But "ready" means different things for different workloads. A node might be ready for general compute but not for GPU workloads because the NVIDIA driver hasn't initialised. It might be ready for batch jobs but not for latency-sensitive services because the security agent hasn't started.
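For context, this is the entire readiness story in a node's status today, a single kubelet-reported condition (abbreviated here):

    # Abbreviated node status: one condition, true or false, reported by the kubelet.
    status:
      conditions:
        - type: Ready
          status: "True"
          reason: KubeletReady
          message: kubelet is posting ready status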

Node Readiness Gates introduce a declarative verification layer. Instead of a single Ready condition, you define ReadinessRules that must be met before a node is considered ready to serve. These can include checks for GPU drivers, security agents, storage systems, monitoring infrastructure, anything that needs to be healthy before the first production pod lands on that node.
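The API shape is still being worked out under the KEP linked below, but a readiness rule could look something like this sketch; the kind, group and condition names are illustrative, not the final design.

    # Illustrative sketch only; the actual API is being designed in the KEP.
    # The idea: a node only starts accepting production pods once every listed
    # condition has been reported healthy by the relevant agent.
    apiVersion: node.k8s.io/v1alpha1     # hypothetical group/version
    kind: NodeReadinessRule
    metadata:
      name: gpu-pool-ready
    spec:
      nodeSelector:
        matchLabels:
          node-pool: gpu
      requiredConditions:
        - GpuDriverReady                 # e.g. set by the GPU driver installer
        - SecurityAgentReady             # e.g. set by the security agent DaemonSet
        - StorageProvisionerReady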

This moves us from "the kubelet is running, therefore the node is ready" to "the node has been verified against a set of requirements that match the workloads we intend to schedule there".

https://github.com/kubernetes/enhancements/issues/5233

Node Declared Features

One of the most frustrating problems in Kubernetes today is label fragmentation. Different cloud providers, different vendors, and different operators label the same hardware capabilities differently. One cluster uses accelerator=nvidia, another uses gpu.present=true. Scheduling policies aren't portable. Automation breaks across environments.

Node Declared Features introduces a standardised mechanism, node.status.declaredFeatures, where the kubelet automatically reports the features it supports to the API server. The scheduler can then match pods to nodes based on capabilities without requiring manual nodeSelector configuration or inconsistent vendor-specific labels.
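The mechanism and naming are defined by the KEP linked below and may still change, but conceptually the kubelet would report something along these lines; the feature names here are placeholders.

    # Illustrative sketch only; see KEP-5328 for the actual mechanism and naming.
    # The kubelet declares what it actually supports, so the scheduler can match
    # pods to capable nodes without vendor-specific labels or manual nodeSelectors.
    status:
      declaredFeatures:
        - InPlacePodVerticalScaling
        - NodeSwap
        - UserNamespaces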

This helps eliminate "black box scheduling", the situation where a control plane assumes nodes support features that kubelets are too old to execute, causing pods to fail on arrival with errors that are difficult to trace back to a version skew problem.

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5328-node-declared-features

Why this isn't enough

Everything I've described so far operates at the level of individual nodes. But modern infrastructure maintenance doesn't happen one node at a time. It happens across zones, pools and fleets.

If you need to update a driver across a hundred nodes, the question isn't "can I drain this one node safely", it's "can I drain these hundred nodes in a sequence that maintains application availability across my entire fleet". That requires coordination between a node maintenance intent, workload disruption tolerance, and global capacity that doesn't exist in Kubernetes today.

I believe the next horizon is moving from node-centric management to workload-centric management. Users care about their applications, not their nodes. Nodes should be an implementation detail that doesn't burden the end user. We're not there yet, but the primitives we're building (Eviction Request, Node Readiness Gates, Node Declared Features) are the foundation that makes fleet-level orchestration possible.

This is a community effort

None of this was built by just the people who presented it. The Node Lifecycle Working Group includes contributors from across the Kubernetes ecosystem. Filip Křepinský and Ryan Hallisey co-lead the working group with me. Xiaohui "Dawn" Chen, Derek Carr and Mrunal Patel are the tech leads of SIG Node, where much of this work gets done. Engineers from Google, NVIDIA, Red Hat, Uber, and many other organisations are contributing design, code, feedback and testing.

If you're running Kubernetes at scale and these problems sound familiar, the Working Group meets weekly. All are welcome.

And if you were one of the 1,600+ people who registered for our talk in Amsterdam, thank you. The room couldn't hold everyone and I'm sorry if you got turned away. The recording will be up soon, and I hope this post helps fill the gap in the meantime.

