etcd – The Memory of Kubernetes

Kubernetes is a powerful container orchestration system, but have you ever wondered where it stores all its data? How does it keep track of nodes, pods, deployments, secrets, and configurations?

The answer lies in etcd—a distributed key-value store that acts as the memory of Kubernetes. Every change made in Kubernetes is stored and retrieved from etcd, making it the single source of truth for the cluster.

In this article, we’ll explore:

  • What etcd is and why it is critical for Kubernetes
  • How etcd stores and retrieves data in Kubernetes
  • Two real-world scenarios showing its role in maintaining cluster consistency
  • Step-by-step practical exercises to interact with etcd


What is etcd?

etcd is a highly available, distributed, and consistent key-value store used by Kubernetes to store all cluster data. It ensures that every component in the control plane has access to the most recent and consistent state of the cluster.

Some of the key features of etcd include:

  • Strong Consistency – Ensures data integrity across multiple nodes.
  • Leader Election – Uses the Raft consensus algorithm to elect a leader.
  • High Availability – Designed for failover and redundancy in multi-node clusters.
  • Low Latency – Optimized for quick read/write operations.
  • Snapshot & Backup Support – Enables disaster recovery and rollback.

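Outside of Kubernetes, etcd behaves like a plain key-value database, which makes these features easy to try. A minimal sketch against a standalone, unsecured local etcd (a test setup, not a production cluster):

ETCDCTL_API=3 etcdctl put /demo/greeting "hello"
ETCDCTL_API=3 etcdctl get /demo/greeting

The second command prints the key and its value, confirming the write was committed.
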
What Data Does etcd Store in Kubernetes?

  • Cluster configuration
  • API objects (nodes, pods, services, deployments, secrets, config maps)
  • Role-based access control (RBAC)
  • Network policies
  • Controller states

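You can see this layout for yourself by listing the keys etcd holds. A minimal sketch, assuming you run it on a kubeadm control-plane node where etcd's client certificates sit under /etc/kubernetes/pki/etcd:

ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Every Kubernetes object appears as a key under the /registry prefix, for example /registry/pods/<namespace>/<pod-name>.
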
etcd is critical to Kubernetes, and a failure in etcd can bring down the entire cluster.

Now, let’s explore two detailed real-world scenarios demonstrating etcd in action.


Scenario 1: How Kubernetes Stores and Retrieves Data in etcd

(Understanding the Core Functionality of etcd in Kubernetes)

Let’s assume you are deploying an application in Kubernetes using a Deployment YAML file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-container
          image: nginx        

Step 1: Kubernetes API Server Receives the Request

When you run:

kubectl apply -f deployment.yaml        

  • The API Server authenticates the request and validates the manifest against the Deployment schema (you can test this step on its own, as shown after this list).
  • If valid, it stores the Deployment definition in etcd.

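With kubectl's server-side dry run, the API Server performs exactly this validation but skips the write to etcd, which makes it a handy sanity check before a real apply:

kubectl apply -f deployment.yaml --dry-run=server
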
Step 2: etcd Stores the Desired State

  • The API Server writes the new Deployment configuration to etcd.
  • The key-value pair stored in etcd looks something like the example after this list.
  • This record now becomes the source of truth for this Deployment.

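On a kubeadm cluster, Kubernetes objects live under the /registry key prefix, so the Deployment above lands at a key like /registry/deployments/default/web-app, with the serialized object (protobuf-encoded by default) as the value. A sketch of reading it back directly, assuming kubeadm's default certificate paths:

ETCDCTL_API=3 etcdctl get /registry/deployments/default/web-app \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Because of the protobuf encoding, the value is not fully human-readable, but the key confirms where the object lives.
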
Step 3: Controllers and Scheduler Use etcd Data

  • The Scheduler (via the API Server) reads pending pods from etcd and assigns them to nodes.
  • The ReplicaSet Controller ensures that 3 pods are always running.
  • Each time a pod is created or changes state, the API Server records the update in etcd; you can watch this live, as shown below.

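You can watch these writes happen in real time with etcdctl's watch command (same endpoint and certificate assumptions as above); it prints every change to pod records as the Scheduler and kubelets do their work:

ETCDCTL_API=3 etcdctl watch /registry/pods --prefix \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
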
Step 4: Retrieving Data from etcd

Now, let’s retrieve the stored data using kubectl:

kubectl get deployment web-app -o yaml        

This command reads the current state from etcd via the API Server and returns the stored YAML.


Scenario 2: etcd Failure and Disaster Recovery in Kubernetes

(What Happens When etcd Fails and How to Recover the Cluster?)

etcd is mission-critical: losing its data without a backup means losing the entire cluster state.

Let’s simulate an etcd failure and recover from a backup.

Step 1: Verify etcd is Running

Check the status of etcd on the Kubernetes control plane node:

kubectl get pods -n kube-system | grep etcd        

Output:

etcd-control-plane     1/1     Running   0          10m
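
You can also ask etcd itself whether it is healthy by running etcdctl inside the etcd pod (the pod name suffix matches your control-plane node, and the certificate paths are kubeadm's defaults):

kubectl -n kube-system exec etcd-control-plane -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health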

Step 2: Simulate an etcd Failure

To test recovery, stop etcd on the control plane node. If your cluster runs etcd as a systemd service, stop it directly:

sudo systemctl stop etcd

On a kubeadm cluster, where etcd runs as a static pod (as seen in Step 1), move its manifest, /etc/kubernetes/manifests/etcd.yaml, out of the manifests directory instead; the kubelet will then stop the pod.

Now, check the Kubernetes API Server:

kubectl get nodes        

You might see an error like:

Unable to connect to the server: dial tcp 127.0.0.1:6443: connect: connection refused        

This happens because the API Server depends on etcd for all cluster state: with etcd down, the API Server cannot serve requests (on kubeadm clusters it fails outright), so kubectl loses its connection.

Step 3: Restore etcd from Backup

Recovery depends on having a snapshot taken before the failure.

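For reference, such a snapshot is created with etcdctl snapshot save (a sketch; the backup path matches the restore command below, and the certificate paths are kubeadm's defaults):

ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

With that backup file in hand, restore it into a fresh data directory:
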
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/etcd-backup.db \
  --name etcd-server \
  --data-dir /var/lib/etcd-restored \
  --initial-cluster etcd-server=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://127.0.0.1:2380

Then point etcd at the restored data directory and start it again. On kubeadm, update the data-dir and volume paths in the etcd static-pod manifest and move the manifest back into /etc/kubernetes/manifests; for a systemd-managed etcd, update its --data-dir flag and run:

sudo systemctl start etcd

Step 4: Verify the Recovery

Check if etcd is running and the cluster state is restored:

kubectl get nodes        

If successful, Kubernetes should be back online, retrieving data from the restored etcd database.
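
Beyond kubectl, you can confirm the restored member from etcd's side (same etcdctl flags as in Step 1):

ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

The table lists the endpoint, database size, and whether the member is currently the leader.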


Key Takeaways

  • etcd is the single source of truth for Kubernetes, storing all cluster state data.
  • Scenario 1 demonstrated how Kubernetes stores, retrieves, and updates data in etcd.
  • Scenario 2 showed what happens when etcd fails and how to restore it from a backup.
  • Without etcd, Kubernetes cannot function, making etcd one of the most critical components in the system.

Mastering etcd is essential for troubleshooting, disaster recovery, and high availability in Kubernetes.


Let’s Discuss

Have you ever faced an etcd failure in Kubernetes? How do you manage etcd backups and recover your cluster? Share your experiences in the comments.

Follow Bavithran M for more DevOps, Kubernetes, and cloud-native insights.

Found this useful? Share it with your network.
