etcd – The Memory of Kubernetes

Kubernetes is a powerful container orchestration system, but have you ever wondered where it stores all its data? How does it keep track of nodes, pods, deployments, secrets, and configurations?

The answer lies in etcd—a distributed key-value store that acts as the memory of Kubernetes. Every change made in Kubernetes is stored and retrieved from etcd, making it the single source of truth for the cluster.

In this article, we’ll explore:

  • What etcd is and why it is critical for Kubernetes
  • How etcd stores and retrieves data in Kubernetes
  • Two real-world scenarios showing its role in maintaining cluster consistency
  • Step-by-step practical exercises to interact with etcd


What is etcd?

etcd is a highly available, distributed, and consistent key-value store used by Kubernetes to store all cluster data. It ensures that every component in the control plane has access to the most recent and consistent state of the cluster.

Some of the key features of etcd include:

  • Strong Consistency – Ensures data integrity across multiple nodes.
  • Leader Election – Uses the Raft consensus algorithm to elect a leader.
  • High Availability – Designed for failover and redundancy in multi-node clusters.
  • Low Latency – Optimized for quick read/write operations.
  • Snapshot & Backup Support – Enables disaster recovery and rollback.

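Outside of Kubernetes, etcd behaves like a plain key-value database, which makes these features easy to try. A minimal sketch against a standalone, unsecured local etcd (a test setup, not a production cluster):

ETCDCTL_API=3 etcdctl put /demo/greeting "hello"
ETCDCTL_API=3 etcdctl get /demo/greeting

The second command prints the key and its value, confirming the write was committed.
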
What Data Does etcd Store in Kubernetes?

  • Cluster configuration
  • API objects (nodes, pods, services, deployments, secrets, config maps)
  • Role-based access control (RBAC)
  • Network policies
  • Controller states

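You can see this layout for yourself by listing the keys etcd holds. A minimal sketch, assuming you run it on a kubeadm control-plane node where etcd's client certificates sit under /etc/kubernetes/pki/etcd:

ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Every Kubernetes object appears as a key under the /registry prefix, for example /registry/pods/<namespace>/<pod-name>.
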
etcd is critical to Kubernetes, and a failure in etcd can bring down the entire cluster.

Now, let’s explore two detailed real-world scenarios demonstrating etcd in action.


Scenario 1: How Kubernetes Stores and Retrieves Data in etcd

(Understanding the Core Functionality of etcd in Kubernetes)

Let’s assume you are deploying an application in Kubernetes using a Deployment YAML file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-container
          image: nginx        

Step 1: Kubernetes API Server Receives the Request

When you run:

kubectl apply -f deployment.yaml        

  • The API Server authenticates the request and validates the manifest against the Deployment schema (you can test this step on its own, as shown after this list).
  • If valid, it stores the Deployment definition in etcd.

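With kubectl's server-side dry run, the API Server performs exactly this validation but skips the write to etcd, which makes it a handy sanity check before a real apply:

kubectl apply -f deployment.yaml --dry-run=server
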
Step 2: etcd Stores the Desired State

  • The API Server writes the new Deployment configuration to etcd.
  • The key-value pair stored in etcd looks something like the example after this list.
  • This record now becomes the source of truth for this Deployment.

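On a kubeadm cluster, Kubernetes objects live under the /registry key prefix, so the Deployment above lands at a key like /registry/deployments/default/web-app, with the serialized object (protobuf-encoded by default) as the value. A sketch of reading it back directly, assuming kubeadm's default certificate paths:

ETCDCTL_API=3 etcdctl get /registry/deployments/default/web-app \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Because of the protobuf encoding, the value is not fully human-readable, but the key confirms where the object lives.
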
Step 3: Controllers and Scheduler Use etcd Data

  • The Scheduler (via the API Server) reads pending pods from etcd and assigns them to nodes.
  • The ReplicaSet Controller ensures that 3 pods are always running.
  • Each time a pod is created or changes state, the API Server records the update in etcd; you can watch this live, as shown below.

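You can watch these writes happen in real time with etcdctl's watch command (same endpoint and certificate assumptions as above); it prints every change to pod records as the Scheduler and kubelets do their work:

ETCDCTL_API=3 etcdctl watch /registry/pods --prefix \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
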
Step 4: Retrieving Data from etcd

Now, let’s retrieve the stored data using kubectl:

kubectl get deployment web-app -o yaml        

This command reads the current state from etcd via the API Server and returns the stored YAML.


Scenario 2: etcd Failure and Disaster Recovery in Kubernetes

(What Happens When etcd Fails and How to Recover the Cluster?)

etcd is mission-critical: losing its data without a backup means losing the entire cluster state.

Let’s simulate an etcd failure and recover from a backup.

Step 1: Verify etcd is Running

Check the status of etcd on the Kubernetes control plane node:

kubectl get pods -n kube-system | grep etcd        

Output:

etcd-control-plane     1/1     Running   0          10m
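
You can also ask etcd itself whether it is healthy by running etcdctl inside the etcd pod (the pod name suffix matches your control-plane node, and the certificate paths are kubeadm's defaults):

kubectl -n kube-system exec etcd-control-plane -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health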

Step 2: Simulate an etcd Failure

To test recovery, stop etcd on the control plane node. If your cluster runs etcd as a systemd service, stop it directly:

sudo systemctl stop etcd

On a kubeadm cluster, where etcd runs as a static pod (as seen in Step 1), move its manifest, /etc/kubernetes/manifests/etcd.yaml, out of the manifests directory instead; the kubelet will then stop the pod.

Now, check the Kubernetes API Server:

kubectl get nodes        

You might see an error like:

Unable to connect to the server: dial tcp 127.0.0.1:6443: connect: connection refused        

This happens because the API Server depends on etcd for all cluster state: with etcd down, the API Server cannot serve requests (on kubeadm clusters it fails outright), so kubectl loses its connection.

Step 3: Restore etcd from Backup

Recovery depends on having a snapshot taken before the failure.

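For reference, such a snapshot is created with etcdctl snapshot save (a sketch; the backup path matches the restore command below, and the certificate paths are kubeadm's defaults):

ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

With that backup file in hand, restore it into a fresh data directory:
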
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/etcd-backup.db \
  --name etcd-server \
  --data-dir /var/lib/etcd-restored \
  --initial-cluster etcd-server=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://127.0.0.1:2380

Then point etcd at the restored data directory and start it again. On kubeadm, update the data-dir and volume paths in the etcd static-pod manifest and move the manifest back into /etc/kubernetes/manifests; for a systemd-managed etcd, update its --data-dir flag and run:

sudo systemctl start etcd

Step 4: Verify the Recovery

Check if etcd is running and the cluster state is restored:

kubectl get nodes        

If successful, Kubernetes should be back online, retrieving data from the restored etcd database.
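
Beyond kubectl, you can confirm the restored member from etcd's side (same etcdctl flags as in Step 1):

ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

The table lists the endpoint, database size, and whether the member is currently the leader.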


Key Takeaways

  • etcd is the single source of truth for Kubernetes, storing all cluster state data.
  • Scenario 1 demonstrated how Kubernetes stores, retrieves, and updates data in etcd.
  • Scenario 2 showed what happens when etcd fails and how to restore it from a backup.
  • Without etcd, Kubernetes cannot function, making etcd one of the most critical components in the system.

Mastering etcd is essential for troubleshooting, disaster recovery, and high availability in Kubernetes.


Let’s Discuss

Have you ever faced an etcd failure in Kubernetes? How do you manage etcd backups and recover your cluster? Share your experiences in the comments.

Follow Bavithran M for more DevOps, Kubernetes, and cloud-native insights.

Found this useful? Share it with your network.
