etcd – The Memory of Kubernetes
Kubernetes is a powerful container orchestration system, but have you ever wondered where it stores all its data? How does it keep track of nodes, pods, deployments, secrets, and configurations?
The answer lies in etcd, a distributed key-value store that acts as the memory of Kubernetes. Every change made in Kubernetes is stored in and retrieved from etcd, making it the single source of truth for the cluster.
In this article, we'll explore what etcd is, what data it stores, and how to recover a cluster when etcd fails.
What is etcd?
etcd is a highly available, distributed, and consistent key-value store used by Kubernetes to store all cluster data. It ensures that every component in the control plane has access to the most recent and consistent state of the cluster.
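To make the key-value model concrete, here is a minimal, self-contained sketch. The etcdctl commands in the comments are what you would run against a live etcd; the put/get functions below are a local stand-in store so the example runs anywhere, and the key /demo/greeting is purely illustrative.

```shell
# Minimal stand-in for etcd's key-value model. With a live etcd, the
# equivalent commands would be:
#   ETCDCTL_API=3 etcdctl put /demo/greeting hello
#   ETCDCTL_API=3 etcdctl get /demo/greeting
# Here a temp directory plays the part of the store.
store=$(mktemp -d)
put() { printf '%s' "$2" > "$store/$(printf '%s' "$1" | tr '/' '_')"; }
get() { cat "$store/$(printf '%s' "$1" | tr '/' '_')"; }

put /demo/greeting hello
get /demo/greeting
```

The essential idea carries over directly: every read and write is against a flat, hierarchical key namespace, which is exactly how Kubernetes uses etcd.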
Some of the key features of etcd include:
- Strong consistency: writes are replicated through the Raft consensus algorithm, so every member agrees on the same data.
- High availability: an odd-sized cluster (typically 3 or 5 members) tolerates the loss of a minority of members.
- Watch support: clients can subscribe to changes on keys, which is how Kubernetes controllers react to state updates.
- Simple key-value API: data is read and written over gRPC, with TLS for secure communication.
What Data Does etcd Store in Kubernetes?
etcd holds every object in the Kubernetes API, including:
- Cluster state: nodes, namespaces, and their status
- Workloads: Pods, Deployments, ReplicaSets, DaemonSets, and Jobs
- Configuration: ConfigMaps and Secrets
- Networking: Services and Endpoints
- Access control: ServiceAccounts, Roles, and RoleBindings
Because of this, etcd is critical to Kubernetes, and a failure in etcd can bring down the entire cluster.
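Kubernetes persists its API objects in etcd under a conventional /registry key prefix. The key paths below are illustrative examples of that layout (the object names are made up):

```shell
# Illustrative examples of the /registry key layout Kubernetes uses in etcd.
# On a live cluster you could list real keys with:
#   ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only
keys="/registry/deployments/default/web-app
/registry/pods/default/web-app-7d9c5-abcde
/registry/secrets/default/db-password"
printf '%s\n' "$keys"
```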
Now, let’s explore two detailed real-world scenarios demonstrating etcd in action.
Scenario 1: How Kubernetes Stores and Retrieves Data in etcd
(Understanding the Core Functionality of etcd in Kubernetes)
Let’s assume you are deploying an application in Kubernetes using a Deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-container
        image: nginx
Step 1: Kubernetes API Server Receives the Request
When you run:
kubectl apply -f deployment.yaml
the API Server authenticates the request, validates the Deployment object, and prepares to persist it.
Step 2: etcd Stores the Desired State
The API Server serializes the Deployment and writes it to etcd under a key such as /registry/deployments/default/web-app. From this point on, etcd holds the desired state: a web-app Deployment with 3 replicas.
Step 3: Controllers and Scheduler Use etcd Data
The Deployment controller watches for changes (via the API Server) and creates a ReplicaSet, which in turn creates the 3 Pods. The scheduler then assigns each Pod to a suitable node. Every one of these objects is also persisted in etcd.
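What the controllers do here is a reconciliation loop: compare the desired state stored in etcd with the actual state, and act until they match. A toy stand-in (the replica counts are hard-coded for illustration):

```shell
# Toy reconciliation loop: drive actual replicas toward the desired count.
# In Kubernetes, "desired" comes from the Deployment spec in etcd and
# "actual" from observed Pods; here both are plain variables.
desired=3
actual=0
while [ "$actual" -lt "$desired" ]; do
  actual=$((actual + 1))
  echo "created pod web-app-$actual"
done
echo "reconciled: $actual/$desired replicas"
```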
Step 4: Retrieving Data from etcd
Now, let’s retrieve the stored data using kubectl:
kubectl get deployment web-app -o yaml
This command reads the current state from etcd via the API Server and returns the stored YAML.
Scenario 2: etcd Failure and Disaster Recovery in Kubernetes
(What Happens When etcd Fails and How to Recover the Cluster?)
etcd is mission-critical, and losing etcd data can result in permanent cluster failure.
Let’s simulate an etcd failure and recover from a backup.
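A recovery is only possible if a snapshot exists beforehand. A hedged sketch of taking one: the endpoint, certificate paths, and output file are assumptions for a typical kubeadm control plane, and the command is guarded so it degrades to a message where etcdctl or a live etcd is unavailable.

```shell
# Hedged sketch: take an etcd snapshot that a later restore can use.
# Endpoint, cert paths, and output path are assumptions; adjust for
# your cluster.
snapshot=/var/lib/etcd/etcd-backup.db
if command -v etcdctl >/dev/null 2>&1; then
  ETCDCTL_API=3 etcdctl snapshot save "$snapshot" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    || echo "snapshot save failed (no reachable etcd)"
else
  echo "etcdctl not found; would save snapshot to $snapshot"
fi
```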
Step 1: Verify etcd is Running
Check the status of etcd on the Kubernetes control plane node:
kubectl get pods -n kube-system | grep etcd
Output:
etcd-control-plane   1/1   Running   0   10m
Step 2: Simulate an etcd Failure
To test recovery, stop etcd on the control plane node. On clusters where etcd runs as a systemd service:
sudo systemctl stop etcd
(On kubeadm clusters, etcd runs as a static Pod instead; to stop it, move /etc/kubernetes/manifests/etcd.yaml out of the manifests directory.)
Now, check the Kubernetes API Server:
kubectl get nodes
You might see an error like:
Unable to connect to the server: dial tcp 127.0.0.1:6443: connect: connection refused
This happens because the API Server depends on etcd for all cluster state; with etcd down, the API Server cannot serve requests and may itself become unreachable.
Step 3: Restore etcd from Backup
If you had taken a snapshot of etcd (with etcdctl snapshot save), you can restore it into a fresh data directory:
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/etcd-backup.db \
  --data-dir /var/lib/etcd-restored \
  --name etcd-server \
  --initial-cluster etcd-server=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://127.0.0.1:2380
Note that snapshot restore writes into a new data directory and refuses to overwrite an existing one, so configure etcd to use the restored directory before starting it.
Then, restart the etcd service:
sudo systemctl start etcd
Step 4: Verify the Recovery
Check if etcd is running and the cluster state is restored:
kubectl get nodes
If successful, Kubernetes should be back online, retrieving data from the restored etcd database.
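Recovery checks like this are easy to script. A hedged sketch of a polling helper: on a real cluster the probe would be kubectl get nodes, but here the shell builtin true stands in so the sketch is self-contained.

```shell
# Poll a probe command until it succeeds, then report the cluster is back.
wait_for_api() {
  probe="$1"
  attempts="$2"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if $probe >/dev/null 2>&1; then
      echo "cluster is back (attempt $i)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "API Server still down after $attempts attempts" >&2
  return 1
}

# Stand-in probe; on a real cluster use: wait_for_api "kubectl get nodes" 30
wait_for_api true 3
```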
Key Takeaways
- etcd is the single source of truth for a Kubernetes cluster: every API object lives there.
- If etcd fails, the control plane can no longer read or write cluster state.
- Back up etcd regularly with etcdctl snapshot save, and practice restores before you need them.
Mastering etcd is essential for troubleshooting, disaster recovery, and high availability in Kubernetes.
Let’s Discuss
Have you ever faced an etcd failure in Kubernetes? How do you manage etcd backups and recover your cluster? Share your experiences in the comments.
Follow Bavithran M for more DevOps, Kubernetes, and cloud-native insights.
Found this useful? Share it with your network.