Kubernetes Troubleshooting: Identify the Broken Layer

What's the first thing you do when a Kubernetes deployment breaks?

I used to start running commands. Now I start with one question: which layer is actually broken? That changed how fast I debug Kubernetes.

I use 4 buckets:

━━━

Start with:

kubectl get pods -n <namespace>

The STATUS column usually tells you where to look next.

Pending → Scheduling
CrashLoopBackOff / ImagePullBackOff / 0/1 Running → Runtime
Running, but no traffic → Networking
Running, traffic reaches it, response is wrong → Application

Then:

kubectl describe pod <name> -n <namespace>

Go straight to Events. That is usually where the real failure shows itself.

The skill is not running more commands. The skill is identifying the layer first, then taking the shortest path to the cause.

━━━

- Bucket 1: Attach Pod to Node (Scheduling)

If the pod is stuck in Pending, the scheduler rejected placement. Resource requests too high. Taint not tolerated. Label missing. Affinity rules impossible to satisfy.

- Bucket 2: Start the Container (Runtime)

The pod lands on a node, but the container does not stay healthy. CrashLoopBackOff. ImagePullBackOff. Readiness/liveness failures. An unbound PVC means the pod is waiting on a volume that doesn't exist yet. Running ≠ healthy. (A short runtime triage sequence is sketched after this post.)

- Bucket 3: Route Traffic (Networking)

This is where Kubernetes feels “fine” but traffic still disappears. I usually check:

kubectl get svc,ep,ing,networkpolicy -n <namespace>

Then read it in order: Service exists? Endpoints populated? Selector correct? targetPort correct? NetworkPolicy blocking ingress? This is where silent failures live. (A Service/selector sketch follows below.)

- Bucket 4: Keep It Running (Application)

The request made it through. The application did not. Bad env var. Broken config. Dependency unreachable. Health endpoint wrong. Response incorrect. At this point, the cluster is not your problem anymore. (In-pod checks are sketched below too.)

Four layers. One failure.

Name the bucket. Then debug inside that layer. That is what makes Kubernetes troubleshooting faster.

What's the first command you run when a pod breaks?

#Kubernetes #DevOps #CloudEngineering #SRE
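Three minimal sketches to make the buckets concrete. Everything below assumes a hypothetical pod named web-abc in a hypothetical namespace demo; swap in your own names.

First, the runtime bucket. In a CrashLoopBackOff, the current container often restarted seconds ago, so its logs are empty. Pull the logs of the previous attempt and the recent events instead:

kubectl logs web-abc -n demo --previous
# logs from the last crashed container, not the fresh one

kubectl get events -n demo --sort-by=.metadata.creationTimestamp
# scheduling rejections and probe failures show up here in order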
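For the networking bucket, the two silent killers are a Service selector that matches no pod labels (Endpoints stay empty) and a targetPort that doesn't match the port the process listens on. A minimal sketch, assuming the app listens on 8080:

apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: demo
spec:
  selector:
    app: web            # must match the pod template's labels exactly
  ports:
    - port: 80          # port the Service exposes inside the cluster
      targetPort: 8080  # must match the containerPort the app listens on

If kubectl get ep web -n demo shows no addresses, the selector is the first suspect.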
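For the application bucket, get inside the pod and check what the process actually sees. This assumes the image ships a shell and wget (not all do):

kubectl exec -it web-abc -n demo -- env | sort
# is the config the process sees the config you deployed?

kubectl exec -it web-abc -n demo -- wget -qO- http://localhost:8080/healthz
# does the app answer on its own port, from inside its own pod?

If that second check fails from inside the pod, no Service or Ingress fix will help.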

Know your container. Kubernetes doesn't create health check endpoints. It only checks what you tell it to check. Wrong path. Wrong port. App not ready yet. The probe fails. If the probe keeps failing, the container can restart before you even get a chance to look inside. A lot of “Kubernetes issues” start below the cluster. Know what your container actually exposes. (A minimal probe sketch is below.)
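A minimal probe sketch for a container spec, assuming the app serves a /healthz endpoint on port 8080 (both are assumptions; check what your container actually exposes):

readinessProbe:
  httpGet:
    path: /healthz            # must be a path the app really serves
    port: 8080                # must be the port the process listens on
  initialDelaySeconds: 5      # give the app time to start before the first check
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 3         # kubelet restarts the container after 3 straight failures

Get path or port wrong here and Kubernetes will faithfully restart a perfectly healthy container.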
