From the course: Debugging Kubernetes

Troubleshooting effectively with the many whys

- [Presenter] In this course, we're going to learn how to troubleshoot a variety of common issues that occur while working within Kubernetes clusters. However, before diving into any problem, Kubernetes or otherwise, it's important to have an approach to troubleshooting. Knowing how to troubleshoot, instead of throwing ideas at a wall, as it were, will help you stay calm when the pressure's high, make sense out of what might look like nonsense, and learn from failure to become a better engineer. Let's talk about the two approaches that we'll be using to solve the seriously hairy problems we'll work through in this course: the Many Whys, and Spot the Pattern (or Spot the Outlier).

But first, let's get two ground rules out of the way. First, troubleshooting is a little bit like art: there's no one right way to solve a problem. If you solve a problem in this course differently than I did, that's great. Second, these approaches aren't guaranteed to produce a root cause. Sometimes problems just come and go; maybe the conditions that caused the problem have changed or gone away. All of that is okay. If the problem you're looking for can't be found right this instant, just try again later.

All that said, let's learn about the two approaches that we'll see throughout this course. The first approach, which we'll see often, is what I call the Many Whys. The Many Whys is a simple process to follow. First, ask yourself a "why" or "what" question about what's in front of you. Then, come up with three possible hypotheses that answer that question. Pick one of them, then do something to test that hypothesis. Whatever the test reveals becomes the starting point for your next question, and you repeat the process. When you encounter a problem that looks confusing or feels unexpected, simply asking "why" or "what" as you go deeper and deeper into that problem helps a lot in developing a path towards a root cause.

My DevOps aficionados might be familiar with the Five Whys from the acclaimed book "The Phoenix Project" by Gene Kim, Kevin Behr, and George Spafford. You might have also seen the Five Whys in courses right here on LinkedIn Learning, like DevOps Foundations from my friends Ernest Mueller and James Wickett. This approach is very similar.

For example, let's say we've run into a problem where an application running within a Kubernetes cluster runs much more slowly than expected. Here's how we can use the Many Whys approach to deduce what the cause of the slowness might be. The first step is coming up with a "why" or "what" style question about what's in front of us. In this case, the question is pretty straightforward: why is this application running slower than usual? Next, we'll come up with three possible answers to that question, like the ones to the right of the slide. Now, application slowness is a very broad problem domain; there are probably millions of reasons why an application could run slowly. The goal here is to narrow our possibilities down to a handful that we can actually test. This way, we increase our chances of going down a path that will lead us towards the cause of this problem.

Next, we're going to pick one of our possible answers and test it. So let's say that we think that the system's resources being lower than usual is the likely explanation for the app slowness. Applications consume many kinds of resources, like memory, processors, and local or networked disks. This is a great example of an answer with many possible paths, and many answers fall into this category. A few quick checks for each of those paths are sketched below.
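As a minimal sketch, assuming shell access to a Linux node (the exact tools available will vary by distribution), here's one quick check per resource path:

    free -h                    # memory: compare total against free/available
    top -b -n 1 | head -n 15   # CPU: load average and the busiest processes
    df -h                      # disk: capacity on mounted filesystems
    iostat -x 1 3              # disk: I/O pressure (needs the sysstat package)

None of these checks is definitive on its own; they're just cheap ways to decide which path is worth digging into first.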
So with these kinds of answers, I find that it's better to test one path at a time. Again, there's no one right way of troubleshooting problems, so start with the path that feels right in the moment; your gut feel will get more accurate with experience.

Going back to the answer we chose, let's start with the system's memory. We can test whether the system's memory is the culprit by getting the total amount of memory on the system and comparing it against its free memory. Fortunately, there are many tools that can help us with this, such as top, htop, vmstat, and free, preferably run as free -h so that you can see the units in human-readable form.

So let's say that we ran free against our system and it gave us a result like the one shown here on the bottom left. We can see that the free memory on this system is decreasing quickly. This is a big clue that we can work with, and it gives us a great starting place to repeat the process from. From here, we can ask our next why: what's causing memory to be consumed like this? That brings us back to step one.

As you can see, we didn't come up with the root cause right away. It often takes, well, many whys to get to that point. What we did do, however, is turn a very wide problem, a slow application, into a much narrower one, a rapid decrease in free memory. A short sketch of how you might watch that decrease and tie it back to Kubernetes follows below.
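As a rough sketch of what that next step might look like, assuming a Linux node with the watch utility installed and, for the kubectl command, a cluster with the metrics-server add-on:

    # Refresh free -h every two seconds: is "free" steadily shrinking?
    watch -n 2 free -h

    # Then look for the hungriest pods in the cluster
    # (kubectl top requires metrics-server):
    kubectl top pods --all-namespaces --sort-by=memory | head -n 10

If one pod dominates that list, you have a strong candidate for the next round of "why?"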
