Troubleshooting Logical Process
Technology has become so commonplace in our lives that we forget not everyone understands it. When you get hired into a technical support role, everyone assumes you either already know it or will learn it on your own. One thing I have observed over the years is that sometimes people need a little direction to get started.
People can become overwhelmed by what they need to do just to get started. Let’s look at a simple process that can help when you are working on technical issues.
Step 1 Identify the Problem
This can be more difficult than it sounds. When someone calls in with a problem or emails a ticket into your ticketing system, what they say is not always accurate. Often it is far from accurate, and you will need to dig a little deeper to find out what the problem really is. So how do we do this? Ask around and someone will probably tell you to check the logs. That is a good answer, but it does not tell you what to look for. What you need to do first is ask the right questions.
When you are approaching a ticket, the first question you need to ask is not to the user but to yourself.
What is my goal? What am I trying to do with this ticket?
Your answer should be something along the lines of fixing the underlying problem that is causing the user’s symptoms. Just like a doctor, you will be looking at the symptoms first and not always at the problem itself, at least not right away.
The other thing to ask yourself is how quickly you will solve this problem. Notice I said “will solve,” because there is no “if” here. There is no escalation path for us. Why not? Well, it wouldn’t make a very good article if I just told you to escalate this to someone else to fix, now would it?
Now that you have established that you will be resolving the underlying problem and that you want to get the person back to work as quickly as possible, we can begin looking at the ticket.
The next thing I do is open a notepad or, if you are working in a ticketing system, the time entry or notes box, so that you can take your notes directly in the ticket. This saves time and cuts out a lot of questions when someone goes back to read your notes.
Then I like to read the issue that was submitted. Once we have that information we can begin looking at what the symptoms are and how they can point us to an underlying problem.
For instance, say Joe called in because he keeps getting locked out of his email. You could go and unlock his account in Active Directory. That does not solve the problem, though; it just lets him get to his email again. What is causing the Active Directory account to lock out? It could be something simple, like Joe typing the wrong password, or Windows holding the wrong saved password.
Or it could be a bigger issue with your Active Directory integrations that needs to be addressed. This is why it is important to identify whether you are looking at a root cause or just a symptom.
In order to identify what is going on, we need to gather information. Most of us do this without even thinking about it: asking questions and looking at the logs.
You will want to ask probing questions, ones that get the user to go into more detail: What happened? What were they doing when it happened? Has it happened before?
What has changed on the computer?
Is there any system maintenance? Patching? Software pushes?
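One way to make sure none of these questions gets skipped is to keep them as a checklist in your notes. The sketch below is only an illustration: the question wording and the `unanswered` helper are my own assumptions, not part of any ticketing system.

```python
# Hypothetical intake checklist; the question wording is illustrative only.
INTAKE_QUESTIONS = [
    "What happened?",
    "What were you doing when it happened?",
    "Has it happened before?",
    "What has changed on the computer?",
    "Any system maintenance, patching, or software pushes?",
]


def unanswered(answers):
    """Return the intake questions that have no non-empty answer yet."""
    return [q for q in INTAKE_QUESTIONS if not answers.get(q, "").strip()]


# Example: notes taken so far on Joe's lockout ticket (made-up answers).
notes = {
    "What happened?": "Locked out of email three times today.",
    "Has it happened before?": "No.",
}
for question in unanswered(notes):
    print("Still need to ask:", question)
```

Anything the helper prints is a question worth asking before you start digging through logs.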
You always hear people telling you to check the logs to troubleshoot the issue. What does that actually mean? Windows creates a log somewhere for just about everything that happens on the computer. Sometimes these logs are hard to find, or they might not be enabled. We are going to focus on the big three, though: Application, System, and Security.
The common way to look at these logs is through Event Viewer, which lets you search or filter the logs to see only the information you are looking for. When you find those ever-elusive log entries, what do they really mean? This is the part no one likes to hear: Google it. That’s right, once you find the entries you need, search for the event ID to see what the entry is for. TechNet has a lot of good information on what the events mean. Then you can start to correlate the symptoms with what the logs say and build a picture of what happened.
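If you export log entries from Event Viewer, the same filter-and-correlate idea can be sketched in a few lines of Python. This is a rough illustration, not a tool: the records, the field names (`time`, `event_id`, `source`), and the `events_near` helper are all assumptions made for the example. The event IDs are real Security log IDs: 4625 is a failed logon, 4740 is an account lockout, and 4624 is a successful logon.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical records, e.g. rows exported from Event Viewer and loaded into
# dicts. The field names and values here are assumptions for illustration.
records = [
    {"time": datetime(2024, 1, 8, 9, 1), "event_id": 4625, "source": "JOE-PHONE"},
    {"time": datetime(2024, 1, 8, 9, 2), "event_id": 4625, "source": "JOE-PHONE"},
    {"time": datetime(2024, 1, 8, 9, 3), "event_id": 4740, "source": "JOE-PHONE"},
    {"time": datetime(2024, 1, 8, 9, 30), "event_id": 4624, "source": "JOE-PC"},
    {"time": datetime(2024, 1, 7, 14, 0), "event_id": 4625, "source": "JOE-PC"},
]


def events_near(records, reported_at, event_ids, window_minutes=30):
    """Keep records with a matching event ID within the window around reported_at."""
    window = timedelta(minutes=window_minutes)
    return [
        r for r in records
        if r["event_id"] in event_ids and abs(r["time"] - reported_at) <= window
    ]


# Look for failed logons and lockouts around the time Joe says it happened.
hits = events_near(records, datetime(2024, 1, 8, 9, 5), {4625, 4740})
by_source = Counter(r["source"] for r in hits)
print(by_source.most_common())
```

Grouping the matches by where they came from is what turns a pile of entries into a picture: in this made-up data, every failure near the reported time comes from one device.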
Step 2 Establish a Theory
Now that you have gathered some information and seen the logs, let’s look at what it all means. If you combine the research you found from googling the event IDs and error messages with the symptoms, you should be able to form some ideas about what caused the issue. These ideas can come from experience or from knowledge of how the computer works and how Windows operates.
There may be several valid theories here. The trick is to test them in a controlled way, which brings us to the next step in our process.
Step 3 Test and Evaluate
Now that you have a theory about what’s wrong, it is time to test. Before you test, though, you know what you have to do. That’s right: backups! Make sure you have a good backup and a plan for backing out. You need to be able to reverse anything you do and set conditions back to the original so you can test your next theory.
One of the most important things to remember when you test is to document the results. Did it work? What happened when you did it? Were there any unintended effects? Document all of it, good and bad, so you can see whether the change is an improvement or whether something needs to be reset to the original conditions and a new theory tested. This helps you and other techs in the future: you can see what happened and what fixed it, and next time, with that experience, you might be able to jump straight to the testing phase.
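The test, record, and back out cycle can be written down as a small loop. The sketch below is only a model of the habit, under my own assumptions: the `Theory` class, the `try_theories` function, and the dictionary standing in for a computer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Theory:
    name: str
    apply: Callable[[], None]     # makes the change under test
    rollback: Callable[[], None]  # restores the original condition
    fixed: Callable[[], bool]     # True if the symptom is gone


def try_theories(theories: List[Theory]) -> List[dict]:
    """Try one theory at a time, record every result, undo changes that fail."""
    log = []
    for theory in theories:
        theory.apply()
        worked = theory.fixed()
        log.append({"theory": theory.name, "worked": worked})
        if worked:
            break  # keep the working change in place
        theory.rollback()  # reset before testing the next theory
    return log


# Hypothetical stand-in for a machine: a dict of settings.
state = {"saved_password": "old", "cached_profile": "stale"}

theories = [
    Theory(
        name="Stale cached profile",
        apply=lambda: state.update(cached_profile="rebuilt"),
        rollback=lambda: state.update(cached_profile="stale"),
        fixed=lambda: state["saved_password"] == "new",
    ),
    Theory(
        name="Wrong saved password",
        apply=lambda: state.update(saved_password="new"),
        rollback=lambda: state.update(saved_password="old"),
        fixed=lambda: state["saved_password"] == "new",
    ),
]

results = try_theories(theories)
for entry in results:
    print(entry)
```

Note that the failed theory is rolled back and still ends up in the log. That is the point: only one change is in play at a time, and the record shows what was tried as well as what worked.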
Once you have tested enough to find a workable theory, we can move on to Step 4!
Step 4 Corrective Action
So far we have identified the problem, formed a theory, and tested that theory. Now it is time to take the fix out of the test environment. Once you have found a working solution, make sure it gets applied to the computer or computers that have the issue. This may already have been done in the testing phase.
Step 5 Verify System Functionality
What matters most to your users is a stable, working system. Since you have been testing on the system and changing settings, make sure it is fully functional before turning it back over to the user.
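A short list of checks makes this step harder to skip. Here is a minimal sketch, assuming each check can be expressed as a function that returns True or False; the check names and the `failed_checks` helper are hypothetical, and in practice many checks will simply be things you do by hand.

```python
def failed_checks(checks):
    """Run each named check and return the names of those that did not pass."""
    failures = []
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failures.append(name)
    return failures


# Hypothetical checks; real ones would open the mail client, print a page, etc.
checks = [
    ("User can sign in", lambda: True),
    ("Email opens", lambda: True),
    ("Mapped drives reachable", lambda: False),
]

failures = failed_checks(checks)
print("Ready to hand back" if not failures else f"Not ready: {failures}")
```

The system goes back to the user only when the list of failures is empty.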
Step 6 Document Findings, Actions, and Outcomes
Now that you have fixed the issue, it is time to wrap things up and make sure you have everything documented for the next time you see something similar happen. This will be very important if you are working in a ticketing system or in a corporate environment.
Remember, it is just as important to document what did not work as it is to document what did.
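If it helps to have a consistent shape for your notes, something like the following sketch could work. The layout and the `ticket_note` function are my own assumptions, not a feature of any particular ticketing system, and the example values are made up from the Joe scenario above.

```python
def ticket_note(problem, root_cause, attempts, verification):
    """Build a plain-text note; attempts is a list of (action, worked) pairs."""
    lines = [f"Problem: {problem}", f"Root cause: {root_cause}", "Actions:"]
    for action, worked in attempts:
        outcome = "worked" if worked else "did not work"
        lines.append(f"  - {action} ({outcome})")
    lines.append(f"Verified: {verification}")
    return "\n".join(lines)


note = ticket_note(
    problem="Joe keeps getting locked out of his email",
    root_cause="Windows had an old saved password",
    attempts=[
        ("Unlocked the account in Active Directory", False),
        ("Cleared the saved password and re-entered it", True),
    ],
    verification="Joe signed in and sent a test email",
)
print(note)
```

Every attempt is listed with its outcome, so the next tech can see the dead ends as well as the fix.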
Do you have a different process that has proved to work better?