The Root Cause to Every Problem

To me, Problem Management is the unsung hero of ITIL. I feel passionate about the process, and I have used it to significantly reduce Incidents at a number of organisations. Over the years I have been able to boil the root cause of every problem down to four categories.

So here they are, the root cause to every problem you are ever going to have...

People

People make mistakes, and people do things that they shouldn't. So how do you protect against this? With procedures and processes. Staff should follow the defined processes every time. If they are working from memory, they are at significant risk of not following the process and of missing something. That something is probably going to be important.

Real world example: After getting all support staff to follow processes I recorded a 25% reduction in Incidents over a three month period.

Processes

Processes, procedures and work instructions should be documented and followed. However, processes can be wrong or non-existent. If they are wrong, correct them. If they don't exist, create them.

Real world example: The documented system restart process didn't work and the system failed to come back online.

Hardware

Hardware is pretty tough, but it does fail: hard drives die and power supplies blow up. You can buy as much high availability as you can afford, but you are only minimising the risk of failure. The risk will still be there and the impact may well be high.

Real world example: An entire storage array was unavailable because of multiple failed hard disks.

Software

Bad coding, bugs and exploits: software is vulnerable to a wide array of failures. I have seen many servers suffer from memory leaks, and applications that simply crash unexpectedly. Software is fragile. It needs to be looked after. It needs to be patched… so patch it! Have regular maintenance windows to keep your software up to date and healthy.

Real world example: Corporate desktops were infected with a virus because a 12-month-old patch hadn't been applied.


Agree, disagree or do you have another root cause? Leave a comment and share your experiences with Problem Management.

Lee


Very good topic and comments. Putting more emphasis on any one of them leads us nowhere. Gaps in people, process and technology need addressing equally.

Great article on Problem Management. If the proper emphasis is put on this within organisations, and if issues are followed up fully, it can lead to a much more stable and agile environment.

90% of all incidents are change-related in some way, shape, or form. With most organizations pressured to deliver quickly and cheaply, it usually comes at a cost to quality. I believe people want to do a good job, but organizations set them up to fail due to demands to meet impossible deadlines. The shift-left Agile/DevOps movement is helping to change that, or at least improve the odds. So I believe the majority of root causes can be tied back to human decisions.

Great article. A couple of things I would like to add. People: People will make mistakes despite all the procedures and processes in place. As they say, "To err is human". So organizations need to ensure there is a no-blame culture in which individuals are allowed to learn from their mistakes rather than being criticized. Service Architecture: More focus and importance needs to be given to implementing a robust Service Design, which allows organizations to implement "Service Management by design" within solutions. Technology will fail at some point and there is nothing one can do to prevent it, but how the service handles that failure needs to be designed into the solution so that recovery is quick and seamless.

...and yet with all the monitoring and diagnostic tools available to us, we still sometimes spend weeks and even months arguing over root cause. Then once we find root cause to be one thing, say software, we are asked to solve it with hardware or process, because it's too expensive or will take too long to fix the software. Note this happens with hardware and people too! So, perhaps another root cause is a bad choice of SDLC, which leads to defects in any or all areas of people, process and tools that were never ready for transition to ITIL in the first place.
