The 5 Problem Management Traps to Avoid and the Missing ITIL Role
Or How I Learned to Stop Worrying and Love my Problems. (Vol. 1)
If you have ever worked at a company that is trying to establish or enhance a Problem Management service as part of its IT Service Management (ITSM), then you have probably witnessed several of the challenges companies face when building out this service.
In my career, I’ve discovered 5 important traps that, if avoided, can prevent a lot of frustration and make incident prevention more effective across the board.
The 5 Traps:
Trap#1 - Setting up Problem Management too soon when deploying ITSM services.
Trap#2 - Incident Managers and/or Support Engineers getting in the habit of not doing Post Incident Reviews.
Trap#3 - Not building the Problem process first before building out the ticketing tool-set or not dedicating enough development resources to build out the tool-set.
Trap#4 - Placing your Problem Management teams too low in your organizational structure.
Trap#5 - Expecting that your Problem Managers could or should effectively identify root cause for all your incidents.
Why are these traps so important, you may be asking? **** If you want to find out why these traps are dangerous and how your company might avoid them, please review my detailed content in the Appendix at the end of this paper. ****
The Missing ITIL Role:
All of these traps stem from a misconception that is common when learning ITIL: that Problem Management is responsible for Root Cause Analysis (RCA), and that RCA is the same thing as what I suggest be defined as Root Cause Identification (RCI). In practice, RCA and RCI are not the same thing. RCI is the identification of the technical root cause of an individual incident. RCA is the analytical deduction of future incident risk from the root cause of a given problem, and it can only be completed if the RCI is done well for the incidents associated with that problem. This leads me to my view that ITIL would be more successful with the addition of a new role I like to call “Post Incident Review Engineer” (PIRE), which would be accountable for the successful delivery of incident RCI. In my experience, this role would serve as the missing bridge in ITIL between Incident Management and Problem Management. It helps companies get technically deep RCI from their experts for their worst and most impactful incidents, which in turn enables Problem Managers to focus on the RCA risk analysis, problem prioritization, and problem resolution that companies need to prevent future incidents.
PIRE Responsibilities and Expectations:
1. Identify the technical root cause of each incident, starting with the highest-priority incidents. (Attempt to automate RCI for lower-priority incidents.)
2. Summarize and categorize the cause of the incident in the incident record.
3. Associate incidents caused by existing problems with the corresponding problem record.
4. Communicate to Problem Management when a new problem has been identified.
5. Escalate to Problem Management to facilitate problem identification discussions, should the root cause not be easily identifiable.
Please Note **** If you do not want to make a dedicated group of engineers, the PIRE hat can be worn by your Incident Managers or Support Engineers. Often it can be more effective to have the people directly involved with the incident do the PIRE work. However, make sure to avoid Trap#2 and block out enough time for these teams to complete their Post Incident Reviews. ****
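The five PIRE responsibilities listed above can be sketched as a small triage routine. This is only an illustrative sketch, not a feature of any ITSM tool: the `Incident` and `ProblemRecord` classes, their field names, the action strings, and the idea of matching incidents to problems by a shared cause category are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    incident_id: str
    priority: int                          # 1 = highest priority
    root_cause: Optional[str] = None       # RCI summary, None if not yet identified
    cause_category: Optional[str] = None   # category recorded with the summary
    problem_id: Optional[str] = None       # set once linked to a problem record


@dataclass
class ProblemRecord:
    # Hypothetical problem record, matched to incidents by cause category.
    problem_id: str
    cause_category: str
    incident_ids: List[str] = field(default_factory=list)


def review_queue(incidents: List[Incident]) -> List[Incident]:
    """Responsibility 1: review the highest-priority incidents first."""
    return sorted(incidents, key=lambda inc: inc.priority)


def post_incident_review(incident: Incident, problems: List[ProblemRecord]) -> str:
    """Apply responsibilities 2-5 to one incident and return the action taken."""
    # Responsibility 5: no root cause could be identified, so escalate.
    if incident.root_cause is None or incident.cause_category is None:
        return "escalate_to_problem_management"
    # Responsibility 3: link the incident to an existing problem with the same cause.
    for problem in problems:
        if problem.cause_category == incident.cause_category:
            problem.incident_ids.append(incident.incident_id)
            incident.problem_id = problem.problem_id
            return "linked_to_existing_problem"
    # Responsibility 4: a cause with no matching problem is reported as a new problem.
    return "notify_new_problem"
```

Responsibility 2 is represented here by the `root_cause` and `cause_category` fields being filled in on the incident record before `post_incident_review` is called. In a real deployment, the category match would likely be replaced by whatever linking rule your ticketing tool-set supports.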
Incident/Problem/Change Lifelines avoiding the traps and establishing PIRE:
Below I have outlined the ITSM Core Service Lifespans and attempted to show how each process interconnects as a best practice, based on my experience:
By avoiding the traps, establishing a PIRE team, and using the division of labor as visually represented above, you can maximize the time your Problem Managers spend analyzing the incident root cause data, prioritizing your problems, and influencing your teams to resolve those problems. This focused, efficient, and effective Problem Management team will then break the cycle of your repeat incidents, reduce the load on your support staff, and make your customers happy by having fewer incidents, which creates an overall positive experience with your product and saves you a lot of money on unnecessary support costs.
I hope this high-level information helps you build a more efficient and effective Problem Management team. I look forward to your comments and thoughts. Feel free to reach out to me with questions or ideas, or if I might be able to help you and your company set up or mature its ITSM services.
*** Note: Additional volumes in the "How I Learned to Stop Worrying and Love my Problems" series are to follow. Please follow me to be alerted to the new additions. ***
Thank you for reading!
John Barber
#JohnBarberITSM
Interesting to see the “Post Incident Review Engineer” (PIRE) role and the key terms RCA and RCI with this. Good article for understanding ITIL better. Thanks, John Barber.
Interesting to see how you see your work. However, does this really add value? My take is that this signals a low level of maturity across a more diverse org. Also, change management systems ensure that things are made only when it's too late! JIT is a myth, get real?
Anders Damkjær Møller - interesting article 👍
If I have something to contribute to ServiceNow from my experience, how can I present it to you, so that you can change the default settings of Problem Management in ServiceNow with new releases? I can do this for charity, if you like.
Excellent article. Of course, one cannot generalise, due to cultural differences between companies AND countries. From my experience, French (or France-based) companies often lack efficient problem management due to:
* the quite stupid alignment of problem management activities with incident management ones in the latest ITIL version. As the goal is not the same at all, why use the same macro-activities?
* bureaucratic interpretation of ITIL (almost since ITIL's debut in France): set up additional meetings with very poor efficiency, but name them Change Advisory Board, Problem Advisory Board, and so on (I repeat: this comes from situations I faced).
* religious interpretation of ITIL (this is more common): if it's in the book, let's do it. If it is not, forget it (you cannot imagine how many times I heard "ce n'est pas ITIL", i.e. "that's not ITIL").
* excessive focus on tools. Generally speaking, what should be done in Problem Management is to solve problems, not to record each little bit of action in a glorious, universal, painful, easy-to-maintain ITSM tool (and I'll not insist on the Byzantine discussions on how many categories and so on... just let me remind you of one of ITIL's most important sayings: "a fool with a tool...").
* bad interaction with other processes (this is an organisation issue: too many processes, too little common sense).
Most problems could simply be avoided with: effective configuration management allowing real impact analysis not only of incidents but of changes/releases, too; good event management (intelligent monitoring); and good capacity and availability management (starting at the infrastructure and solution DESIGN). This is obviously a partial view, from my experience.