The Problem Resolution Process – D M Goldstein, Feb 2022
Long ago I worked for a VP who decided that our Support organization was going to be ISO 9000 certified. My first thought was that Support was an art and therefore could not be reduced to a flow chart and standardized steps. But I was wrong. In fact, what I learned was that from a high level there is a universal set of steps to resolve any problem. As with my other articles, this has a focus on Enterprise Software Support, but the concepts are applicable way beyond that scope.
What is the Problem Resolution Process (PRP)? When applied as a corporate process it provides a common language across teams to describe the state of an issue and assigns ownership of specific tasks among the players. Not all issues will require explicitly going through every step, but they implicitly flow through them as part of a generalized life cycle. The PRP provides a set of guidelines that ensure that important steps are not overlooked.
PRP: Basic 6-step process
Characterize: The first step in resolving any problem is to characterize it. This requires answering some basic questions:
· What is happening or not happening?
· What is the expected behavior?
· What, if anything, was recently changed, and when?
This could be as simple as, “My toothpick broke.” In a medical situation it could be, “Doctor, it hurts when I do this.” Or it could be a serious technical or business problem. For a software problem this may require collecting relevant information: problem description, product version, environment, observed behavior, expected behavior, steps to reproduce, users affected, error messages, log files, screenshots, etc.
Diagnose: The next step is to drill down into the problem to gain a clear understanding of the details. For a simple problem like the toothpick, this is your mental affirmation, “Yup, it’s useless.” For the medical situation it’s the doctor doing the question-poke-and-prod routine to understand more about the issue, or diagnostic efforts like lab work, x-rays, and MRIs. In a software support situation this means verifying the steps to reproduce, analyzing error messages, and determining the frequency and impact of the problem. It is applying a methodical thought process to understand the nature of the problem – its impact, breadth, and behavior.
Research: Before you can resolve a problem, you must first determine whether it actually is a problem. If so, are there already known ways to fix it? This step may simply be implicit if it’s an issue with which you are already familiar and know the answers, like with the toothpick (no research necessary) or even the medical situation for an obvious and common problem. In the software world this is verifying the expected behavior (docs), researching known issues (bug database) and previous customer incidents (case tracking system), looking for known workarounds (knowledge base), and developing a workaround if possible.
Reproduce: If you can’t reproduce the problem, how will you know whether you’ve solved it, or even fully understood it? For software, you would attempt to reproduce the issue in a lab or customer environment, and design or verify a workaround if possible.
Resolve: This could be providing prescriptions or a treatment plan for my medical example, or a new toothpick for that example. For software, it is providing the resolution: answer a how-to question; provide information on expected behavior; provide a workaround to a product or business issue; release a code fix to address a product bug or feature.
Verify: This is a critical step that is often overlooked – did you solve the problem? Did the new toothpick suffice? Did the medical treatment work? Verify whether the software solution provided addresses the customer’s need; ensure proper documentation of the issue and solution; in the case of changes to code or configuration, ensure change management control.
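For readers who think in code, here is a minimal Python sketch of the six steps as an ordered life cycle. The names Step, next_step, and remaining_steps are my own illustrative choices, not part of any standard or tool, and the sketch only captures the ordering described above.

```python
from enum import IntEnum


class Step(IntEnum):
    """The six PRP steps, numbered in the order described above."""
    CHARACTERIZE = 1
    DIAGNOSE = 2
    RESEARCH = 3
    REPRODUCE = 4
    RESOLVE = 5
    VERIFY = 6


def next_step(current):
    """Return the step that follows `current`, or None once VERIFY is done."""
    if current is Step.VERIFY:
        return None
    return Step(current + 1)


def remaining_steps(current):
    """List the steps still ahead of `current`, in order."""
    return [step for step in Step if step > current]
```

An issue that skips a step explicitly (no research needed for the toothpick) still passes through it implicitly, which is why the sketch models the steps as a fixed order rather than as optional branches.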
Assigning Ownership
You may have noticed in the flow charts above that I have assigned the six phases to specific “Tiers”. In the Software Support world, “Tier 1” typically refers to front-line Support resources. These are the people who take the first pass at inbound issues, and typically resolve a very high percentage of them on their own. “Tier 2” are backline or senior Support engineers. They are usually subject matter experts (“SMEs”) on aspects of the product being supported. They are expected to have strong diagnostic skills and a deep technical knowledge of the product or environment.
“Tier 3” is an engineering resource capable of making code changes. This is a simplification for the purposes of this discussion and may comprise multiple teams. This could include a “rapid repair” team within Support, a team within Product Engineering specifically created to interface with Support and/or provide emergency patches to the software, or it could be the application development teams themselves. The application developers are sometimes referred to as “Tier 4”.
Simplifying the case flow, it looks like this:
· Tier 1 receives the case.
· If they can resolve it, they annotate and close the case.
· If not, then Tier 1 escalates to Tier 2 (or to another appropriate resource), either by reassigning the original case or creating a new related case, depending on company policy.
· Tier 2 will either advise Tier 1 and return the case to them or resolve and document the case.
· If Tier 2 is unable to do either, they escalate to Tier 3, typically by creating a related case in Engineering’s bug tracking system. Support (Tier 1 or Tier 2) usually retains ownership of customer communications on issues escalated to Engineering.
· Tier 3 resolves the issue or follows Engineering-specific processes to assign and track the issue.
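The flow above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Case fields, the run_case_flow function, the three Tier 2 outcomes ("resolve", "advise", "escalate"), and the placeholder bug ID are all hypothetical, and a real case tracking system will differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Case:
    summary: str
    working_tier: str = "Tier 1"       # who is currently working the case
    status: str = "open"
    customer_contact: str = "Support"  # Support keeps customer communications
    engineering_bug: Optional[str] = None
    history: List[str] = field(default_factory=list)


def run_case_flow(case, tier1_can_resolve, tier2_outcome="resolve",
                  bug_id="BUG-0000"):
    """Walk one case through the simplified flow (hypothetical sketch).

    tier2_outcome must be "resolve", "advise", or "escalate".
    bug_id is a placeholder for the related Engineering bug.
    """
    case.history.append("Tier 1 receives the case")
    if tier1_can_resolve:
        case.status = "closed"
        case.history.append("Tier 1 annotates and closes the case")
        return case

    case.working_tier = "Tier 2"
    case.history.append("Tier 1 escalates to Tier 2")

    if tier2_outcome == "advise":
        case.working_tier = "Tier 1"
        case.history.append("Tier 2 advises Tier 1 and returns the case")
    elif tier2_outcome == "resolve":
        case.status = "closed"
        case.history.append("Tier 2 resolves and documents the case")
    elif tier2_outcome == "escalate":
        case.working_tier = "Tier 3"
        case.engineering_bug = bug_id
        case.history.append("Tier 2 escalates to Tier 3 via a related bug")
    else:
        raise ValueError(f"unknown Tier 2 outcome: {tier2_outcome}")
    return case
```

Note that customer_contact never changes in the sketch, which mirrors the point that Support usually keeps ownership of customer communications even after Engineering takes on the work.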
Companies should have a hand-off process which defines how each Tier interfaces with the others, and what information needs to be provided during resolution, escalation, and de-escalation of cases.
Note that most companies reserve the right to fix bugs “at their discretion”, so not every issue escalated to Engineering results in a fix, and those that are fixed may not be fixed quickly.
Most companies also have internal and external Service Level Objectives or Service Level Agreements (collectively “SLAs”) around resolution timeframes based on an issue’s priority, severity, frequency, and age. Issues causing severe impact or impacting multiple customers get prioritized ahead of low-impact or cosmetic issues. “Priority” is the business impact. “Severity” is the technical impact (service unavailable, data corruption, minor bug, or cosmetic issue). “Frequency” is whether the problem is persistent or intermittent, or a factor of how many different customers are impacted.
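As a rough illustration of how these factors can combine into a work queue, here is a small Python sketch. Everything in it is an assumption on my part: the Issue fields, the numeric scales (1 is the most urgent), and the tie-breaking order of priority, then severity, then customers affected, then age. Real SLA policies weigh these factors differently.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    priority: int            # business impact, 1 = highest (assumed scale)
    severity: int            # technical impact, 1 = service unavailable,
                             # 4 = cosmetic (assumed scale)
    customers_affected: int  # one way to express frequency
    age_days: int            # how long the issue has been open


def triage_order(issues):
    """Sort issues so the most urgent comes first (hypothetical policy).

    Lower priority and severity numbers sort first; among equals, more
    customers affected and older issues sort first.
    """
    return sorted(
        issues,
        key=lambda i: (i.priority, i.severity,
                       -i.customers_affected, -i.age_days),
    )
```

With this ordering, a multi-customer outage always lands ahead of a cosmetic issue, no matter how long the cosmetic issue has been waiting.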
The next time you are faced with a problem to solve, think about how you implicitly or explicitly execute these steps while resolving it.
Excellent article on the PRP, Miles. I love your stuff! I’m a user of a software product for my boutique fitness business. I provide (unofficial) support to other users of the software through a Facebook user group (and to my users at my fitness studio) and I follow these steps! Obviously, my escalation is to the software provider, but I’m an early adopter and well known at the company so my issues usually get addressed by 3rd-tier support agents, as they know I’ve already performed tiers 1 & 2 myself. I’m still a technical support agent at heart 💜 Miss you my friend