Thoughts on… MTTR

Being in the business of identifying, quantifying, and measuring all things ITIL, I’ve stumbled across MTTR more than a few times in my career.  I’ve heard several names for it (Mean Time To Resolve… Repair… Restore… Recover…), but MTTR is basically a simple mathematical formula:  an ending date minus a starting date, averaged across the tickets being measured.  There’s not much difficulty in calculating it – but there’s a whole lot of discussion on interpreting it.

So, I’ll start throwing out some observations I’ve made over time.  I’m not saying that all of these are applicable to every situation and I’m not saying that this is an exhaustive list, but these are concepts that deserve discussion.

Concept1:  Why is it that, for every reporting period, MTTR always changes?  Why can’t it be constant over time?

A1:  Just about all of the MTTR charts I’ve ever seen represent MTTR by submit month.   I use this technique, too.  But, inevitably some eagle-eyed manager points out that the MTTR shown for a given month is different from what was shown for that same month in the last report – why?  So I had to explain that not all tickets were resolved during that reporting period. Some tickets are resolved much later, so when the upcoming reporting period is finally calculated, any historical MTTR will now be higher.  I’m not saying that charting MTTR by submit month is wrong, but if that is the accepted method for your company, then as long as a month still has open tickets, its MTTR will keep increasing over time.
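To make the effect concrete, here is a minimal Python sketch. The ticket dates, the `mttr_by_submit_month` helper, and the choice to measure in whole days are all assumptions made for illustration; they are not taken from any particular ticketing system. It calculates MTTR by submit month twice, once per reporting date, and the January number moves because one January ticket was resolved late.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tickets: (submitted, resolved); resolved is None while still open.
tickets = [
    (date(2015, 1, 5), date(2015, 1, 7)),    # 2 days
    (date(2015, 1, 10), date(2015, 1, 14)),  # 4 days
    (date(2015, 1, 20), date(2015, 3, 21)),  # 60 days, resolved late
]

def mttr_by_submit_month(tickets, as_of):
    """Mean days to resolve per submit month, using only tickets resolved by as_of."""
    days = defaultdict(list)
    for submitted, resolved in tickets:
        if resolved is not None and resolved <= as_of:
            days[submitted.strftime("%Y-%m")].append((resolved - submitted).days)
    return {month: sum(d) / len(d) for month, d in days.items()}

feb_report = mttr_by_submit_month(tickets, as_of=date(2015, 2, 1))
apr_report = mttr_by_submit_month(tickets, as_of=date(2015, 4, 1))
print(feb_report)  # {'2015-01': 3.0}
print(apr_report)  # {'2015-01': 22.0}
```

Same month, same tickets, but the January MTTR goes from 3 days to 22 days once the late ticket is finally counted.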

Concept2:  So, is there a best way to represent MTTR? 

A2:  Knowing that there will always be open tickets by submit month, I turn the typical single axis MTTR chart into a dual axis chart by adding “percentage of tickets still open per ticket submit month”.  Why is this important? 

There have been many times over the years where I have seen a fantastic downhill slope for MTTR, but further analysis revealed that the aging ticket population was increasing during the same time.  A high percentage of open tickets can indicate procedural issues that need to be further examined.  (I have to note that percentages are great for larger populations, but they can be very misleading with smaller populations. I often use a stacked bar chart of submitted tickets over still open tickets for small collections of data.) In the following chart, it is easy to see that while February 2015 has an incredibly low MTTR, 20% of those tickets are still open.  When those tickets are finally resolved, the MTTR for that month will skyrocket.  But, overall, you can tell that this department has implemented better controls.  : ^ ) 

Below is a nice downhill slope with a low open percentage.  The MTTR uptick in November and December is a seasonal expectation.  The uptick in the open percentage for April is also typical, because this data was collected in mid-May.  There will always be open tickets in the last month of this type of chart.
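The second axis is simple to calculate. The sketch below is one possible way to do it in Python, using made-up tickets and a hypothetical `open_percentage_by_month` helper; a ticket with no resolved date is assumed to be still open.

```python
from collections import Counter
from datetime import date

# Hypothetical tickets: (submitted, resolved); None means still open.
tickets = [
    (date(2015, 1, 3), date(2015, 1, 4)),
    (date(2015, 1, 9), date(2015, 1, 12)),
    (date(2015, 2, 2), date(2015, 2, 3)),
    (date(2015, 2, 6), date(2015, 2, 6)),
    (date(2015, 2, 11), date(2015, 2, 12)),
    (date(2015, 2, 17), date(2015, 2, 18)),
    (date(2015, 2, 20), None),
]

def open_percentage_by_month(tickets):
    """Percentage of tickets still open, grouped by submit month."""
    submitted_count = Counter()
    open_count = Counter()
    for submitted, resolved in tickets:
        month = submitted.strftime("%Y-%m")
        submitted_count[month] += 1
        if resolved is None:
            open_count[month] += 1
    return {m: 100.0 * open_count[m] / submitted_count[m] for m in submitted_count}

open_pct = open_percentage_by_month(tickets)
print(open_pct)  # {'2015-01': 0.0, '2015-02': 20.0}
```

With only a handful of tickets per month, as here, the raw counts in `submitted_count` and `open_count` are usually more honest to chart than the percentage.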

Concept3:  So why do aging tickets matter?  My customers are happy – I have great customer survey reviews!

A3:  Great customer surveys are always welcome, but a customer survey is generally initiated when the ticket is resolved.  If you have a low open percentage, then chances are your customer surveys are pretty recent and indicative of current activities. I would, however, compare the number of tickets resolved (or number of surveys sent) to the number of surveys received to see if the rate of return gives you the confidence to say that you truly have surveys indicative of the work your team performs.

To think about:  What happens when your team resolves a year old ticket?  A survey will get kicked off (should surveys older than a couple of months even be generated?), but… is the ticket customer even still at the company…and if so, will he/she remember that ticket… and if they remember that ticket, what are they currently thinking about your team as they either trash the survey or… fill it out?  What does an excessive queue of aging tickets say about existing ticket handling procedures?  Staffing?  Management?  Ticket management is a soup-to-nuts life cycle – there’s a beginning and there’s an end, and everything is measurable.

The below data grid is a typical aging report with color coded aging buckets.  Note that the ticket numbers are highlighted in red if the ticket is in the >90 day aging bucket.
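An aging report like this can be sketched in a few lines of Python. Only the >90 day bucket is named above; the other bucket boundaries, the ticket numbers, and the `aging_bucket` helper are assumptions for illustration.

```python
from datetime import date

# Assumed bucket boundaries in days; only the ">90" bucket comes from the text.
BUCKETS = [(30, "0-30"), (60, "31-60"), (90, "61-90")]

def aging_bucket(submitted, as_of):
    """Return the aging bucket label for an open ticket as of a given date."""
    age_days = (as_of - submitted).days
    for upper, label in BUCKETS:
        if age_days <= upper:
            return label
    return ">90"

as_of = date(2015, 5, 15)
# Hypothetical open tickets: ticket number -> submit date.
open_tickets = {
    "INC-001": date(2015, 5, 1),   # 14 days old
    "INC-002": date(2015, 3, 20),  # 56 days old
    "INC-003": date(2015, 1, 10),  # 125 days old
}

report = {tid: aging_bucket(sub, as_of) for tid, sub in open_tickets.items()}
flagged = [tid for tid, bucket in report.items() if bucket == ">90"]
print(report)   # {'INC-001': '0-30', 'INC-002': '31-60', 'INC-003': '>90'}
print(flagged)  # ['INC-003']
```

The `flagged` list corresponds to the ticket numbers that would be highlighted in red.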

Concept4:  My team resolves tickets a heck of a lot faster than the numbers you are showing me.  Your numbers are just plain wrong!

A4:  Well, technically… no… the numbers are not wrong.  But there is an underlying story that needs telling.  You just need to dig a little bit.  There are two charts that tell a pretty decent story.  The first one is based upon a non-symmetrical distribution of ticket count by MTTR time bucket, and the other I’m calling the 10/90 Split, based upon sorted and ranked MTTR (I worked on this last one with a senior level executive for year-end analysis.) 

Statistically speaking (for the first chart), I never expect MTTR to be a bell curve.  Since the goal is to reduce MTTR, the majority of the population should reside on the left side of the chart anyway… the ideal chart will be bunched up on the left with a long tail on the right (what statisticians call right-skewed).  That means the far right of the chart should contain a small population with the biggest calculations of MTTR and the left side will contain a large population with the smallest calculations of MTTR.  It makes sense that the tickets with the biggest MTTR calculations will increase the overall MTTR number – even with a small number of tickets.  In this chart, you can see that 73.3% of the ticket population has an average MTTR well below the total population Average MTTR.  The average “under” MTTR is 2.49 as compared to the total population Average MTTR of 21.62.  And it’s easy to see that the minority ticket population (687 tickets in the “over” population) causes the greatest increase to the Average MTTR.
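Here is a small Python sketch of the under/over calculation. The ten resolution times are invented (they are not the data behind the chart), and putting tickets that sit exactly on the average into the “under” group is my own assumption.

```python
from statistics import mean

# Hypothetical resolution times in days for ten resolved tickets.
resolution_days = [1, 1, 2, 2, 3, 3, 4, 40, 60, 84]

overall = mean(resolution_days)
# Assumption: a ticket exactly at the average counts as "under".
under = [d for d in resolution_days if d <= overall]
over = [d for d in resolution_days if d > overall]

under_share = 100.0 * len(under) / len(resolution_days)
print(overall)                 # 20
print(under_share)             # 70.0
print(round(mean(under), 2))   # 2.29
print(round(mean(over), 2))    # 61.33
```

Seven of the ten tickets average a little over 2 days, yet the three long-running tickets pull the overall number up to 20.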

Here’s another way to analyze abnormal behavior.  Instead of calculating the MTTR under and over the Average MTTR, what happens if you calculate MTTR for the top 10% “bad guys” and compare that to the MTTR for the bottom 90% “good guys”?  First, you must rank the ticket population by decreasing MTTR.  Then split the population by a percentage, say… 25/75, 20/80 or 10/90… whatever break point you want to analyze.  Is there anything in common with the “bad guys”?  A common group?... a common assignee?... a common service?  When you start to focus on and manage the top offenders, this is what can happen.

The manager who strongly feels that the MTTR for his/her group should be much, much lower… is right… most of the time.  The problem is that lousy top population.  It’s not the entire MTTR population that needs to be managed… it’s only the top population.  The bottom population is basically taking care of itself. 
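The rank-and-split steps can be sketched as follows. This is a minimal Python illustration under stated assumptions: the tickets and group names are invented, `top_bottom_split` is a hypothetical helper, and rounding the cut point to the nearest whole ticket is just one reasonable choice.

```python
from collections import Counter
from statistics import mean

# Hypothetical resolved tickets: (assigned group, days to resolve).
tickets = [
    ("Network", 84), ("Network", 60), ("Desktop", 40), ("Desktop", 4),
    ("Desktop", 3), ("Service Desk", 3), ("Service Desk", 2),
    ("Service Desk", 2), ("Service Desk", 1), ("Service Desk", 1),
]

def top_bottom_split(tickets, top_fraction=0.10):
    """Rank tickets by decreasing resolution time, then split at top_fraction."""
    if len(tickets) < 2:
        raise ValueError("need at least two tickets to split")
    ranked = sorted(tickets, key=lambda t: t[1], reverse=True)
    # Keep at least one ticket on each side of the split.
    cut = min(len(ranked) - 1, max(1, round(len(ranked) * top_fraction)))
    return ranked[:cut], ranked[cut:]

top, bottom = top_bottom_split(tickets, top_fraction=0.20)  # a 20/80 split
top_mttr = mean(days for _, days in top)
bottom_mttr = mean(days for _, days in bottom)
common_groups = Counter(group for group, _ in top)

print(top_mttr, bottom_mttr)  # 72 7
print(common_groups)          # Counter({'Network': 2})
```

The `Counter` at the end is the "anything in common?" question: swap the group for assignee or service to look for other patterns among the top offenders.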

Concept5:  My team doesn’t receive a lot of tickets, but we certainly close a lot… and our MTTR is terrible through no fault of our own.

A5:  Transferred tickets often (but not always!) have the highest MTTR calculations.  The vast majority of the time, the more a ticket is passed around, the more time accumulates, and the last team to handle the ticket gets dinged for the big MTTR calculation.  To address this concern, I often break down MTTR analysis into three categories:  traditional, FCR and non-FCR. 

  • Traditional – MTTR is assigned to the last group who worked on the ticket. This calculation assumes that MTTR is customer facing.  The customer probably doesn’t care how many teams work on the ticket as long as the ticket is resolved in a timely manner.
  • First Contact Resolution – this is not First Call Resolution (pertaining to the phone system), but relates to the first group (sometimes called First Group Resolution) who receives the ticket and works that ticket through to completion. A common FCR definition is that there are no group transfers (though there might be intragroup transfers to different individuals) and usually only one resolution status.  If there are multiple resolutions, but only one group working the ticket, I call this a Modified FCR ticket.
  • Non-First Contact Resolution – as the name implies, any number of groups may contribute to the resolution of the ticket.
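The three categories above can be sketched in Python as follows. The ticket data, the group-history layout, and the `mttr_by_category` helper are assumptions for illustration; FCR is simplified here to "only one group ever handled the ticket", and the resolution-status condition and the Modified FCR case are ignored.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical resolved tickets: (groups that handled the ticket, in order; days to resolve).
tickets = [
    (["Group1"], 1),
    (["Group1"], 3),
    (["Group1", "Group2"], 10),
    (["Group1", "Group3", "Group2"], 20),
    (["Group2", "Group1"], 8),
]

def mttr_by_category(tickets):
    """Return {group: {category: (ticket count, MTTR or None)}} keyed by closing group."""
    buckets = defaultdict(lambda: {"traditional": [], "fcr": [], "non_fcr": []})
    for groups, days in tickets:
        closer = groups[-1]  # Traditional: the last group gets the MTTR
        category = "fcr" if len(set(groups)) == 1 else "non_fcr"
        buckets[closer]["traditional"].append(days)
        buckets[closer][category].append(days)
    return {
        group: {name: (len(v), mean(v) if v else None) for name, v in cats.items()}
        for group, cats in buckets.items()
    }

result = mttr_by_category(tickets)
print(result["Group1"])
# {'traditional': (3, 4), 'fcr': (2, 2), 'non_fcr': (1, 8)}
print(result["Group2"])
# {'traditional': (2, 15), 'fcr': (0, None), 'non_fcr': (2, 15)}
```

For each group, the FCR and non-FCR counts add up to the Traditional count, because every closed ticket lands in exactly one of the two.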

Quick Interpretations of the below chart:

  • Group1 - 67% of tickets they closed were considered FCR and the associated MTTR was less than half of their non-FCR tickets.
  • Group2 – must be a shared services group, or a Level 2 or 3 group, because almost all of their tickets are non-FCR.
  • Group3 – it’s not always true that transfers cause increases in MTTR. Sometimes, the increase is attributed to an individual… who hates to resolve tickets… is on vacation… left the company… is overloaded, etc.

When analyzing the differences between FCR and non-FCR, always be cognizant that FCR will focus on relationships and activity within the one group, while non-FCR will focus on intergroup relationships and work activities.

(In the below data grid, the counts for FCR and non-FCR tally up to the Traditional count.)

Concept6:  Gaming the system to reduce MTTR. 

A6:  Oh yeah… people know when they are being measured, and self-preservation kicks in.  Behaviors (not all inclusive!) to watch for:

  • Negative:
    • If there is an increase in aging tickets – check to see if only quick resolution tickets are closed and all the really hard ones are left open. MTTR will be small… for the moment. 
    • Look for groups, or individuals, who “mass enter and mass close” all their tickets at the end of their shift. Definitely analyze for training and resource issues.
    • Look for groups, or individuals, who use the ticketing system as a daily diary.
    • Look for an influx of automation tickets that were generated with very low threshold parameters.
  • Positive:
    • Look for an influx of automation tickets that were designed to “move work down the stack”.
    • Look for an increase in FCR tickets within certain key groups – not every group!
    • Look for a decreasing slope in “Managing the Top 10% Bad Guys”. This will show an improvement in Incident Management maturity.  Once the “Top 10% Bad Guys” plateaus down, you will be entering another phase in maturity and will need to adjust your analytical thinking to that new level.

MTTR, by itself, is just a number to me and doesn’t mean very much.  It is one component in a microsystem of interconnected components.  However, not every manager understands these components, and some simply judge a group’s performance on that one single number.  Every month, MTTR is religiously measured and compared in the hope of identifying a downward trend.  The effectiveness and efficiency of a group is often equated with a low MTTR, but that is not a guaranteed correlation. MTTR is simply a player in the MTTR microsystem… a world that includes Aging, Active and Assigned analysis.   When used in conjunction with other resolution data elements, MTTR can, indeed, paint a powerful and revealing picture. 

 

