Evaluating The Effectiveness of Training Programs

Last week I had an interesting conversation with someone who asked me if we do Kirkpatrick-style evaluation in our training programs. It had been a long time since I’d even heard that phrase (let alone thought about its application), so I had to think for a minute before answering. It never comes up when discussing training in the trucking industry – I think this was the first time anyone in trucking had ever asked me about it. It got me thinking, though, so I figured I should write about it!

A Kirkpatrick Primer

Kirkpatrick-style evaluation refers to Donald Kirkpatrick’s highly influential model for evaluating the effectiveness of training programs. It’s also known as the Four Level model since there are four distinct types of evaluation that happen at different times. The Four Levels are:

  • Level 1 – Reaction – what people thought or felt about the training, while it was happening or right at the end. This is basically a training smile sheet.
  • Level 2 – Learning – what people actually learned, as evidenced by a test or knowledge demonstration of some sort. This is often the final test at the end of a course.
  • Level 3 – Behavior – what changed in their job performance as a result of the training, usually measured a few months afterward through observation.
  • Level 4 – Results – what the final result was in terms of measurable improvements for the business.

Applying all four levels to a training program adds rigor, helping to ensure business objectives are met and clarifying the program’s overall value to the business. Levels 3 and 4 really only work if you have a good set of baseline data to start with, but if you can put it all together then you develop a really nice picture of how well the programs are working.

Or, at least, that’s the theory.

Trucking and the Four Levels

Kirkpatrick’s Four Levels have been an accepted standard in the adult learning world for decades, but I don’t think they fit into trucking as nicely as they fit elsewhere.

In some industries, the process of evaluating all four levels is pretty straightforward. For instance, if you’re teaching people how to use some new software system, you can easily do Levels 1 and 2 during the training, check back 3 months later to see if people are still using the software properly, then start calculating the business benefits derived from that usage. Other examples include soft-skills training like sales or negotiation techniques, or organizational things like project management – they can all be measured pretty nicely with the Kirkpatrick model.

However, trucking, and ongoing training for drivers in particular, is a bit different. You can certainly do the first two levels like any other industry – we encourage fleets to talk to their drivers about training experiences (L1) and we test what people learned (L2) directly in our courses. It’s the next two that are tough – observing behavioral change several months later and confirming measurable improvement.

The problem with L3 and L4 when it comes to drivers is that there are many other variables beyond the training that may influence their actions. The observed behavior, and the presence or absence of tangible business results, may be a reflection of the training or it may have almost nothing to do with it.

As an example, let’s say a group of drivers complete a course on vehicle inspection, pass the final test, and provide positive feedback about the experience. Solid L1 and L2 indications that the training is effective. However, 3 months later those drivers are observed to be completing inspections improperly, and the overall rate of violations hasn’t decreased in this area. Does that mean the training was ineffective? Possibly. It could also be that the drivers have job satisfaction issues and they’re unmotivated, or they’re facing other pressures and cut corners.

Conversely, 3-6 months after the training the violations may have improved considerably. Does that prove the training worked? Maybe, but it could also be that there were fewer inspections, or more lenient inspectors, or newer equipment, or a host of other things that could have just as much impact on the results.

This is where I see a big difference between trucking and other industries. There are significant job satisfaction issues in the workplace here, and TONS of outside variables that can get in the way of drivers doing their jobs well. Some companies have a business structure that makes it easy for drivers to build on the benefits of training and continuously improve their job performance. Other companies are the exact opposite – with challenging work, tough schedules, and fussy customers all picking away at a driver’s ability to really make the most of any particular training program. Both types of company could use the same training program but have vastly different results purely because of their respective environments.

I’m not suggesting that critically evaluating the effectiveness of training programs is a waste of time, but I think it needs to be done carefully so that all the variables are considered. To be useful, a Level 3 evaluation (observing the behavior a few months after the training) also needs to consider the overall satisfaction level of the driver, what else may have changed in their job or typical work schedule over that time, and anything else that can influence their actions. The same goes for Level 4 – since there isn’t a direct and exclusive relationship between training and business outcome, it’s not enough to just look at the result to determine the effectiveness. A broader base of participant data is required, and all the related variables need to be factored in as well.

As I noted above, the Kirkpatrick model is accepted as the gold standard in the adult learning world, but it’s insufficient for many driver training situations. It’s a great start, but any meaningful evaluation of the long-term effectiveness of a training program in the trucking industry needs to consider a whole range of variables if it’s going to be a reliable foundation for decision-making.


