Root Cause Analysis and Human Errors
Seneca - Errare Humanum Est

I have had the honor of serving on my organization's Quality Council over the past year. Recent debates within and outside my team inspired me to write about so-called “human errors” and their variants, a topic that seems to cause a lot of confusion at many organizational levels.

There is a cognitive test I remember from my childhood in the Soviet Union that is still quite popular among youngsters in Russia. First, you ask another person, “What color is your fridge?” The answer was likely to be “white”, as Soviet industry did not produce fridges in any other color. This is immediately followed by a second question: “What do cows drink?” Already caught off guard by the first question, the person's brain relies entirely on associative thinking and fails to process the second one properly, so that 5 out of 10 people answer with full confidence: “milk”.

The moral of the story is that however complex and perfect a system our brain may be, we can all err. The puzzle above is just one of many situations we may face at work that challenge our thinking and result in an undesired outcome that we generally call "an error".

In my industry (aerospace), we place safety first because people's lives depend on the quality of our deliverables. To ensure that all safety concerns are addressed, a lot of attention is paid to preventive activities, and root cause analysis (RCA) is something engineers have to deal with quite often. I have personally participated in hundreds of RCA activities in different capacities, and I have definitely learned some lessons worth sharing.

Probably the key observation is that people may be even more likely to make errors during the RCA than during the work itself. A botched RCA is a redundant procedure that costs money and brings no value, which undermines the very principles of lean production. For instance, it is very common to conclude an RCA with a vague statement of “human factor”, particularly among those who are new to root cause evaluation, which is in fact a complex and skill-demanding process. I think there are two main reasons for this. One is that the typology of errors remains a mystery to most practitioners. The other is an extension of the first: people think the job is done once they have pointed a finger.

The typology of errors is something most people do not understand before they embark on the RCA process, yet identifying the error type is critical during an RCA (and, admittedly, it is often done intuitively). The picture below shows a general typology of errors.

[Image: general typology of errors; no alternative text provided]

Human errors can occur at both the planning and the execution stage. Depending on where they occur, they fall into two major categories: unintended, skill-based errors (failures of execution) and errors in intended actions, which are called "mistakes" (failures of planning). It is critical for an RCA to distinguish between these two basic types because they have different causes and consequently require different corrective actions. For instance, mistakes stem from inexperience and lack of knowledge, so training is considered an adequate corrective action for them.

Unintended errors are also called "human reliability" errors and have two major categories: slips of action and lapses of memory. A slip of action is an unintentional action that was physically carried out, e.g. pressing the wrong button or missing a process step. A memory lapse occurs when an individual forgets a step in the plan, the sequence of steps or the plan as a whole.

In the case of skill-based errors, the employee does have the skills, knowledge and experience needed to perform the task properly, so (re-)training is not an appropriate response. That does not mean, however, that no further investigation is needed or that no corrective action can be applied. Since unintended errors mostly arise in routine activities, they can be reduced with behavior-shaping constraints such as "poka-yoke", or with checklists, workplace design and effective fatigue management. Trust between employees and their supervisor is also critical to mitigating these risks, since human reliability errors can originate from emotional stress, e.g. personal issues that inevitably affect one's focus on a task.

Mistakes are a little more complicated. At the top level they can be divided into rule-based and knowledge-based mistakes. Knowledge-based mistakes normally occur in an unfamiliar situation, under time pressure, when a complex task has to be solved in the absence of routines and rules. These conditions may lead to a solution that does not work properly.

Rule-based mistakes occur when a rule or a set of rules is misapplied or disregarded. At a lower level they can be divided into a few more categories: incorrect application of a good rule, application of a bad rule and failure to apply a good rule. Incorrect application of a good rule can happen when an individual wrongly assumes that a rule that worked in a previous, ostensibly similar situation will also fix a new issue.

Failure to apply a good rule is also known as a violation. Violations can be routine, situational or exceptional. Routine violations occur when rules are casually broken by most team members. They may result either from poor communication by supervisors, when the rules are simply not communicated clearly, or from an overly cumbersome process whose violation is tacitly accepted within the group. Situational violations occur when personnel believe that the normal rule is no longer safe or appropriate to apply and choose to deviate from the normal procedure. Exceptional violations are rare and are associated with force majeure events, when employees believe that the normal rules no longer apply.

Preventing violations requires management to understand what genuinely motivates employees to make certain decisions. In a modern organization one often hears "Quality shall never be compromised", "Safety first" and so on. The other side of the coin, however, is that all quality and safety initiatives increase the appraisal component of quality costs and, from that perspective, have a direct impact on revenue. Thus the mottoes may look good "on paper", while taking them too literally may be frowned upon by the leadership. In the end, violations depend on which path the employee thinks is safer to take, and that ultimately shapes their attitude. To ensure the right attitude, the leadership really needs to walk the talk and communicate expectations clearly, leaving no room for ambiguous interpretation. In the case of routine violations, it may be worthwhile for the leadership to scrutinize the process for rigidity and soundness. "Do all the steps add value?" and "Would following the process steps result in a "rulebook slowdown" effect?" could be the right questions to ask.
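For readers who prefer to see the typology as a structure, here is a minimal Python sketch of the categories described above, each with the responses the text associates with it. The names `ErrorType`, `find_path` and `suggested_responses` are made up for this illustration, and the rule of "use the nearest category that lists a response" is my own simplifying assumption, not an established method.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ErrorType:
    """One node of the typology (hypothetical helper for this sketch)."""
    name: str
    responses: list[str] = field(default_factory=list)
    children: list[ErrorType] = field(default_factory=list)


# Categories and responses are taken from the text above.
TYPOLOGY = ErrorType("human error", children=[
    ErrorType(
        "skill-based error (failure of execution)",
        responses=["poka-yoke", "checklists", "workplace design",
                   "fatigue management"],
        children=[ErrorType("slip of action"), ErrorType("lapse of memory")],
    ),
    ErrorType(
        "mistake (failure of planning)",
        responses=["training"],
        children=[
            ErrorType("knowledge-based mistake"),
            ErrorType("rule-based mistake", children=[
                ErrorType("incorrect application of a good rule"),
                ErrorType("application of a bad rule"),
                ErrorType(
                    "violation (failure to apply a good rule)",
                    responses=["clear communication of expectations",
                               "review of process rigidity and soundness"],
                    children=[
                        ErrorType("routine violation"),
                        ErrorType("situational violation"),
                        ErrorType("exceptional violation"),
                    ],
                ),
            ]),
        ],
    ),
])


def find_path(node: ErrorType, name: str) -> list[ErrorType]:
    """Return the nodes from the root down to `name`, or [] if absent."""
    if node.name == name:
        return [node]
    for child in node.children:
        path = find_path(child, name)
        if path:
            return [node] + path
    return []


def suggested_responses(name: str) -> list[str]:
    """Assumption: use the nearest category (self or ancestor) with responses."""
    for node in reversed(find_path(TYPOLOGY, name)):
        if node.responses:
            return node.responses
    return []


print(suggested_responses("slip of action"))
print(suggested_responses("routine violation"))
```

The point of the sketch is only that the response depends on where the error sits in the tree: a slip of action leads to constraints and checklists, not to training.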

As for finger-pointing, most of the time it would simply be a fallacy to consider the RCA complete once a person to blame has been identified.

In his book "Out of the Crisis", Dr. Deming made the following statement:

I should state that in my experience most troubles and most possibilities for improvement add up like this:

  • 94% belongs to the system (responsibility of management)
  • 6% special

Deming continually increased the percentage of errors he attributed to the system rather than to special causes such as individual people. By no means should this be taken to mean that, whatever you do, errors will still be found in your process; rather, it should be read as a hint about where you would normally find the deficiencies to address. All in all, my major takeaways from the RCA process, starting with "do not blame individuals", are these:

  • Do not blame people, improve the process - pointing a finger is a very tempting resolution because it seems to be a win-win for both parties: the manager, who does not need to carry out a time-consuming and complex review of the process, and the performer, who, in the worst case, gets away with being called on the carpet for a talk. The leadership should not fall into this trap, and it should teach front-line employees to avoid it as well.
  • RCA requires skill, therefore it requires training - there is not much to add to this, really. 5Whys and the Fishbone diagram are tools, and before using a tool we need to know how to wield it. It would be naive to expect a productive RCA from employees who are dealing with it for the first time in their lives.
  • Quality is everyone's responsibility, and so is RCA - while there is a general consensus on the first part of the sentence, there is still a common notion that RCA should be performed by a privileged group of experienced engineers (normally supervisors). While the need for competence is uncontested, the two parts of the statement still have to be reconciled. The only way to involve everyone in an organization in RCA is to raise everyone's competence, which again echoes the previous bullet. In this respect, the leadership should also be prepared to hear that the issue occurred at their level and requires their immediate participation or even their primary action. This is where companies that foster an egalitarian management style may have a competitive advantage over hierarchical organizations.
  • There are never too many Whys - we all admire how quickly kids learn. Have you ever thought about the basic difference between kids and adults? Curiosity. Kids are comfortable asking "why" as many times as needed to achieve the result (gaining knowledge). They never think they are overusing the word "why". Neither should adults when it comes to discovering the shortcomings of a process.
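To make the first and last points a little more tangible, here is a small Python sketch of a check on a chain of "why" answers: it flags the chain as unfinished when the last answer only names "human factor" or a similar label. The function name `stops_at_blame`, the list of blame terms and the sample answers are all invented for illustration; a real RCA would rely on judgment, not on keyword matching.

```python
from __future__ import annotations

# Assumed, illustrative list of labels that signal finger-pointing.
BLAME_TERMS = ("human factor", "human error", "operator error")


def stops_at_blame(whys: list[str]) -> bool:
    """True if the chain is empty or its last answer is just a blame label."""
    if not whys:
        return True
    last = whys[-1].lower()
    return any(term in last for term in BLAME_TERMS)


# Hypothetical chain that ends too early.
chain = [
    "A process step was skipped.",
    "Human factor.",
]
print(stops_at_blame(chain))

# Asking "why" once more moves the answer from the person to the process.
chain.append("The work instruction has no checklist for this step.")
print(stops_at_blame(chain))
```

There is deliberately no upper limit on the number of answers: the chain is finished when it reaches something in the process that can be changed, not when it reaches five entries.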

The bottom line is that RCA is more than just a tool to correct an error or prevent it from occurring. Once a process is implemented, RCA is virtually the only effective way to scrutinize the process and adjust it at the earliest opportunity; in essence, this is how we put the Plan-Do-Check-Act cycle into practice. Also, when there is no opportunity to see the parts being fabricated in person, RCA is a great way to develop a "design feeling". RCA should therefore be seen by employees as a part of their job duties that is as natural as performing the work itself.


More articles by Alexander Ukhvatov 🇺🇦
