Basic Elements of a Comprehensive Root Cause Analysis Investigation
3 Steps and 3 Tools that Organize and Improve Your Problem Solving Capability
By Mark Galley
The terms failure analysis, incident investigation, and root cause analysis are used by organizations when referring to their problem solving approach. Regardless of what it’s called, there are three basic questions in every investigation:
- What’s the problem(s)?
- Why did it happen (the causes)?
- What specifically should be done to prevent it?
These questions provide the framework for all information collection. Three additional tools help organize the pieces of information: a timeline, diagrams and photos, and process maps. A comprehensive investigation requires that all relevant information be collected and documented in a clear and coherent format.
This paper provides a basic explanation of each step and each tool. The objective is to simplify and improve the way individuals and groups investigate and solve problems. First, we’ll cover two important points that apply to every aspect of an investigation: the importance of focusing on principles and of being specific in communication.
Principles are constants. They do not change from problem to problem. The cause-and-effect principle is one of them: it is fundamental to all problems and doesn’t change from one problem to the next. It can be universally applied to equipment failures, supply chain problems, production outages, customer service issues and people problems. By focusing on the principle of cause-and-effect, an organization can develop a consistent approach to investigate and solve all problems.
There is a truth to any incident that has already happened. The layout of a town is the truth. Creating a map of that town is truly an objective exercise because the roads already exist in a particular way. The map should match the actual layout of the town, just as the investigation should match the incident that occurred.
Many people think of cause-and-effect as a linear relationship, where an effect has a cause. In fact, cause-and-effect relationships connect based on the principle of a system. A system has parts just like an effect has causes. For example, the equipment downtime occurred because a part failed. The part failed because of fatigue. The next question is “Why did it fatigue?” and the why questions can keep going. Most organizations mistakenly believe that an investigation is about finding the one cause, or “root cause.” An effect doesn’t have one cause; an effect has causes. The causes reveal different ways that the problem can be solved.
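The idea that an effect has causes, each with causes of its own, can be pictured as a small tree. The Python sketch below is only an illustration under stated assumptions: the names `Cause` and `all_causes` are hypothetical, and the "no spare part" branch is an invented example added to show a second cause. Only "part failed" and "fatigue" come from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One node in a cause-and-effect map: an effect plus the causes behind it."""
    description: str
    causes: List["Cause"] = field(default_factory=list)


def all_causes(effect: Cause) -> List[str]:
    """Return every cause behind an effect, following each 'why' branch in turn."""
    found = []
    for cause in effect.causes:
        found.append(cause.description)
        found.extend(all_causes(cause))  # keep asking "why?" on this branch
    return found


# "Part failed" and "Fatigue" come from the text.
# The spare-part branch is a hypothetical illustration, not from the source.
downtime = Cause("Equipment downtime", [
    Cause("Part failed", [Cause("Fatigue")]),
    Cause("No spare part on hand (hypothetical)"),
])

causes = all_causes(downtime)
for description in causes:
    print(description)
```

Each entry in the resulting list is a place where a different solution might be applied, which is the point of listing causes rather than searching for a single one.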
The word analysis means to break down into parts. Failure analysis, problem analysis and root cause analysis all start with a problem which is then broken down into its parts. The parts of a problem are the causes. The more severe the incident, the more detail is added to the investigation.
A common mistake that organizations make in investigations is categorizing an entire incident under one cause. As an incident is broken down into detail, more and more causes are revealed. Understanding these detailed causes reveals additional ways that the problem could possibly be solved. As the causes get more specific, the solutions also get more specific. Problems are not solved in general. Problems are solved when specific action is taken. “The devil is in the details.”
Organizations may try to group an entire investigation into one category. This makes the incident more general, not more specific. The five favorite generalizations organizations mistakenly use are human error, procedure not followed, equipment failure, training inadequate and design. Many groups believe that the end of an investigation has been reached if they can get to one of these five categories. Don’t stop too early – ask two or three more why questions to get more specific information.