Root Cause Analysis - Incident Investigation

Lessons from Three Mile Island

August 26, 2009 Kim Smiley

The partial meltdown of a reactor core at the Three Mile Island nuclear power plant is one of the best-known engineering disasters in US history. Luckily, no one was injured and there was no significant environmental impact, but the potential for major issues was very real. Three Mile Island also had a huge impact on the nuclear industry and required a major cleanup effort.

Performing a root cause analysis of historical incidents is useful because the lessons learned can often be applied across a variety of industries.

As is true with any complex system, there were many causes that contributed to the Three Mile Island incident. At the most simplified level, cooling water flow was stopped to the primary system (the nuclear portion). The primary system then started to heat up, increasing the pressure to the point that a relief valve lifted. The relief valve then failed to reseat and a large volume of coolant was lost. The core eventually overheated because it was uncovered due to the loss of coolant.

Another factor that contributed significantly to the Three Mile Island incident was operator action during the casualty, which occurred over several shifts. Had operators been able to understand the status of the plant sooner, the plant could have been put into a safe condition.

At first glance, it’s easy to stop at this point and use a term like “operator error”, but a thorough analysis requires more digging. Even if the technology being considered is radically different from a nuclear power plant, there are many lessons that can be learned from studying how the control room design affected operator actions during the incident.

The design of the control room significantly contributed to the operators’ inability to identify plant conditions. The control room was huge, with hundreds of instruments to monitor, some of which were on the back of the control panels and couldn’t be viewed from the normal watchstanding locations. Dozens of alarms, both audible alarms and flashing lights, went off in a very short period of time without any obvious priority. The alarms continued throughout the casualty, and the sheer volume of information was nearly impossible to interpret accurately.

Many industries continue to benefit from the lessons learned from the design of the control room.

For more detailed information on the Three Mile Island accident, please see the NRC’s Three Mile Island fact sheet.

When the Power Goes Out . . .

August 21, 2009 ThinkReliability Staff

Basing Contingency Plans on the Impacts to your Organization’s Goals

By ThinkReliability Staff

An excellent discussion took place during our free Webinar series last week. An attendee asked, “What if there’s a cause you can’t control, like the weather?” That raised another question: “How do you prepare for those sorts of things?”

You can prepare for potential problems that may arise by using a Cause Map, just like you would after an actual problem occurred. We call the Cause Map of things that COULD happen a “proactive” Cause Map, while a Cause Map of something that DID happen is a “reactive” Cause Map. Typically you will see reactive Cause Maps, but a proactive analysis can be extremely useful for contingency planning, as well as to develop problem-solving skills.

To create a proactive (or COULD) Cause Map, follow the same steps normally used in a root cause investigation, trying to imagine the possible impacts to the organization’s goals. Then create the Cause Map and determine possible solutions (action items). The “cost” of the impacts to the goals will determine which solutions are reasonable to implement.

As an example, let’s look at a power outage from the perspective of a hospital. (View The Joint Commission’s Sentinel Event Alert on power outage.) A power outage could lead to the deaths of patients, resulting in an impact to the safety goal. It could lead to the loss of life-saving equipment, resulting in an impact to the customer service goal. It could leave the facility unable to admit new patients, resulting in an impact to the production goal. And it could result in material and labor costs from the transfer of patients to another facility.

Beginning with these impacts to the goals, we can create a Cause Map. (The Outline and Cause Map are shown on the downloadable PDF.) All the impacts to the goals lead back to a loss of electrical power, caused by both a power outage AND the lack of a back-up electricity source.
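For readers who like to see the structure spelled out, here is a minimal sketch of how a Cause Map like the hospital example could be represented in Python. The class and function names (Cause, ask_why, leaf_causes) are made up for illustration and are not part of the Cause Mapping method or any ThinkReliability tool. The only assumption about the map itself is the one stated above: the causes listed under a box are joined by AND.

```python
class Cause:
    """One box on a Cause Map (hypothetical representation).

    The entries in `causes` are joined by AND: every one of them
    had to be present for this effect to occur.
    """

    def __init__(self, text, causes=None):
        self.text = text
        self.causes = causes or []


def ask_why(cause, depth=0):
    """Return the map as indented lines; each indent is one 'Why?' question."""
    lines = ["  " * depth + cause.text]
    for c in cause.causes:
        lines.extend(ask_why(c, depth + 1))
    return lines


def leaf_causes(cause):
    """Collect the causes that have nothing recorded beneath them yet."""
    if not cause.causes:
        return [cause.text]
    found = []
    for c in cause.causes:
        found.extend(leaf_causes(c))
    return found


# The hospital example: the impact leads back to a loss of electrical
# power, which required BOTH a power outage AND no back-up source.
loss_of_power = Cause(
    "Loss of electrical power",
    [Cause("Power outage"), Cause("No back-up electricity source")],
)
safety_impact = Cause("Safety goal: potential patient deaths", [loss_of_power])

for line in ask_why(safety_impact):
    print(line)
```

Because the two causes are joined by AND, controlling either one (for example, adding a back-up source) is enough to break the chain, even when the other (the outage itself) is outside your control.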

When determining solutions, there are a few that come to mind, including transferring patients to another healthcare facility (which itself becomes an impact to the goals) and installing battery backups in equipment. However, because of the severe impacts to the goals, a hospital will likely decide that the whole problem can be solved by installing an emergency generator. Problem solved. However, is installing an emergency generator always the right contingency plan for a power outage?

Let’s look at the same situation from the perspective of an office building. A power outage could cause some employees to get injured as they’re exiting the building, resulting in an impact to the safety goals. It will result in the loss of the business function of the office, resulting in an impact to the customer service and production goals. It may also result in paying employees for a non-work day, which is an impact to the labor goal.

The Cause Map looks similar to the hospital power outage Cause Map in that all the impacts lead back to a loss of electrical power, caused by a power outage and the lack of a back-up electricity source. So, we could put in an emergency generator just like the hospital did and have our problem solved. But given the lesser impacts to the goals, the effort and capital required to install an emergency generator are probably not worth it. Instead, some of the less expensive and less time-consuming solutions can be implemented, such as installing emergency lights and setting up remote work stations for employees.
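The comparison between the hospital and the office building can be sketched as a simple filter: keep only the solutions that cost less than the impacts they would prevent. Every dollar figure below is invented purely for illustration, and the function name reasonable_solutions is hypothetical. A real contingency plan would also weigh how likely the outage is and impacts that are hard to price, such as patient safety.

```python
def reasonable_solutions(impact_cost, solutions):
    """Keep the solutions whose cost does not exceed the cost of the impacts.

    impact_cost: estimated total cost of the impacts to the goals
    solutions:   dict mapping solution name -> estimated cost to implement
    """
    return [name for name, cost in solutions.items() if cost <= impact_cost]


# All figures are made-up placeholders, not data from either example.
candidate_solutions = {
    "Emergency generator": 250_000,
    "Emergency lights": 5_000,
    "Remote work stations": 20_000,
}

hospital_impact_cost = 5_000_000  # hypothetical: severe impacts to the goals
office_impact_cost = 50_000       # hypothetical: lesser impacts to the goals

print(reasonable_solutions(hospital_impact_cost, candidate_solutions))
print(reasonable_solutions(office_impact_cost, candidate_solutions))
```

With these placeholder numbers, the generator passes the filter for the hospital but not for the office building, which is the same conclusion reached above.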

View the Outlines and Cause Maps for both the hospital and office building power outages by clicking “Download PDF” above.

Midair Aircraft/Helicopter Collision Over Hudson River

August 13, 2009 ThinkReliability Staff

By ThinkReliability Staff

On August 8, 2009, a small airplane collided with a sightseeing helicopter and both aircraft crashed into the Hudson River, killing all nine people on board. The crowded corridor above the Hudson River was also the site of the successful crash landing of US Airways Flight 1549 in January 2009. Evidence is still being recovered from the accident site, so the investigation is ongoing. However, just because we don’t have all the causes doesn’t mean we can’t start our root cause analysis.

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. To begin, we define the problem in an outline. So far, we know the date and approximate time of the collision. (We may be able to refine the time of the accident as more information is released.) We know the location of the collision based on eyewitness accounts and the discovery of wreckage. We also know the type of plane and helicopter involved, and what they were doing (the plane was in transit to Ocean City; the helicopter was on a sightseeing tour).

Next we define the problem with respect to the impact to the goals. The safety goal was impacted because nine people were killed. Both the airplane and helicopter were lost (or at the very least, severely damaged), which is an impact to the material goal. Lastly, if we have the information, we can record the frequency of this type of incident. The last helicopter/airplane collision in the New York City area was in 1983.

Once we’ve completed the outline, we can move on to the Cause Map. We begin with the impacts to the goals and fill in the Cause Map by asking “Why” questions. Both goals were impacted because the plane and helicopter crashed into the water. We continue to ask “Why” questions. Both aircraft fell into the water because the plane struck the helicopter. The plane struck the helicopter because the two aircraft were in the same airspace and, it’s surmised, because the pilot could not see the helicopter. (We don’t have any solid evidence supporting this yet, so we’ll leave a question mark.)

The plane and the helicopter were in the same airspace because the area is crowded with sightseeing helicopters and small planes, which are prohibited from flying above buildings or over 1,100 feet. Around New York City, that pretty much leaves the river. Pilots who are flying below 1,100 feet are free to choose their own route and are not under the control of air traffic controllers. Instead, they use the “see and avoid” method.

Unfortunately, that method isn’t successful when a pilot can’t see an incoming helicopter. Although small planes are not controlled by air traffic controllers, they are in communication with them. However, the pilot of the plane had never contacted the Newark controllers. The helicopter was ascending at the time of the crash, so it’s likely that it came from below the plane (where the pilot would be unable to see it). The helicopter pilot may have been unaware of the plane because pilots are not required (though it is recommended) to announce their position.

As the NTSB investigation continues, more detail can be added to this Cause Map. As with any investigation, the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.

Saving Sharks from Extinction

August 6, 2009 ThinkReliability Staff

By ThinkReliability Staff

In honor of the Discovery Channel’s “Shark Week”, we’ll use the problem of shark species at risk of extinction as a root cause analysis example. We’ll begin by building a Cause Map, which is a visual method of performing a root cause analysis.

We begin a root cause analysis with an impact to the goal. Shark species being at risk of extinction is an impact to the environmental goals. While I didn’t add this level of detail to the Cause Map, evidence has shown that a decrease in the number of sharks results in problems for the rest of the food chain.

We fill out the Cause Map by asking “Why” questions. Shark species are at risk of extinction because the death rates of sharks are higher than the birth rates. Sharks have low reproductive rates (they mature slowly, have long gestational periods, and birth few young) and increasing death rates. The increasing death rate is due to overfishing (fishing without regard to population), injured sharks being left to die, and loss of habitat caused by pollution. The combination of sharks being fished for sport, food, or products (which are rising in value; some people believe shark products can cure cancer) and the lack of effective regulation has led to overfishing. Many sharks are injured, either as “bycatch”, meaning sharks are brought up in fishing nets while fishing for something else, or by a practice known as “finning”, where a shark’s fin is cut off. (Shark fin soup is very popular.) In both cases, sharks are typically thrown back into the water injured and left to die. Many countries have a ban on finning, but the ban is not always effectively enforced.

Many countries around the world are trying to protect sharks. Some of the solutions they have implemented are to create shark fishing quotas, increase enforcement of fishing quotas and finning bans, decrease the market for shark products and shark fin soup, and limit fishing in known shark habitats. Solutions can be shown on the Cause Map, directly above the cause they control. Once solutions have been selected for implementation, as these have been, they are listed in the Action Items list. (To see the Cause Map and Action Items list, click on “Download PDF” above.)
