Tag Archives: root cause analysis

Deadly Superstorm Slams the US

By Kim Smiley

Hurricane Sandy pummeled the Mid-Atlantic region of the United States on October 29th  and 30th, 2012, leaving more than eight million people without electricity, causing massive flooding and killing over 110 people.  The damage done by this storm was massive and economic impacts have been predicted to be as high as 50 billion dollars.

Why was Sandy so devastating?  This question can be answered by building a Cause Map, an intuitive format for performing a root cause analysis.  A Cause Map is a useful tool for breaking down this complicated issue and can help explain why this storm was unique.

In this example, a number of things combined to make Sandy a unique and especially dangerous storm.  First off, Sandy wasn’t just a normal hurricane.  As Hurricane Sandy moved north it converged with other weather systems, turning into a hybrid storm.  This hybrid storm brought with it a combination of extreme summer weather (strong winds with heavy rains) and winter weather (cold temperatures and snow).  Unusual timing of the different weather systems helped this superstorm form.  Hurricane Sandy hit very late in hurricane season, and the cold air sweeping down from Canada was colder than typical for this time of year, a combination that proved deadly.  The nature of these converging weather patterns also made Sandy a very slow moving storm, so areas experienced higher rainfall and more damage than they might have with a faster moving system.

Normal hurricanes are powered by the warm, moist tropical air and weaken as they travel north.  They also typically turn to the right and head out to sea.  When Sandy converged with the other systems it became an extra-tropical cyclone and actually strengthened as it hit shore.  The effects of these other weather systems also turned the storm left onto land and it took an unusual path over some of the most heavily populated areas of the US, including NYC,  intensifying the impact of the storm.

The timing of Sandy also impacted the peak flood levels.  Sandy hit during a full moon, when tides are among the highest of the month.  During a full moon, the sun and moon are aligned with the Earth, so their gravitational pulls combine and tides are higher.  The high winds created by Sandy combined with the full moon resulted in a massive storm surge.

Sandy truly was a Superstorm.  Weather systems that normally don’t exist at the same time converged to create a massive storm that moved in an unusual path over one of the most heavily populated regions in the US.  And the storm hit at the worst time of the month for flooding.

For more information click here or here.  To view a high level Cause Map of this issue, click “Download PDF” above.

 

The Comet That Couldn’t Fly

By ThinkReliability Staff

“… the most exhaustively tested airplane in history.”

-Expert opinion on the DeHavilland Comet

Today, commercial jet air travel is standard fare. Estimates for the amount of air traffic over the United States in a given day have been in the range of 87,000 flights. With clever planning, clear skies and smooth service, a citizen almost anywhere in the world can get anywhere else by plane in less than 24 hours. But looking back at the history of aviation shows us how far safety has come. Consider the DeHavilland Comet, the first commercial jet to reach production. British aviation specialists finalized the Comet’s design with much excitement in 1945 in hopes it would position their industry to establish a revolutionary service in commercial jet flight. Unfortunately, two Comets crashed in 1954, on January 10th and April 8th.

What happened? We can identify some of the causes in a Cause Map, or visual root cause analysis.

CAUSE #1: POOR TESTING When you test an extremely heavy object carrying hundreds of people at high speeds thousands of feet above the ground, you would think planning for the worst case scenario would make the most sense. Unfortunately, the Comet tests were performed in tainted conditions on the strongest part of the plane.

Add in the fact that there was no prototype for the plane and you’ve got a test not worth having… and a plane not worth flying.

CAUSE #2: UNEXPECTED PRESSURE Altitude leads to pressure, and pressure puts stress on planes. But this stress wasn’t evenly distributed, and certain parts of the planes’ bodies were unevenly affected. So rather than the expected amount of pressure on the planes, the Comets faced an unforeseen squeeze.  

CAUSE #3: FLYING ABOVE AND BEYOND The Comet flew at twice the speed, height and cabin pressure of any previous aircraft, displaying a rather dangerous amount of ambition.

Combine all of this, Cause Map it, and you’ve got a plane flying under incredible conditions it couldn’t withstand, facing high pressure where it was most vulnerable.

In other words, an airborne recipe for disaster.

FALLOUT #1) SAFETY As expected, the pressure cycle in the planes’ cabins cracked the bodies of the planes. When the planes broke up, the lives of 56 passengers and crew members were lost.

#2) CUSTOMER SERVICE Some British industry institutions have a highly prestigious reputation (the Royal Navy’s impact on British sea travel comes to mind). The loss of the aircraft, though, was a black eye on British Aviation. Aviation historian George Bibel called the Comet an “adventurous step forward and a supreme tragedy.”

#3) MATERIALS/LABOR Effective airplanes have never been cheap, and this was no different. It would cost money not only to investigate the cause of the accidents, but also to replace the airplanes.

FUTURE SOLUTION The Comet’s tragic crash had one silver lining: the post-crash analysis performed by its designers (including Sir Geoffrey de Havilland) set the precedent for future air accident investigations. In fact, the Comet was redesigned to solve the issues that caused the crashes and would later fly successfully. But by then, Boeing had already taken over most of the commercial jet market.

In the end, the Comet was first in flight but last in the market.


Want us to cause map a specific plane crash for you? Tell us in the comments and we’ll pilot our way through it.

Rogue Ocean Fertilization Experiment Done

By Kim Smiley

An entrepreneur created a massive bloom of plankton after he dumped a hundred tons of iron dust into the Pacific Ocean off western Canada last June.  This action has sparked outrage because an individual manipulated the environment without government approval or scientific oversight.

A Cause Map, a visual format for performing a root cause analysis, can be built to analyze this issue.  The first step in building a Cause Map is to determine how the issue impacts the overall goals.  The next step is to ask “why” questions and the answers are then organized into cause-and-effect relationships so that all factors that contributed to a problem are laid out in an intuitive format.  In this example, impacts to several goals are worth considering.

The first issue is that nobody knows exactly how the environment will respond to this much iron being put into the ocean.  The environmental impacts may well turn out to be minimal, but this is by far the largest experiment of this type done to date so nobody really knows how big the impact will be.  The experiment is also particularly worrisome because there wasn’t adequate scientific oversight or approvals for it.  The man conducting the experiment was an entrepreneur hoping to make money.  A local tribe hired the entrepreneur to fertilize the ocean with iron in a bid to increase the local salmon population by increasing their food supply.  Adding iron to the ocean can create a rapid increase in the phytoplankton population, which are the base of the aquatic food chain, because iron is often the limiting nutrient for phytoplankton growth.  Iron is necessary for photosynthesis and thus phytoplankton growth.  But iron is also highly insoluble in sea water so large areas of the ocean have limited iron supplies.

The entrepreneur also hoped to find a way to cash in on carbon offset credits because phytoplankton blooms may be a way to sequester carbon and improve greenhouse gas numbers in the environment.  This may work because phytoplankton absorb carbon dioxide during their life and when they die they sink into the ocean, taking the carbon dioxide with them and removing it from the atmosphere.

The second issue is that there are known risks associated with large blooms of phytoplankton.  They can negatively affect the other aquatic life in the region because large blooms can deplete the ocean of oxygen.  This occurs because the populations of other microorganisms will increase since the increase in phytoplankton provides a larger food supply.  Some of these other microorganisms absorb oxygen so more of them means less oxygen for other aquatic life. Phytoplankton live near the surface, but they sink as they die so a bloom will impact the food supply and oxygen levels throughout the entire depth of the ocean.

A final goal worth considering is the impact this has on public opinion.  Iron fertilization is a contentious issue to begin with because many people are opposed to purposefully manipulating the environment.  When somebody dumps tons of iron into the ocean without solid scientific involvement, it understandably outrages the public.  The negative press will make things harder for any legitimate scientific research being done in this field.

This issue has been covered by The New York Times, The New Yorker and NBC News.  Click on any of these links to learn more about this issue.  Click on “Download PDF” above to view a high level Cause Map.

 

Toyota Recalls Millions of Vehicles Because of Fire Risk

By Kim Smiley

On October 11, 2012, Toyota announced a recall of 7.4 million vehicles worldwide due to a potential fire hazard.  This newest recall comes on the heels of the heavily publicized unintended acceleration issue and puts Toyota once again in an unwanted spotlight.

A Cause Map, a visual format for performing a root cause analysis, can be built to help analyze this issue.  The first step in building a Cause Map is to create an Outline that lays out how the issue impacts the overall goals of an organization.  In this example, the safety goal is impacted because of the potential for injuries and car accidents.  The production goal is impacted because of the effort needed to recall millions of vehicles.  The customer service goal is also impacted because of the negative publicity that a recall of this size will generate.  After the impact to the goals is determined, “why” questions are asked to determine what causes contributed to the issue and to create the Cause Map.

Starting with the production goal, we would ask “why” millions of vehicles were being recalled.  This is happening because there is a component that may need to be repaired, the component is in many vehicles and there is a potential for injuries if the component isn’t repaired.  A component needs to be repaired because the power-window switches pose a fire risk.  Some of the power-window switches feel sticky when operated and if some commonly available lubricants are applied it will create a fire hazard because the switch can melt.  There are millions of these power-window switches to repair because they were used across multiple models for several years because using standard parts is usually cheaper.  There is a potential for injuries because a fire starting in the power-window switch while the car is driving would be pretty distracting.
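The chain of “why” questions above can be sketched as a small data structure.  This is only a minimal illustration in Python, not the Cause Mapping tool itself: the dictionary layout and the function names `ask_why` and `leaf_causes` are assumptions made for this sketch, and the cause labels are paraphrased from the paragraph above.

```python
# Each effect maps to the causes that jointly produced it.
# Labels are paraphrased from the recall description; the layout is illustrative.
cause_map = {
    "Millions of vehicles recalled": [
        "Component may need repair",
        "Component is in many vehicles",
        "Potential for injuries",
    ],
    "Component may need repair": ["Power-window switch poses a fire risk"],
    "Power-window switch poses a fire risk": [
        "Switch feels sticky when operated",
        "Commonly available lubricant applied",
    ],
    "Component is in many vehicles": [
        "Switch used across multiple models for several years"
    ],
    "Switch used across multiple models for several years": [
        "Standard parts are usually cheaper"
    ],
    "Potential for injuries": [
        "Fire in the switch while driving would be distracting"
    ],
}


def ask_why(effect, depth=0):
    """Return indented lines tracing every recorded cause behind an effect."""
    lines = ["  " * depth + effect]
    for cause in cause_map.get(effect, []):
        lines.extend(ask_why(cause, depth + 1))
    return lines


def leaf_causes(effect):
    """Return the causes with no recorded 'why' of their own (the current edge of the map)."""
    causes = cause_map.get(effect)
    if not causes:
        return [effect]
    leaves = []
    for cause in causes:
        leaves.extend(leaf_causes(cause))
    return leaves


if __name__ == "__main__":
    print("\n".join(ask_why("Millions of vehicles recalled")))
```

The leaves are simply where the questioning has stopped so far; asking another “why” about any of them would extend the map.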

This recall will generate negative publicity because it is a huge recall, the largest vehicle recall since Ford Motor Co recalled 7.9 million vehicles in 1996, and the timing is a bit unfortunate since it comes shortly after the unintended acceleration issues that resulted in large recalls.  In fact, some of the vehicles being recalled this round are the same vehicles that have had previous recalls, a fact that probably isn’t reassuring to owners.

The good news is that the fix for this problem is relatively simple beyond the innate hassle of taking a vehicle to the dealer.  The recall consists of a technician inspecting, disassembling and applying approved fluorine grease to the power-window switch, improving the sticky operation and decreasing the likelihood that some handy soul might apply an unapproved lubricant and inadvertently melt the part.

To view a high level Cause Map of this issue, click on “Download PDF” above.

Supply of Disposable Diapers Threatened by Explosion at Chemical Plant

By Kim Smiley

On September 29, 2012, an explosion at a chemical plant in Japan killed a firefighter, injured 35 others and did significant damage.  Chemicals produced at the plant are used in disposable diapers.  The damaged plant will be inoperable for the foreseeable future, which will likely impact the global supply of disposable diapers, a thought that strikes fear in the hearts of many parents of small children.

This incident can be analyzed by building a Cause Map, an intuitive, visual format for performing a root cause analysis.  The first step in building a Cause Map is to identify which goals were affected.  In this case, the safety goal is obviously impacted since there was a fatality and injuries.  The production goal is also a major consideration since the supply of disposable diapers is threatened because the plant will be unable to produce chemicals for a significant amount of time.  The next step is to ask “why” questions to add additional boxes to the Cause Map.

Starting with the safety goal first, we would ask “why” there was a fatality and injuries.  In this example, people were hurt because there was a fire at a chemical plant.  The fire occurred because a tank exploded and it was near other tanks full of flammable chemicals.  The tank exploded because the temperature inside the tank was increasing and it wasn’t cooled in time.  It isn’t clear yet why the temperature was increasing inside the tank, but investigators are working to find the cause.  Once it is known, it can be added to the Cause Map.

At the time of the explosion, efforts were underway to cool off the tank, but they weren’t effective.  Firefighters were working to spray down the tank with cool water to help lower the temperature, but the temperature rose too quickly.  This is also a cause of the fatality.  A firefighter was working to connect spray lines near the tank when it exploded, and he was sprayed with hot chemicals.  Some of the other injuries occurred at the time of the explosion and some were sustained during the effort to fight the fire.  It’s possible that one of the reasons the workers were unable to cool the tank was that the usual method of cooling it, injecting nitrogen to decrease the oxygen and control the chemical reactions occurring, might not have been functioning properly.  This is another area that can be clarified on the Cause Map as more information becomes known.

Looking at the production goal now, a potential shortage of disposable diapers may occur as a result of this accident because the plant produced a significant amount of a chemical used in manufacturing diapers.  This plant produced 20% of the world’s supply of one chemical in particular needed for diapers.  Combine this with the fact that the other plants manufacturing this chemical are already operating at maximum capacity and the supply will likely be less than the demand.
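The supply reasoning above is simple arithmetic, sketched here in Python.  The 20% share and the lack of spare capacity come from the paragraph above; the function name `remaining_supply_fraction` and the `spare_capacity_fraction` parameter are hypothetical, introduced only for illustration, and the sketch assumes demand was roughly equal to supply before the explosion.

```python
def remaining_supply_fraction(lost_share, spare_capacity_fraction=0.0):
    """Fraction of pre-incident supply still available after one plant goes offline.

    lost_share: share of world supply the damaged plant produced (0.20 here).
    spare_capacity_fraction: extra output the remaining plants could add,
        as a fraction of their current output (0.0 when they run at maximum).
    """
    remaining = 1.0 - lost_share
    # The other plants can make up the loss only as far as their spare capacity allows.
    recovered = min(lost_share, remaining * spare_capacity_fraction)
    return remaining + recovered


if __name__ == "__main__":
    available = remaining_supply_fraction(0.20)
    print(f"Supply available: {available:.0%} of the pre-incident level")
```

With no spare capacity elsewhere, the whole 20% is a shortfall until the plant restarts or new capacity comes online.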

The final step in the process is to use the Cause Map to develop solutions to help prevent similar problems from occurring in the future.  It’s premature to discuss specific solutions in this example since the investigation is still ongoing, but the initial Cause Map can easily be expanded and used when all the information is available.

How a Toothbrush Helped Save the Space Station

By Kim Smiley

Using ingenuity reminiscent of Apollo 13, the crew on the International Space Station (ISS) recently found a way to fix an ailing electrical system using handmade tools made with an allen wrench, a wire brush, a bolt and a toothbrush.

The events that led to this dramatic repair attempt can be built into a Cause Map, a visual root cause analysis to help illustrate the causes that contributed to the problem. In this example, the problem was an issue with the electrical system on the space station.  Electrical issues can obviously quickly become dangerous on a space station because the life support systems need electricity to function. The impacts to the schedule and potential issues with accomplishing all the mission goals are also worth considering.

In order to fix the problem, astronauts needed to replace a failed Main Bus Switching Unit, a component that is responsible for collecting and distributing power from the solar arrays.  The ISS has four Main Bus Switching Units and each serves two of the eight solar arrays, so the loss of one of the units significantly impacts power supply.

The units are located outside of the space station and the plan was to replace the malfunctioning unit during a spacewalk, but the two astronauts doing the work ran into a problem.  An accumulation of metal shavings caused a bolt to stick, preventing installation of the new unit.  The astronauts needed to find a way to remove the metal shavings, but none of the tools they had taken on the spacewalk could get the job done.

The nearest hardware store was over 200 miles of atmosphere away and the options were limited, but the crew found an elegantly simple solution to the problem.  They created a cleaning tool out of items onboard the space station, including a $3 toothbrush.  An extra spacewalk was planned, the metal shavings were cleared and the new Main Bus Switching Unit was successfully installed.  A cheap toothbrush taped to a metal handle had helped fix a $100 billion space station.

And if you’re wondering which astronaut drew the short end of the oral hygiene stick, don’t worry: the toothbrush was a spare.

To view a high level Cause Map of this issue, click on “Download PDF” above.

Knife Cuts in Restaurants

By ThinkReliability Staff

Knife cuts in restaurants pose a big risk, not only to the restaurant employees themselves, but also to customers, because blood or bandages from an employee’s laceration can contaminate food.  There are steps that can be taken to reduce the risk of a knife cut.  While some of these steps can be taken by restaurant employees themselves, many will involve the restaurant management as well.  Although these recommendations are based on knife cuts that occur within the restaurant and food preparation industry, they are also relevant for use at home to protect against lacerations from knives.

You can view some different causes that can result in lacerations from knives in a Cause Map, or visual root cause analysis, by clicking “Download PDF” above.   With any root cause analysis, the goal is to determine as many solutions as possible to reduce the risk of the issue – in this case, knife cuts – from happening in the future.  When we put together a proactive investigation – not based on one specific incident, but rather combining any possible causes we can brainstorm to best determine solutions – we can use some examples of actual lacerations that have occurred, and also our personal experiences to brainstorm causes.  As with any investigation, the wider net we cast, the more ideas we brainstorm and the more possible solutions we can discover.

The setup of the food prep area is key to reducing cuts.  Inadequate lighting and distraction can lead to increased injury, as can the storage location of the knives.  (You’re much more likely to cut yourself grabbing a knife out of a drawer than off a magnetic strip or out of a block.)  The condition of the knives themselves is also key.  Properly maintained knives – that is, knives that are sharpened and whose handles are properly attached – are less likely to cause cuts because dull knives, or those with loose handles, make it difficult to cut properly, increasing the risk of cuts.  Knives should be regularly sharpened and if a knife is damaged, it should be disposed of.  In addition, having the proper complement of knives is important.  Proper cutting technique can reduce knife cuts, but a key component of proper cutting technique is having the correct knife.

An additional component of proper cutting technique is training.  Training should include techniques for cutting as well as which knife to use for which type of cutting and what kind of food product.  Some of the key aspects of cutting technique that can decrease the incidence of knife cuts include cutting away from you and using a cutting board with a mat to keep it from slipping.  Hold objects with your fingers pointing straight down, using your knuckles as a guide for the knife.  It’s very difficult to cut yourself while holding food this way.

Not all knife cuts occur while cutting food.  One frequent source of knife cuts is reaching into a sink full of soapy water and grabbing a knife blade.  When hand washing knives, put in one knife at a time and don’t let go of it.  Always set knives well onto the counter with the blade facing away from you.  And if a knife falls off a prep surface, step back and let it fall.  If you are particularly concerned about knife cuts, you may want to consider the use of Kevlar gloves.  Restaurants that use Kevlar gloves have seen a remarkable decrease in injuries due to knife cuts.

To view the Cause Map, please click “Download PDF” above.

SL-1 Explosion-The Only Fatal Reactor Accident in the US

By ThinkReliability Staff

The only fatal reactor accident in the United States occurred on January 3, 1961, when an Army prototype known as SL-1 (for stationary, low power reactor, unit 1) exploded, killing the 3 operators who were present.  We’ll use the SL-1 tragedy as an example of how the Cause Mapping process can be applied to a specific incident.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

The SL-1 tragedy killed the three operators present, which is an impact to the safety goal.  Another goal is that there be no damage to the vessel. In the case of SL-1, the  vessel sustained extensive damage.

The loss of life and vessel damage were both caused by the reactor exploding.  The reactor exploded because it went prompt critical (an uncontrollable, exponentially increasing fission reaction).  The reactor went prompt critical because withdrawal of the central rod can cause prompt criticality and because the rod was rapidly, manually lifted 26.4″ out of the core.

Withdrawal of the central rod can cause prompt criticality due to a lack of shutdown margin in the core, and inadequate safety criteria.

Because most of the evidence was so effectively destroyed, nobody really knows why the control rod was lifted out of the core.  There are two theories (disregarding the bizarre and improbable murder/suicide theory): 1) the control rod got stuck while being lifted to be attached to the drive mechanism and, as the operator was exerting greater force on it, suddenly came free, resulting in a lift far greater than intended, or 2) a rod drop test/exercise was performed improperly.

The control rod may have become stuck and come free while being attached because it was required to be lifted 4″ out of the core and because control rods had been sticking.  The control rods had been sticking for one or more of the following reasons: 1) reduced clearances due to radiation damage (which can cause structural material to swell), 2) blockage of the passage due to loss of poison strips in the channel, caused by poor design and inadequate testing, or 3) lifting equipment not working properly due to inadequate lifting capacity.

It’s also possible that a rod exercise/test was performed improperly.  This could have occurred because the operators chose to exercise/test the rods, attempting to ensure that they would perform properly, and because they didn’t realize what would happen.  Both the lack of awareness and the improper performance of the test trace back to inadequate training and inadequate work instructions.
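The cause-and-effect logic of the SL-1 analysis mixes causes that were all required (joined by “and”) with alternative theories (joined by “or”).  A minimal sketch of that logic in Python follows.  The helper names `all_of`, `any_of` and `holds` are assumptions made for illustration, and the cause labels are paraphrased from the text above; this is not how a Cause Map is formally drawn.

```python
def all_of(*children):
    """Every listed cause was required to produce the effect."""
    return ("AND", children)


def any_of(*children):
    """Any one of the listed causes (theories) could have produced the effect."""
    return ("OR", children)


def holds(node, facts):
    """Evaluate a cause tree against a dict of {cause label: True/False}."""
    if isinstance(node, str):
        return facts.get(node, False)
    operator, children = node
    results = [holds(child, facts) for child in children]
    return all(results) if operator == "AND" else any(results)


# Two competing theories for why the rod was lifted too far.
rod_lifted_too_far = any_of(
    "stuck rod suddenly came free",
    "rod exercise/test performed improperly",
)

# Prompt criticality needed both the design weakness and the rapid lift.
prompt_critical = all_of(
    "central rod withdrawal can cause prompt criticality",
    rod_lifted_too_far,
)

if __name__ == "__main__":
    scenario = {
        "central rod withdrawal can cause prompt criticality": True,
        "stuck rod suddenly came free": True,
    }
    print(holds(prompt_critical, scenario))
```

Either theory is enough to explain the lift, but neither leads to the explosion without the design weakness, which is why the “one-rod stuck” criterion addresses that branch.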

On a positive note, the SL-1 incident did initiate some positive changes in the nuclear industry.  Most notably, reactor design has improved and incorporated a “one-rod stuck” criterion, which specifies that a reactor can NOT go critical by the removal of any one control rod.  Additionally, procedures and training have gotten more intense and more formal, and planning for emergencies has increased.

Navy Jet Crashes into Apartment Building

By Kim Smiley

On April 6, 2012, a Navy F-18 jet crashed into an apartment building in Virginia Beach, Virginia. Significant damage was done to the apartment building and the jet was destroyed, but amazingly no one was seriously injured or killed.

This incident can be analyzed by building a Cause Map, an intuitive, visual format for performing a root cause analysis.  The first step when building a Cause Map is to determine how the incident affected the organizational goals.  The impacts to the organizational goals are recorded in the Outline which also documents the background information of the incident.  In this example, the safety goal was obviously impacted since there was potential for serious injuries.  The property goal was also impacted because the jet was destroyed and the apartment building suffered extensive damage.

Once the Outline is complete, “why” questions are asked to determine what factors contributed to the incident.  In this example, there was potential for injuries because a jet hit an apartment building.  This occurred because the jet was flying near the residential area and the jet was unable to complete its attempted takeoff.  The pilots could have been injured had they not been able to safely eject before the crash, and there was potential for people on the ground to be injured since the jet crashed into a residential area. The jet crashed because it experienced a dual engine failure.  The investigation into this crash determined that both engines failed for two separate, unrelated reasons.

The right engine failed because of a catastrophic failure of the engine compressor when it ingested flammable liquid that was ignited.  The left engine afterburner failed to light. Investigators believe that an electrical component failed, but the damage to the left engine was too severe for a conclusive determination of what exactly occurred.  According to the Navy, this is the first unrelated dual engine failure of an F-18.

The Navy plans to update procedures to incorporate the possibilities of this type of incident.

To view a high level Cause Map of this issue, click on “Download PDF” above.

Loss of Firefighting Plane Affects Firefighting Efforts

By ThinkReliability Staff

Wildfires in the Rocky Mountain region have been plaguing the nation for weeks.  The firefighting mission took a severe hit when a C-130 that was dropping flame retardant on the fire crashed on the evening of July 1, 2012, killing four of six crewmembers and injuring the other two.  As a result of the crash, the Air Force grounded other C-130s for two days, increasing the work for firefighters on the ground.

Although the Air Force has not released details of what exactly resulted in the plane crash, we can look at the information we do have available in a visual root cause analysis or Cause Map.  We begin by determining which of the organization’s goals were impacted in the Outline.  First, because of the deaths of the crewmembers, the safety goal was impacted.  The environmental and customer service goals were impacted because of the decreased ability to fight wildfires.  The schedule goal was impacted because other C-130s were grounded for two days.  The property goal was impacted because of the damage to the plane, and the labor goal was impacted due to the increased difficulty for remaining firefighters in fighting the fire.

Once we have determined these impacts to the goals, we can begin asking “Why” questions to draw out the cause-and-effect relationships that led to the impacted goals.  The safety goal, and the other goals, were impacted due to the plane crash.  Again, although the Air Force has not released details of its ongoing investigation, it is believed that a downdraft (caused by the same high winds in the area that are helping the wildfires spread) may have contributed to the crash.  An additional contributor is the fact that the plane was likely traveling at extremely low altitude, which allowed the plane to perform its task to help fight wildfires.  Lastly, it is possible that the heavy demands placed on the plane due to the extent of the fires may have contributed to the incident.  If, during the course of the investigation, it is determined that one of these causes was not related to the plane crash, that cause can be crossed out, but left on the map.  Evidence that shows that this cause did not result in the incident should be placed under the box.  This allows us to keep a complete record of which causes were considered.
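The practice of crossing out a cause while leaving it, and its evidence, on the map can be sketched as a small record type.  This is only an illustration in Python: the `Cause` class, its fields and the `active_causes` helper are assumptions made for this sketch, and the evidence note in the usage example is a placeholder, not a finding from the investigation.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on the map: a possible cause plus the evidence recorded under it."""
    description: str
    crossed_out: bool = False
    evidence: list = field(default_factory=list)

    def rule_out(self, evidence_note):
        """Cross the cause out but keep it, and the reason why, on the map."""
        self.crossed_out = True
        self.evidence.append(evidence_note)


def active_causes(causes):
    """Descriptions of the causes still under consideration."""
    return [cause.description for cause in causes if not cause.crossed_out]


# Possible causes named in the paragraph above.
causes = [
    Cause("Downdraft caused by high winds"),
    Cause("Plane traveling at extremely low altitude"),
    Cause("Heavy demands placed on the plane"),
]

if __name__ == "__main__":
    # Placeholder note only; no such finding has been reported.
    causes[2].rule_out("hypothetical: placeholder for investigation evidence")
    print(active_causes(causes))
    print(len(causes), "causes remain on the record")
```

Nothing is deleted: a crossed-out cause stays in the list with its evidence, which preserves the record of what was considered.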

Once the causes related to the incident have been placed on the map, solutions to mitigate the risk of this type of incident from happening again can be brainstormed and implemented.

To view the Outline and Cause Map, please click “Download PDF” above.