Category Archives: Root Cause Analysis – Incident Investigation

The Deadliest Airship Crash in History Wasn’t the Hindenburg

By Kim Smiley

Many people have heard of the Hindenburg, but have you heard of the USS Akron? The Hindenburg crashed in 1937, killing 36 people. The USS Akron crash four years earlier killed 73, making it the deadliest airship crash in history.

The crash of the USS Akron can be investigated by building a Cause Map, a visual format for performing a root cause analysis.  A Cause Map is built by asking “why” questions to determine what causes contributed to an issue.  The causes are organized on the Cause Map to illustrate the cause-and-effect relationships between them.  Why were 73 people killed?  This occurred because they were onboard the USS Akron, the airship struck the ocean surface, the crew had little time to brace for impact and there were insufficient flotation devices onboard.
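One way to picture the structure described above is as a small directed graph in which each effect points to the causes that answer its “why” question. The Python sketch below is only an illustration under that assumption. The dictionary layout and the function name `ask_why` are hypothetical and are not part of the Cause Mapping method or any published tool; the cause labels are taken from the paragraph above.

```python
# Hypothetical sketch: a Cause Map stored as effect -> list of contributing causes.
# Causes listed under one effect are treated as jointly contributing (an AND relationship).
cause_map = {
    "73 people killed": [
        "people were onboard the USS Akron",
        "airship struck the ocean surface",
        "crew had little time to brace for impact",
        "insufficient flotation devices onboard",
    ],
}


def ask_why(effect, cause_map, depth=0):
    """Return indented lines produced by repeatedly asking 'why' for each effect."""
    lines = ["  " * depth + effect]
    for cause in cause_map.get(effect, []):
        lines.extend(ask_why(cause, cause_map, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(ask_why("73 people killed", cause_map)))
```

Adding an entry for any cause (for example, the reasons the airship struck the ocean) extends the map one level deeper without changing the function.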

The crew was onboard the USS Akron because the airship was operated by the US Navy and was performing a routine mission at the time of the crash. The airship hit the ocean because it was operating over the ocean and lost altitude in a severe storm. Why was the airship operating in a storm? There was no severe weather predicted at the time and a low pressure system unexpectedly developed. The crew had little time to brace for the impact because they weren’t aware that an impact was imminent. There was low visibility at the time because it was a stormy, dark night. The barometric altimeter was also showing that the airship was higher than it actually was. Barometric altimeters are affected by atmospheric pressure, and the low pressure in the storm affected the reading more than the crew realized. The lack of life jackets and other flotation devices also contributed to the high number of deaths. There were no life jackets onboard the airship at the time of the crash and only one rubber raft. The safety equipment had been given to another airship and had never been replaced.
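The altimeter effect described above can be roughly quantified. The sketch below uses the common standard-atmosphere approximation for converting pressure to altitude; it is not the method used by the Akron’s instruments or by any investigation. The function names are hypothetical, and the pressures in the usage note are assumed values chosen for illustration, not readings from the night of the crash.

```python
# Constants of the widely used standard-atmosphere altitude approximation.
SCALE_M = 44330.0       # metres
EXPONENT = 1.0 / 5.255  # dimensionless


def indicated_altitude_m(measured_hpa, reference_hpa):
    """Altitude shown by a barometric altimeter for a measured pressure,
    given the sea-level reference pressure the altimeter is set to."""
    return SCALE_M * (1.0 - (measured_hpa / reference_hpa) ** EXPONENT)


def altimeter_over_read_m(setting_hpa, actual_sea_level_hpa):
    """Altitude the altimeter would show at the sea surface (true height zero).
    A positive value means the instrument reads higher than the aircraft really is."""
    return indicated_altitude_m(actual_sea_level_hpa, setting_hpa)
```

With an assumed altimeter setting of 1013.25 hPa and an assumed storm sea-level pressure of 1000 hPa, `altimeter_over_read_m(1013.25, 1000.0)` comes out at a little over 100 metres, which is enough to matter for an airship flying low over water at night.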

While few of us plan to operate or build an airship anytime in the near future, the importance of keeping sufficient safety gear onboard a vehicle of any kind is a lesson that applies widely. Lack of safety gear is a recurring theme in many historical disasters. For example, the sinking of the Titanic would be a very different story if there had been sufficient lifeboats onboard. The USS Akron crash might also have been very different if the crew had been wearing life jackets. The airship would still have been lost, but there would likely have been fewer casualties.

To view a high level Cause Map of this example, click on “Download PDF” above.

Collapse of Salt Mine Creates Massive Sinkhole in LA

By Kim Smiley

On August 3, 2012, a massive sinkhole appeared in Assumption Parish, Louisiana that continues to grow and evade easy answers.   About 150 homes were evacuated and residents are still displaced more than seven months later.

What caused a sinkhole to form overnight?

That question can be answered by building a Cause Map, a visual root cause analysis.  In the Cause Mapping process, the first step is to fill in an Outline with the background information for an issue as well as how the problem impacts the goals.  In this example, the sinkhole impacts several goals including the environmental goal, the safety goal because there is a potential for injuries, the financial goal because of the costs associated with the emergency response and remediation of the issue, and the customer service goal because 150 homes have been evacuated for an extended time.  Once the Outline is completed, the next step is to ask “why” questions to find the different causes that contributed to the problem being analyzed.

So why did the sinkhole form? The sinkhole formed when an underground salt mine collapsed. This happened because there was a salt mine in the area and a wall of the mine failed. Salt was mined in the area because there was a large deposit of salt underground and salt mining is profitable since it is used in a wide range of industries. The wall collapsed because it was too thin to support the pressure, which in turn happened because the mine was inadvertently located too close to the edge of the salt deposit. The mine ended up too near the edge because the location of the salt deposit wasn’t accurately known. It’s difficult to access salt deposits thousands of feet underground, and the mine was permitted in 1982 using 1960s maps of the salt deposit, which were produced with technology that was limited compared to what is used today.

There is also a potential for injuries associated with the sinkhole both because it continues to grow and because there is a risk of explosion from the natural gas being released.  The sinkhole has given the underground pockets of natural gas a pathway to the surface. Workers are trying to minimize the danger by flaring the gas off and ensuring there isn’t anywhere it can build up.

The financial impacts of this issue are substantial, both to the community and the mining company.  The salt mining company is in negotiations to buy out the displaced residents and has been providing financial support to them during the evacuation.  The costs of the emergency response are also adding up, not to mention the cost of whatever remediation is necessary once the area becomes stable and the full extent of the issue is known.

The final step of the Cause Mapping process is to develop solutions to prevent the problem from recurring. This is still an ongoing issue, but some steps have already been taken to help prevent future sinkholes from forming. Advances in technology have already improved understanding of underground deposits and will help in locating future mines. Another possible solution is regulation changes to require mines to be located farther from the edge of the deposit.

To view an Outline and a high level Cause Map for this issue, click on “Download PDF” above.

8 Marines Killed During Training Exercise With Live Ammunition

By ThinkReliability Staff

Eight Marines were killed, and seven Marines and sailors were injured, when a 60 mm round unexpectedly exploded inside a mortar tube during a live ammunition training exercise. While details are still to be determined, it is known that this explosion caused the deaths and injuries of those participating in the exercise.

Though details of the incident are still unknown, we can begin a Cause Map, or visual root cause analysis, diagramming possible causes which remain to be investigated.  As more information becomes available, evidence supporting or excluding potential causes is included on the Cause Map.

We capture the What, When and Where of the incident in the Problem Outline. In this case, a training accident/explosion occurred on March 18, 2013 at about 10:00 pm at the Hawthorne Army Depot in western Nevada. At the time, a mountain training exercise with live ammunition was using a 60 mm round inside a mortar tube. A traffic accident that may be related has been mentioned in the news, but no detail has been provided. To ensure that this line of inquiry is followed during the investigation, we can include it in the “different, unusual, unique” line of the Problem Outline.

Data that is known, such as the types of damage resulting in deaths and injuries, is included with supporting evidence, in this case testimony of the hospital spokesman. Causes still to be determined, such as whether the mortar round exploded prematurely in the tube, detonated after being fired, or whether more than a single round exploded, are included with question marks and joined by “OR”. When evidence related to a given cause is obtained during the investigation, it is included directly beneath the cause it controls. Along with the unknown method of detonation of the round, it is unknown whether an issue with the firing procedure, a malfunctioning firing device, or a malfunction in the explosive mortar is to blame.

More details should be coming soon since the use of 60 mm mortars is suspended until the review of this incident determines what happened.  At that time, those causes ruled out by evidence can be crossed off (but left on the Cause Map so that others know they were considered and ruled out as more evidence became available).
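The bookkeeping described above (possible causes marked with question marks, joined by “OR”, then crossed off but kept once evidence rules them out) can be sketched as a small data structure. This is a hypothetical illustration only: the class name `PossibleCause`, its fields, and the `render` helper are assumptions made for this example, and the evidence note in the usage comment is a placeholder, not a finding from the investigation.

```python
from dataclasses import dataclass, field


@dataclass
class PossibleCause:
    description: str
    status: str = "possible"  # "possible", "supported", or "ruled out"
    evidence: list = field(default_factory=list)

    def add_evidence(self, note, supports):
        """Record a piece of evidence and update the status of this cause."""
        self.evidence.append(note)
        self.status = "supported" if supports else "ruled out"

    def label(self):
        """Text shown on the map: '?' for unvalidated causes, a marker for ruled-out ones."""
        if self.status == "possible":
            return self.description + " ?"
        if self.status == "ruled out":
            return "[ruled out] " + self.description
        return self.description


# The three alternatives named in the text, all still unvalidated.
candidates = [
    PossibleCause("round exploded prematurely in the tube"),
    PossibleCause("round detonated after being fired"),
    PossibleCause("more than a single round exploded"),
]


def render(causes):
    """Join alternative causes with OR; ruled-out causes stay visible."""
    return " OR ".join(cause.label() for cause in causes)

# Example with placeholder evidence (not a real finding):
# candidates[1].add_evidence("placeholder evidence note", supports=False)
```

Keeping ruled-out causes in the list, rather than deleting them, mirrors the practice of leaving them on the Cause Map so that others can see they were considered.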

At that time, solutions that best address the issues that were causally related can be brainstormed, evaluated, and implemented.

To view the Outline and Cause Map, please click “Download PDF” above.  Or click here to read more.

Hindenburg Crash: The Importance – and Difficulty – of Validating Evidence

By ThinkReliability Staff

Since the Hindenburg explosion in 1937, theories have abounded on what caused the leaking gas and spark that doomed the airship and dozens of passengers.  We discussed some of these theories in our previous blog on the Hindenburg disaster.

In December, 2012, a documentary on the Discovery Channel used new evidence to discuss the most likely cause of the disaster.  Yep, that’s right.  76 years after the original explosion, evidence is still being gathered to help determine what really caused the explosion that killed 36 people.

Sometimes evidence is relatively easy to gather – many pieces of equipment now feed into automatic data collectors, which can provide reams of data about what happened for a specific period of time.  Sometimes, however, evidence is much harder to come by. This is especially the case with fires or explosions which frequently destroy much of the available evidence.

When evidence is hard to come by, it is difficult to determine the exact cause-and-effect relationships that led to an incident.  The best we may be able to do is capture different possibilities in a Cause Map, or visual root cause analysis, and leave the causes that haven’t been validated by evidence as possible causes, indicated by a question mark.

Sometimes, determining the exact cause(s) is important enough to result in painstaking efforts like those performed by a team at the Southwest Research Institute. The team created three 1/10-scale models, not a small undertaking when each scale model is over 80 feet in length and is inflated with 200 cubic meters of hydrogen. They then replicated scenarios described by the various theories by setting fire to, and blowing up, the models. Additionally, they studied archive footage and eyewitness accounts to increase their understanding of the disaster.

As a result, the team now believes they have determined what happened. Says Jem Stansfield, an aeronautical engineer and the project lead, “I think the most likely mechanism for providing the spark is electrostatic.” The spark ignited hydrogen that was leaking because of either a broken tensioning wire that punctured a gas cell or a sticking gas valve.

View the updated investigation with the recently released evidence incorporated by clicking “Download PDF” above.

Read our detailed writeup on the Hindenburg investigation.

Or, click here to read more from the blog of the on-air historian and technical advisor to the project (some really cool photos of making and destroying the models are included).

Plan to Control Invasive Snakes with Drop of Dead Mice

By Kim Smiley

Brown tree snakes are an invasive species that was inadvertently introduced to Guam where they have decimated native bird populations and done massive environmental damage.  It’s estimated that there are about two million of these snakes  on the island.  The newest plan of attack in the battle to control the brown tree snake population is to poison the snakes by parachuting dead mice laced with pain killers onto Guam.

The problem of invasive brown tree snakes can be analyzed by building a Cause Map, a visual root cause analysis. A Cause Map is built by asking “why” questions and adding the causes to intuitively show the cause-and-effect relationships. The first step is to identify the goals that are impacted. In this example, the environmental goal is impacted because the balance of native species on Guam has been altered. This has happened because the native bird population has been decimated because they have been eaten by an invasive predator, the brown tree snake. The spider population has also exploded because many of the birds, their main predator, have disappeared. The snakes also cause significant and expensive power outages on Guam as they climb into electrical equipment.

Brown tree snakes have taken over Guam for several reasons. First, the snake was accidentally introduced to the island, likely as a stowaway in military cargo after World War II. Once the snake was on the island, it thrived because the species had no major predator on the island, there was little competition for resources, and there was an abundant food source. There was little competition because Guam had only one other snake species prior to the introduction of the brown tree snake. The native snake species is blind and significantly smaller, preying mostly on insects. The brown tree snake had ample food because it is a pretty flexible predator happy to eat birds, lizards, bats and small mammals. In fact, the brown tree snake has found Guam so hospitable that the snakes grow larger on Guam than in their native habitat where predators are more plentiful and food is more limited.

The presence of these snakes on Guam has caused massive damage. Nine of twelve native bird species are extinct on the island. The snakes have also eaten a significant amount of the small mammal population. There has also been a huge impact on vegetation on Guam since the snakes have wiped out many of the pollinators. Scientists have been trying to find ways to improve the situation.

The newest plan involves dropping dead mice laced with pain killers onto Guam.  The pain killers are deadly to the snakes if ingested.  The mice will be attached to something called a flagger, which is two pieces of cardboard attached with a streamer.  The flagger should act like a parachute and catch in the tree canopy, which is where the snakes predominately spend their time.  The hope is that the snakes will then eat the pain killer laced mice, thus reducing their population.  The current plan is to drop about 2,000 mice over an enclosed area to determine if this is an effective method of brown tree snake population control.  If it works, more dead mice could be headed Guam’s way in the future.

To view a Cause Map of the brown tree snake problem and a Process Map of the plan to drop dead mice, click on “Download PDF” above.  To view a similar example about controlling feral cats on Macquarie Island, click here.

Natural Gas Explosion Kills One in Kansas City

By ThinkReliability Staff

A natural gas leak at a business plaza in Kansas City was reported to the Fire Department just prior to 5 pm on February 19, 2013.  However, the area was not evacuated until just prior to an explosion that left 1 dead and 15 injured.  The leaking gas was not shut off until 3 hours after the report.

The causes that resulted in this tragedy can be examined within a Cause Map, or visual root cause analysis. The analysis begins by determining which goals were impacted in a problem outline, which captures the what, when and where of the incident, as well as the impact to the goals. In this case, the safety goal was impacted due to the fatality and injuries. The environmental goal was impacted due to the natural gas leak and the customer service goal was impacted due to an ineffective evacuation. (How do we know it was ineffective? Because people were still present in a building that exploded due to a gas leak that was known for almost an hour, although the timing of the ordered evacuation is not known.) Additionally, the property goal was impacted due to the destruction of the restaurant that was the site of the explosion and damage to adjoining properties. Lastly, the labor goal was impacted due to the investigation by state utility regulators, which is expected to take months of painstaking work to add detail to the causes which are already known.

Once these goals have been determined, we begin with an impacted goal and ask “Why” questions to add detail to the analysis.  The safety goal was impacted due to the death and injuries.  These occurred because of the explosion AND because people were in the proximity of the explosion.  Had the explosion occurred after a complete evacuation, the injuries would have been substantially reduced, if not completely prevented, although the property goals would have still been impacted.

An evacuation was not ordered by the fire department, which deferred to the utility company. The utility company was slow in determining that an evacuation was needed. There was general confusion about the responsibility for determining an evacuation. Per the city’s emergency response plan, the Incident Commander is responsible for evacuations. However, no Incident Commander was named on-scene until after the explosion, as it was determined that no incident yet existed. Because quite a bit of flexibility is generally needed in determining whether an evacuation is needed (as an evacuation itself can be dangerous), the emergency response plan is necessarily somewhat confusing (in this case, contained in a 90-page document).

The explosion itself resulted from an unknown heat source within the restaurant igniting leaked natural gas. The natural gas was leaking because a gas line was struck by a boring machine being used to install fiber-optic cable in the area. It was later determined that the contractor did not have the necessary permit for the work, though it’s not clear if that led to confusion on the location of the gas lines, or if they were mislabeled, or if it’s just that it’s really difficult to see lines when digging deep trenches using a boring machine.

The extent and probability of an explosion are related to the volume of gas released during a leak. Had the gas been turned off earlier, the explosion might have been avoided, or lessened, reducing the impacts to all the goals. The gas was not turned off before the explosion, and after the explosion continuing fires made the shut-off locations difficult to reach. It’s not clear why the gas wasn’t turned off immediately, though the choice to do so does result in other impacts, such as the loss of gas to other customers. In cases where the true extent of the issue is not known, it is difficult to make these decisions and limit potential effects.

Because one of the issues was not knowing the extent of the leak, it has been suggested that all fire department trucks be equipped with natural gas sniffers. Additionally, an update to the city’s evacuation protocol has been called for that would, among other changes, give authority to the first arriving public safety official to order an evacuation, resolving some of the confusion that led to the tragedy in Kansas City.

As this example shows, it’s not only attempting to prevent these events that’s important but also ensuring that emergency plans and protocol clearly define actions to be taken as well as responsible parties.  Drills and simulations can ensure that the plans and protocols are even more effective.   This is true not only for cities and fire departments but for any organization tasked with the safety of people . . . which is to say, all of them.

To view the Outline, Timeline, Cause Map, and Solutions, please click “Download PDF” above.

Engine Room Fire Results in Cruise Ship Nightmare

By Kim Smiley

On February 10, 2013, an engine room fire on the Carnival Triumph cruise ship knocked out a significant portion of the ship’s electricity and crippled the propulsion system.  Passenger descriptions of the rest of their “vacation” have included the words hellish and nightmare.

This incident can be reviewed by building a Cause Map, a visual format for preparing a root cause analysis. A Cause Map intuitively lays out the causes that contributed to an issue to visually show cause-and-effect relationships. The first step in building a Cause Map is to fill in an Outline which includes the basic background information for an issue as well as the ways that the problem impacts the goals. In this example, a number of goals are impacted, such as the customer service goal because of the many unhappy passengers and negative media coverage; the schedule goal because of the delay in the return of the ship; and the safety goal because there was a potential for illness. Once the impacts to the goals are determined, the Cause Map is built by asking “why” questions.

Starting with the safety goal, the first step would be to ask “why” there was a potential for illness. Illness was a very real possibility because of the unsanitary conditions that existed onboard the ship. The toilets in the aft portion of the ship couldn’t be flushed because the sewage system was inoperable after the fire. Full toilets and the rolling motion of the ship made a disgusting and unhealthy combination. There have been many reports of human waste on floors and even leaking between levels onboard the ship, which is probably not anybody’s idea of an ideal vacation setting. Add in the limited electricity available after the fire and passengers faced filthy cabins without lighting or air conditioning. Food also became an issue because the limited electricity made preparation of hot meals difficult and supplies diminished as the ship remained at sea longer than planned. The ship’s return was delayed because it had to be towed back to port after the fire wiped out its propulsion.

Investigators are working to determine what caused the fire that started this mess.  They have determined that a leak in a fuel oil return line was part of the problem, but it may be months before the details are known.

What is known is that cruise ship fires aren’t as rare as might be expected. There were reports of 79 fires onboard cruise ships from 1990 to 2011. While more information is needed to understand the details of this particular fire, there has been speculation that lack of adequate preventative maintenance may contribute to this issue across the cruise industry. Keeping a cruise ship in port for a week’s worth of maintenance costs tens of millions of dollars and companies have to try to balance this cost with the risk of an issue during operation. And the risk is big. If something goes wrong during operation, like it did in this example, it can be very expensive. The total cost of the fire onboard Carnival Triumph is estimated to be $80 million, including 12 cruises that have already been canceled to allow time for repairs. In addition, the negative press isn’t exactly helping entice potential customers into booking a cruise. Balancing the cost of maintenance with the risk of not performing it is an issue that many industries face. No one wants to spend money on unnecessary maintenance, but no company wants to make headlines that have the word nightmare in them either.
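The balance between maintenance cost and operational risk described above is often framed as a simple expected-value comparison. The sketch below assumes that framing; it is not how any cruise line actually makes the decision, the function names are hypothetical, and every number in the usage note is invented for illustration. It also ignores effects that are hard to price, such as negative press.

```python
def expected_failure_cost(probability, cost_if_it_happens):
    """Expected cost of a failure: likelihood multiplied by consequence."""
    return probability * cost_if_it_happens


def maintenance_is_worthwhile(maintenance_cost, p_without, p_with, failure_cost):
    """True when the drop in expected failure cost exceeds the maintenance cost.

    p_without: assumed failure probability if maintenance is skipped
    p_with:    assumed failure probability if maintenance is performed
    """
    risk_reduction = (
        expected_failure_cost(p_without, failure_cost)
        - expected_failure_cost(p_with, failure_cost)
    )
    return risk_reduction > maintenance_cost
```

For example, with an assumed failure cost of 50,000,000 and assumed probabilities of 0.05 without maintenance and 0.01 with it, the expected saving is 2,000,000, so maintenance costing 1,000,000 would pass this test while maintenance costing 5,000,000 would not.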

To view a high level Cause Map of this issue, click on “Download PDF” above.

Check out our previous blog about the Costa Allegra, another cruise ship that lost power.

The Super Bowl Blacks Out in New Orleans

By Kim Smiley

The Super Bowl is always one of the most talked about television events of the year and this year the game was even more interesting than usual. An impressive comeback attempt following a game-delaying blackout made this one to remember.

The question of what caused the highly publicized blackout can be analyzed by building a Cause Map, an intuitive, visual format for performing a root cause analysis.  The first step in building a Cause Map is to fill in an Outline with the background information for the issue.  The goals that are impacted by an issue are listed on the bottom of the Outline.  In this example, the schedule goal is impacted because the Super Bowl was delayed; the material goal is impacted because a component called an electrical relay device needs to be replaced; and the customer service goal was impacted because the delay changed the momentum of the game significantly.    Individual fans may disagree, but the companies who have profits impacted by the Super Bowl probably consider the momentum shift a pleasant side effect of the blackout since the last 17 minutes of this game were the most watched.  Once the Outline is complete, the Cause Map is built by asking “why” questions.

Starting with the schedule goal, the next step would be to ask “why” the Super Bowl was delayed. This happened because the game couldn’t be played during a partial loss of power. The electrical company has announced that a component called an electrical relay device failed, but the exact reason it failed hasn’t been determined. Another cause that can be added to the Cause Map is that the backup power was insufficient to power the whole stadium. This cause is worth considering because a possible solution to this problem could be to add a more robust backup system to mitigate any future power issues.

The relay had been installed during major system upgrades that were performed during the previous two years to ensure that the stadium was ready for the demands of hosting the Super Bowl.  The relay was added to protect the Superdome electrical equipment if there was a cable failure between the incoming power lines (operated by the electric company) and the lines that run through the stadium.

This power problem is still being reviewed and it is still being determined if an independent review of the issue is necessary.  Once more facts are known, they can be easily incorporated into the Cause Map.  The final step in the Cause Mapping process would be to develop solutions that would help mitigate the issue and prevent future power failures.

See more power outage cause maps:

The Costa Allegra Loses Power

Power Outage Stretches from Arizona to California

Chile Power Outage 

Want us to cause map a specific power outage for you? Contact us at  info@thinkreliability.com and we’ll give you a “lights out” root cause analysis.

Brazilian Nightclub Fire Kills At Least 238 People

By ThinkReliability Staff

A pyrotechnics display meant for outdoor use turned deadly at a band concert in a nightclub in Brazil on January 27, 2013. The pyrotechnics, which were set off by the band, lit the soundproofing on the ceiling, and the fire spread with little resistance because the fire extinguishers did not work. The large crowd had difficulty leaving the club, which had only one exit, and that exit was blocked by bouncers who thought patrons were trying to leave without paying.

This tragic incident can be examined using a Cause Map, or visual root cause analysis, which visually diagrams all the causes and impacts related to the nightclub fire. We begin with the impacted goals. The safety goal was impacted because at least 238 people were killed and 100 were injured. The severe fire is an impact to the environment. People were unable to exit, which can be considered an impact to the customer service goal. The loss of the use of the nightclub is an impact to the production goal, and the damage is an impact to the property goal. Additionally, members of the band and owners of the nightclub are being held, potentially to be charged with manslaughter. This can be considered an impact to the employee goal.

We begin developing cause-and-effect relationships by asking “Why” questions. People were killed because they were in the nightclub, they were unable to exit, and there was a severe fire. Questions have been raised about why the nightclub was even in operation, as its licenses were expired. People were unable to exit because there was only one exit, which is completely insufficient for a facility of this size, and there were no windows in the bathroom. Bouncers were blocking the only exit because they believed patrons were trying to leave without paying – nobody had told them of the fire. Difficulty seeing the exits due to smoke and lost power resulting from the fire complicated matters even more.

The fire began when the pyrotechnics (heat) lit the soundproofing on the ceiling (fuel). The fire could not be put out because of difficulties reaching the ceiling and non-functioning fire extinguishers. Specific solutions are being debated by lawmakers in Brazil, but it is hoped that this tragedy will draw attention to – and improve – some of the conditions that contributed to it.

To view the Outline and Cause Map, please click “Download PDF” above.  Click here to read about another building fire.

The Dreamliner’s Battery Nightmare

By Kim Smiley

On January 16, 2013, the Federal Aviation Administration issued an emergency directive grounding all Boeing 787 Dreamliners operated by United States carriers during the investigation into two recent battery fires.  This emergency grounding is an unusually extreme step, especially given that the Dreamliner is a new plane with only six operated by US carriers at this time.

This issue can be analyzed by building a Cause Map, an intuitive, visual format for performing a root cause analysis.  A Cause Map is built by determining how the issue affects the goals of an organization and then asking “why” questions to find the causes that contributed to the problem.  In this example, the schedule goal is impacted because the Dreamliners have been grounded.  Why?  The Dreamliners were grounded because there is a known fire risk because there were two battery fires onboard these airplanes nine days apart.  The fact that the Dreamliner is the first major airliner to extensively use lithium-ion batteries and that fires in these batteries are particularly dangerous also contribute to the problem.   Lithium-ion batteries were used because they are lighter than other batteries and lighter planes use less fuel.  Fires in lithium-ion batteries are dangerous because they are difficult to extinguish because oxygen is released as they burn, which feeds the fire.

Several other goals are also worth considering like the customer service goal which is impacted by the negative publicity generated by this issue and the safety goal because there is a potential for injuries.   The economic impact of this issue could also be very significant since each Dreamliner costs $200 million and there are 800 planes on order in addition to about 50 that were already in service that may need to be repaired.

The battery fires are still being investigated and the cause isn’t known yet. It may be an issue with manufacturing or the design itself. What is known is that the Dreamliner is a brand new design that incorporates many new elements such as mostly electrical flight systems, an airframe that uses composite materials and the use of the lithium-ion batteries themselves. The design process was also different from previous Boeing designs, with much of the work outsourced to a network of global suppliers and very tight deadlines.

As more information becomes available, the Cause Map can easily be expanded to incorporate it.  To view a high level Cause Map of this issue, click on “Download PDF” above.