
Chinatown Fire NYC

By ThinkReliability Staff

On April 11, 2010, a fire broke out in a store on the first floor of an apartment building on the 200 block of Grand Street in Chinatown, New York City. The fire eventually reached seven alarms and required 250 firefighters to fight it. When firefighters were able to enter the building the next day, they found one body. Thirty-three people, including 29 firefighters, were injured and approximately 200 were left homeless. Three buildings had to be demolished and at least two more were severely damaged.

For years the affected buildings, which were more than a century old, had been neglected. They had been cited for violations that included missing smoke detectors and a boiler that released smoke into the buildings. At this point it is unclear how these violations may have contributed to the fire and its aftermath. At the time of the fire, the buildings were for sale for over $9 million, although no offers had been made. Many goals were impacted by the fire, but the loss of life and the number of injuries are the focus of our investigation.

The injuries (many of them from smoke inhalation) were caused by a seven-alarm fire. The fire reached seven alarms because it spread quickly through the six-story building. For a fire to start, heat, fuel and oxygen are all required. There is no shortage of fuel and oxygen in an apartment building, given everything people need in order to live there. The heat (or ignition source) may have been provided by exposed wiring, which many residents had complained about, or by the boiler previously cited for neglect. Or it may have been something else altogether. (However, arson is not suspected at this point.)

The fire was able to spread so quickly because of the large number of voids and shafts in the building, a function of its age. Another possible cause of the death was a lack of warning about the fire, since the building had also been cited for missing smoke detectors.

Throughout an investigation there may be additional tools that help to clarify the incident.  Here we use a timeline to show the sequence of events.  A timeline is especially useful for complex events such as this.

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.  In fact, the outline, Cause Map and timeline for this event easily fit on one page.  (View them by clicking “Download PDF” above.)   Even more detail can be added to this Cause Map as more information is released about the incident. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.

Deadly Mine Explosion in West Virginia

By Kim Smiley

Around 3 p.m. on April 5, 2010, in Montcoal, West Virginia, a huge explosion rocked the Upper Big Branch South mine owned by Massey Energy Company. At least 25 miners were killed, both by the explosion itself and by suffocation caused by high levels of dangerous gases.

Four miners are still missing. The missing miners were working farther back in the mine, and the hope is that they were able to reach one of the airtight chambers stocked with enough food, water and oxygen for several days. Rescue efforts are underway, and crews are drilling to add ventilation so that gas concentrations can be brought down to levels safe enough for rescue workers to enter the mine.

This is the worst mine accident in the US in over 20 years. If the four missing miners are not found alive, this accident will have the highest number of fatalities since a 1970 mine explosion killed 38 in Hyden, Kentucky.

What triggered this explosion is not known at this time, but both state and federal agencies have initiated investigations.

Even though many details are still unknown, a root cause analysis can be started by building an initial Cause Map. There was an explosion, which means an ignition source, flammable material and oxygen must all have been present.

The source of the flammable material is known, since there were high methane gas levels in the mine. Methane gas occurs naturally in coal mines and must be continually vented. It can also be assumed that the mine ventilation was inadequate for some reason, since the gas levels built up. Coal dust accumulation may also have contributed to the accident, since powdered combustible material in an enclosed space is a very explosive combination.

The source of the spark that ignited the explosion is still unknown.

More information will become available as the investigation proceeds and a more detailed Cause Map can be built as additional causes are added.

Media reports about the accident have discussed past safety violations cited at the mine, but it won’t be clear whether the accident was preventable until the investigation is completed. What is known is that in March 2010, the Mine Safety and Health Administration cited the Upper Big Branch mine for 53 safety violations. In addition to the recent citations, there was a troubling upward trend in citations, which more than doubled between 2008 and 2009.

Hopefully, the information obtained during the investigation will provide useful lessons learned that can be implemented to prevent a similar accident in the future.

Power Outage Chile

By ThinkReliability Staff

A power outage struck Chile less than a month after the February 27 earthquake. The outage affected an area stretching nearly 2,000 kilometers and roughly 80% of Chile’s population. Power in most areas was restored within several hours. However, it was estimated that power to some customers in the Bio Bio region, which suffered more severe infrastructure damage, might be out for the better part of a week.

A power outage is an impact to the customer service and production/schedule goals. The power outage was caused by the collapse of the Central Interconnected System (Sistema Interconectado Central). The grid collapse had two causes: a lack of backup power capability, which resulted from a power grid left fragile by the earthquake, and an interruption to the main power grid. This interruption was caused by a disruption at the biggest substation due to a damaged transformer. It is unclear what caused the damage to the transformer, but it is believed to be related to the earthquake that hit in February. We show this by adding a cause box with a question mark between “damaged transformer” and “earthquake on Feb. 27th”.
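To make the structure of that cause chain easier to see, here is a minimal sketch that stores the Chile outage Cause Map as nested Python objects and prints it from the impacted goal back to the earthquake. The `Cause` class and `outline` function are hypothetical illustrations of our own, not part of any Cause Mapping tool. The box wording paraphrases the paragraph above, and it assumes a reading in which the grid collapse has two causes (lack of backup power and the main grid interruption).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on a Cause Map: an effect plus the boxes that caused it."""
    text: str
    causes: list[Cause] = field(default_factory=list)


def outline(box: Cause, depth: int = 0) -> list[str]:
    """Read the map from the impacted goal back toward its causes,
    one indented line per box (deeper indent = further back in the chain)."""
    lines = ["  " * depth + box.text]
    for cause in box.causes:
        lines.extend(outline(cause, depth + 1))
    return lines


# Shared box: the earthquake feeds two separate branches of the map.
earthquake = Cause("Earthquake on Feb. 27th")

outage = Cause("Power outage (customer service, production/schedule goals)", [
    Cause("Collapse of Central Interconnected System", [
        Cause("Lack of backup power capability", [
            Cause("Fragile power grid", [earthquake]),
        ]),
        Cause("Interruption to main power grid", [
            Cause("Disruption at biggest substation", [
                Cause("Damaged transformer", [
                    # "?" box: link to the earthquake is suspected, not confirmed
                    Cause("?", [earthquake]),
                ]),
            ]),
        ]),
    ]),
])

if __name__ == "__main__":
    print("\n".join(outline(outage)))
```

The question-mark box is kept as an ordinary box in the chain, so once the investigation confirms (or rules out) the earthquake as the cause of the transformer damage, only that one box needs to change.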

Repairs to the damaged transformer were required, which is an impact to the property and labor goals.

The Chilean government pledged to repair the transformer within 48 hours and stabilize the transmission lines within a week.  Interim solutions to get the electricity flowing were to isolate the damaged unit and install a reserve.  Additionally, Chileans have been asked to conserve electricity to minimize the amount of power transmitted through the lines.

By clicking “Download PDF” above, you can see a thorough root cause analysis, built as a Cause Map, that captures all of the currently known information in a simple, intuitive format that fits on one page.

Even more detail can be added to this Cause Map as the analysis continues. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.

Salmonella Recall

By Kim Smiley

A number of food products have been recalled recently because of potential salmonella contamination.  The recall list is still growing and has the potential to affect thousands of items in nearly every aisle at the grocery store.

The contamination originated in hydrolyzed vegetable protein (HVP), a common, inexpensive flavor enhancer that adds a salty, savory taste to a variety of products. All HVP made by Basic Food Flavors of Las Vegas since September 17, 2009 has been recalled. For a list of all recalled items and more information, please visit the Food and Drug Administration webpage.

The salmonella contamination occurred in the processing equipment at a single location, but HVP from that supplier was sold to food manufacturers nationwide. HVP is a specialized product with only a few suppliers, so problems at a single supplier have the potential to affect a significant percentage of the processed food supply.

The contamination was identified when a customer of Basic Food Flavors found salmonella in a batch of HVP it had purchased and reported it to the FDA through the new FDA Reportable Food Registry. The FDA then inspected Basic Food Flavors and found salmonella in the plant’s processing equipment.

The overall risk to the public is considered low. No cases of illness from this contamination have been reported. As long as products are heated to a sufficient temperature, either during manufacturing or by cooking after purchase, the salmonella risk will be eliminated. The highest-risk products are ready-to-eat products such as chips, dips, and dip powder.

The investigation of this incident is still ongoing, but a basic root cause analysis can be started. The safety goal is obviously impacted, since salmonella can cause illness, and even death in people with weakened immune systems. In this case, the customer service goal is impacted as well, because the recall may affect customer confidence and sales of the recalled items.

Click on the “Download PDF” button to view an initial Cause Map of the salmonella contamination.  The Cause Map can be expanded as more details are available.

Water Pollution from Sewer

By ThinkReliability Staff

Thanks in part to the Clean Water Act, passed in 1972 and revised in 2000, most residents of the United States have continual access to clean, safe water. However, certain circumstances may result in pathogens remaining in drinking water, or contaminating swimming water, leading to potential illnesses. In fact, researchers estimate that up to 20 million people per year become ill from ingesting pathogens in water. In addition to the environmental impact of untreated sewage reaching waterways, up to 400,000 basements and thousands of roads have been flooded with untreated sewage.

These floods generally occur when sewer systems are overwhelmed or clogged. A sewer system can become clogged by a buildup of leaves or other debris, including debris from illegal dumping.

An overwhelmed sewer system is generally the result of a high volume of water passing through the system. As the population increases, the strain on the system increases as well. Since most municipalities do not have the funds to upgrade or replace their systems, they are left with aging, undersized systems. Even so, these systems are generally able to keep up with demand, except during periods of heavy rainfall. Many sewer systems carry both wastewater and rainwater through the same system. When a heavy rainfall occurs, the system is overwhelmed and sewage overflows. This overflow is often directed into waterways. Dumping untreated or partially treated sewage into waterways is illegal, but fines are hardly ever levied. The Federal Government may be unwilling to fine municipalities for illegal dumping, especially because Federal funding to maintain sewer systems has decreased significantly. With municipal budgets already stretched, dealing with aging sewer systems just isn’t happening.

However, there are some things that municipalities can do. Green spaces (as opposed to paved areas) absorb rainfall, decreasing the amount directed into the sewer system. By planning more green space, or better drainage, municipalities can reduce the amount of rainfall that actually enters the system. Additionally, municipalities could redirect rainfall to keep it out of the waste portion of the sewer system. The cost of doing this may make it infeasible; however, calls for Federal stimulus money for sewer repairs may finally give municipalities the means to upgrade their systems.
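The overflow mechanism and the two remedies described above can be sketched as simple arithmetic. This is a toy model under stated assumptions: a combined sewer overflows by whatever wastewater plus stormwater exceeds its capacity, green space absorbs a fixed share of the rain, and separate drainage diverts a share of what is left. The function names and every number are hypothetical, chosen only for illustration; real sewer hydraulics are far more complex.

```python
def stormwater_reaching_sewer(rain: float, absorbed_fraction: float,
                              diverted_fraction: float) -> float:
    """Rainfall that still enters the combined sewer after green space
    absorbs a share and separate drainage diverts a share of the rest."""
    after_green_space = rain * (1 - absorbed_fraction)
    return after_green_space * (1 - diverted_fraction)


def combined_overflow(wastewater: float, stormwater: float,
                      capacity: float) -> float:
    """Flow a combined sewer cannot carry; this excess is what overflows."""
    return max(0.0, wastewater + stormwater - capacity)


if __name__ == "__main__":
    # Hypothetical volumes in arbitrary units, chosen only to illustrate.
    capacity, wastewater, storm = 100.0, 60.0, 80.0
    print("dry day:", combined_overflow(wastewater, 0.0, capacity))
    print("storm, no changes:", combined_overflow(wastewater, storm, capacity))
    reduced = stormwater_reaching_sewer(storm, absorbed_fraction=0.25,
                                        diverted_fraction=0.25)
    print("storm, green space + diversion:",
          combined_overflow(wastewater, reduced, capacity))
```

With these made-up numbers the system copes on a dry day, overflows by 40 units in the storm, and overflows by only 5 units once a quarter of the rain is absorbed and a quarter of the remainder is diverted. The point is the shape of the relationship, not the values.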

Death of Luger at 2010 Winter Olympics

By Kim Smiley

On February 12, 2010, Nodar Kumaritashvili, an Olympic luger from the country of Georgia, was killed during a practice run.  He lost control of his sled, flew off the track and hit a steel pole.

The investigation into the accident is still ongoing, but a root cause analysis can be started with the information that is available. This accident obviously impacts the safety goal because an athlete was killed, and it also had the potential to impact the schedule goal because the track was closed during the initial investigation.

There are a number of causes that can be added to the Cause Map. One of the more obvious causes of the accident is that the athlete was traveling at high speed. This occurred because the crash happened near the bottom of the track, where the sled was near its top speed. Additionally, the Vancouver Olympic track is a particularly fast track. Top speeds on the track were predicted to be 96 mph, nearly 6 mph faster than the standing world speed record set in 2000.

How did the track come to be designed so much faster than typical tracks? A number of causes contributed to the fast design. The designers chose Whistler as the site of the track because it has a colder climate than the alternatives, resulting in firm, fast ice, and because high tourist traffic there would help make the track a commercial success after the Olympics. Whistler was also the site of the Olympic alpine events.

The land available at Whistler was long and narrow: the site was a valley approximately 100 yards by 800 yards. By comparison, the Calgary track site was about 300 yards wide and Salt Lake City’s was 500 yards wide. Designing a track to fit in the available space meant it could not include the long, speed-reducing curves that are typical of other tracks.

The result was the fastest track in the history of the sport.

As the investigation continues, more details will become available, and they can be added to the Cause Map.

To ensure safety during the rest of the Olympic Games, several solutions were implemented following the accident. A wooden wall was added at the curve where the accident occurred to keep athletes on the track, the steel poles were padded, and events were started lower on the track to limit the maximum speed. The lower start was predicted to reduce top speeds in the men’s events by about 5 mph.

There have been several crashes on the course since the accident, but thankfully no further significant injuries have occurred.

Metro Train Derailment Washington D.C.

By ThinkReliability Staff

On February 12, 2010 at approximately 10:13 A.M., a six-car Red Line Metro train taking passengers to Shady Grove derailed near the Farragut North station in Washington, D.C.  If you’ve been reading our blog, you’ve seen our reports on three previous Metro incidents in the past year (two Metro workers were killed in January, two trains collided last November, and two trains also collided last June).

Thankfully, this derailment caused only minor injuries.  However, it did result in an extremely messy commute for a lot of people, due to a severe delay in train service.  Additionally, there was likely damage to the train and/or track, which will require labor to repair.  More labor will be required for the investigation.

All the basic information relating to this event, as well as the impacts to the goals (the injuries, delay in service, property damage and labor required as a result of the incident), is captured in a problem outline. We can also capture anything that was different at the time. Here we note that there were major storms in the area and that the commute was especially heavy.

Once we have completed the outline, we can begin the Cause Map with the goals that were impacted. The impacts to the goals resulted from the train derailing. The train derailed when the front wheels slipped and the lead car came off the track. Metro and National Transportation Safety Board (NTSB) investigators are determining the causes of the derailment, but one factor they will examine is that the train was moving onto a pocket track (a side track that allows trains to pass other trains or move around construction). Other trains have previously slipped off the track while moving onto a pocket track. It is unclear whether this train was moving onto the pocket track to get around other trains or track work.

As previously mentioned, the snowy and icy conditions (which have been extreme in D.C. lately) may have made the tracks slippery, potentially contributing to the derailment. There may also have been damage to the tracks or switch, since the derailment occurred on the oldest portion of the Red Line, which is due for maintenance. Because of an extreme budget shortfall at Metro, repairs to tracks and cars have been delayed. Last but not least, the weight of the rail cars may have been a factor in the derailment. The cars were extremely crowded because there were not enough of them for the commute. Metro was not running the normal number of cars because it had not completely recovered from the storm, but the normal number of commuters were traveling because the Federal Government was open. (The Federal Government usually remains closed when the Metro system is unable to run at full capacity.)

Even though we are not yet certain which factors may have contributed to the derailment, we can include them all on the Cause Map until we are able to rule some of them out.  Even more detail can be added to this Cause Map as the analysis continues. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.  View the beginning stage of the root cause analysis investigation by clicking on “Download PDF” above.

Possible Toyota Prius Recall

By Kim Smiley

A new potential safety issue has emerged, and Toyota may recall the newest model of the gas-electric hybrid Prius, which has been on sale since last May. The National Highway Traffic Safety Administration has received 124 reports from consumers claiming that the brakes at times do not engage immediately. Toyota has stated that the company has received 180 reports of braking problems in Japan and the United States. The reports include four incidents that resulted in accidents, with two people receiving minor injuries.

Even a slight delay in the response of car braking systems can be very dangerous because cars can travel nearly 100 feet in one second at highway speeds.
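The “nearly 100 feet in one second” figure follows from a simple unit conversion. Here is a minimal sketch, assuming “highway speeds” means roughly 60 to 70 mph (our assumption; the reports do not give a speed) and that the car holds its speed for the whole delay before the brakes begin to act. The function names are illustrative only.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def feet_per_second(mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


def distance_during_delay(mph: float, delay_s: float) -> float:
    """Feet covered before the brakes engage, assuming the car
    holds its speed for the whole delay (no deceleration yet)."""
    return feet_per_second(mph) * delay_s


if __name__ == "__main__":
    # Assumed range of "highway speeds"; the source gives no exact figure.
    for mph in (60, 65, 70):
        print(f"{mph} mph -> {feet_per_second(mph):.0f} ft per second")
```

At 60, 65 and 70 mph this prints 88, 95 and 103 feet per second, so even a half-second delay at 65 mph adds close to 48 feet before braking starts.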

No official details are known yet about what is causing the delay in brake engagement. In one article, a powertrain expert speculated that it was a software glitch occurring when the hybrid switched between the electric motor and the internal combustion engine. In the Prius design, the same motor that powers the car also powers the brakes. When the hybrid is switching between motors, there might be a momentary loss of power to the brakes during the transition.

A preliminary root cause analysis can be started using the available information.  The Cause Map can be expanded and revised as necessary as new information becomes available.  Click on the “Download PDF” button above to view the initial Cause Map.

Toyota has not stated whether a formal recall will be made.  A potential recall would affect 300,000 vehicles worldwide.

This new issue comes on the heels of a major announcement on January 21, when 2.3 million cars were recalled because of sticky gas pedals that can cause sudden acceleration. Additionally, a recall was issued in September 2009 because floor mats could move out of place and cause the accelerator to stick. (A previous blog post addressed this issue.)

Toyota shares dropped 21 percent following the January announcement, and any further safety issues will likely hurt consumer confidence and stock prices.

Tragedy in Bhopal

By ThinkReliability Staff

While researching the tragedy in Bhopal, India, I discovered that there are two theories about what occurred on December 3, 1984 that resulted in a tremendous loss of life. One theory comes from a report by an engineering consulting firm hired by Union Carbide (the company that owned the plant in question), which concluded that the release was caused by sabotage. Theory #2 is that a badly maintained plant with inadequate safety standards, which was being readied for dismantling and staffed by inexperienced, ineffective workers, experienced a horribly catastrophic chain of events that ensured that anything that could go wrong, did. For completeness, I have included both in my final Cause Map (which you can see by clicking “Download PDF” above). But for now, I’d just like to focus on the second.

In the wee morning hours of December 3, 1984, over 40 tons of methyl isocyanate (MIC) were released over the community of Bhopal, India, which had a population of 900,000. (This amount is also debated, but 40 tons appears to be the most commonly cited figure, based purely on the number of references that mention it.) Partially because of the transient nature of the population, and partially because of the general obfuscation of data by all sources involved, estimates of the number killed range from 2,000 to 15,000. The 2003 annual report of the Madhya Pradesh Gas Relief and Rehabilitation Department stated that a total of 15,248 people had died as a result of the gas leak. Based on claims accepted by the Indian government, at least 500,000 people were injured. This led to what has been called “The World’s Largest Lawsuit”, which I assume refers to the number of people represented, and certainly not to the monetary amount of the settlement, a paltry $470 million. After the accident and a series of legal maneuvers, the plant was abandoned. Extensive cleanup was required and still has not been completed. The impacts to the goals are shown in the outline on the downloadable PDF.

The deaths and environmental impact were caused by the release of over 40 tons of methyl isocyanate (from here on out, we’ll refer to it as MIC). The release occurred when a large volume of MIC was put through an ineffective protection system. The release lasted several hours, because workers were unable to stop it, and because of an ineffective warning system. The release occurred when a disk and valve that led to the protection system burst due to an increase in pressure. The increase in pressure was caused by an increase in temperature resulting from a reaction between MIC and water while the refrigeration system was shut down. There were 41 metric tons of MIC in the tank, stored for use in the plant. How the water was introduced is the subject of debate between the two theories I mentioned above. But regardless, water got into the tank, either by sabotage or by leaking through a vent line. We will probably never know exactly what happened. But we do know that ineffective safety systems can result in a massive loss of life, as happened here.

Today in History: Fire on the USS Enterprise

By ThinkReliability Staff

On January 14, 1969, 41 years ago, fires and explosions broke out on the USS Enterprise (CVN-65). The crew spent three hours fighting the fire. When the smoke cleared, 27 crewmembers had been killed and 314 injured. Additionally, 15 aircraft were destroyed and the carrier was severely damaged.

We can address the impacts to the U.S. Navy’s goals in a problem outline as the first step of the Cause Mapping process. There was an impact to the safety goal because crewmembers were killed and injured. There was an impact to the property goal because of the 15 planes that were destroyed and the repairs that were required to the ship. (This is also an impact to the labor goal, because of the labor required for the repairs.) Additionally, the ship’s deployment was delayed, which is an impact to both the customer service and production/schedule goals.

After we’ve completed the outline, we build our Cause Map beginning with the goals that were impacted. The goals were impacted by a series of explosions and fires across the ship. These explosions and fires were fueled by jet fuel and bombs on the planes on the carrier’s flight deck. The initiating event was the explosion of a Mk-32 Zuni rocket, which overheated after being placed in the exhaust path of an aircraft starting unit.

After the incident, the Navy performed an investigation to review the causes of the incident, and made changes to improve safety. Repairs to the Enterprise were completed, and the ship is now the oldest active serving ship in the U.S. Navy.

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. To view the downloadable PDF, click “Download PDF” above.