All posts by ThinkReliability Staff

ThinkReliability specializes in applying root cause analysis to solve all types of problems. We investigate errors, defects, failures, losses, outages and incidents in a wide variety of industries. Our Cause Mapping method of root cause analysis captures the complete investigation, along with the best solutions, in an easy-to-understand format. ThinkReliability provides investigation services and root cause analysis training to clients around the world and is considered the trusted authority on the subject.

113 Killed When a Plane Hit a Hill in Guadeloupe

By ThinkReliability Staff

Flying into a small airport surrounded by mountains at night, in a thunderstorm, with virtually no support from ground equipment proved to be too difficult for even an experienced pilot.

All 113 passengers and crew on Air France Flight 117 were killed when the plane crashed into a hill near the airport in Pointe-à-Pitre, Guadeloupe on June 22, 1962. The crash occurred in the early morning hours, during a severe thunderstorm. We can examine the causes of this tragedy in a Cause Map, a visual form of root cause analysis that shows the cause-and-effect relationships that led to an incident such as this one. The VHF (very high frequency) omnidirectional range (VOR) indicator at the airport in Guadeloupe, which helps aircraft determine position and stay on course, was not functional. (It’s not clear if the crew of the Air France flight was aware of this, or how long the equipment had been broken.) The plane in question was a Boeing 707.

The safety goal was impacted because all people onboard the plane – passengers and crew – were killed. The plane (valued at $5.5 million) was completely destroyed. The lack of a working VOR and the incorrect information provided by the automatic direction finder (ADF) can be considered impacts to the customer service goal. Beginning with the impacted safety goal, we can ask “Why” questions to begin mapping cause-and-effect relationships. The passengers and crew were killed (and the plane destroyed) when the plane crashed into a hill.

The plane crashed into a hill because the airport was surrounded by mountains, and the plane strayed off the let-down track, which it should have used for its approach to the airport. The pilot went off track because he was using a visual approach, probably because the VOR was not working and so was not providing data. The pilot was unable to see the track due to low (10 km) visibility and because it was early morning (~4 a.m.). In addition, the plane received incorrect position indication from the ADF, which appeared to malfunction as a result of the severe thunderstorm in the area.

This incident raised concern among pilots about substandard landing conditions at certain airports. More care is now taken with take-off and landing during inclement weather, poor visibility, or conditions that result in landing with decreased equipment support.

To view the Outline and Cause Map, please click “Download PDF” above.

Deadly Sawmill Explosion

By ThinkReliability Staff

An explosion and subsequent fire at a sawmill in British Columbia has killed two workers and injured two dozen more.  Although the cause of the explosion is not known, there have been five explosions linked to wood dust in British Columbia since 2009.

A dust explosion results from the presence of combustible dust, such as that created by the sawmilling process.  In order for an explosion to occur, the dust must be dispersed into the air but confined by a structure in the presence of oxygen and a spark.  (Learn more about dust explosions.) 

To view all the causes that contributed to this tragic explosion, we can examine the incident in a Cause Map, or visual root cause analysis.  We begin with the impacts to the goals. The employee deaths and injuries are an impact to the safety goal.  This is the primary focus of any issue that results in human death or injury.  In addition, the environmental goal was impacted as the smoke migrated to the nearby town.  The production goal was impacted due to the shutdown of the facility.  The property goal was impacted due to destruction of the sawmill, log processing facility, and sorting facility.  Lastly, the investigation and cleanup will impact labor goals.

Once we have determined the impacts to the goals, we can ask why questions to determine the cause-and-effect relationships that led to the incident.  In this case, the injuries were due to the fire.  The fire may have been caused by a dust explosion (explosion due to natural gas leak has been ruled out).  In order for a dust explosion to occur, five factors are necessary: 1) presence of combustible dust, 2) oxygen, 3) dust is dispersed into the air, 4) dust particles are confined, and 5) the mixture is ignited.
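The five-factor requirement above works as an all-or-nothing check: take away any one factor and the explosion cannot occur. The following is a minimal Python sketch of that reasoning. The names `DustConditions`, `explosion_possible` and `missing_factors` are hypothetical and used only for illustration; they are not part of the Cause Mapping method.

```python
from dataclasses import dataclass, fields


@dataclass
class DustConditions:
    """The five factors listed in the text (field names are illustrative)."""
    combustible_dust: bool
    oxygen: bool
    dispersed: bool
    confined: bool
    ignition: bool


def explosion_possible(c: DustConditions) -> bool:
    # A dust explosion requires all five factors to be present at once.
    return all(getattr(c, f.name) for f in fields(c))


def missing_factors(c: DustConditions) -> list:
    # Factors that are absent; each one is a potential point of control.
    return [f.name for f in fields(c) if not getattr(c, f.name)]


# All five factors present, as the dust-explosion scenario assumes.
sawmill = DustConditions(True, True, True, True, True)

# Same facility with the dust removed by thorough cleaning (assumed case).
after_cleaning = DustConditions(False, True, True, True, True)

print(explosion_possible(sawmill))
print(explosion_possible(after_cleaning), missing_factors(after_cleaning))
```

Because oxygen and an enclosed structure are needed for normal operation, the practical controls are the remaining factors: the dust itself, its dispersal, and ignition sources.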

In this case, the ignition source is not known and, due to the damage at the facility, may never be conclusively determined.  Similarly, the cause that resulted in the dust being dispersed may also not be known.  The oxygen must be present for worker safety and the dust is confined because it is held within a closed structure.  The dust is present because it is created during sawmilling operations.  What makes a dust combustible depends on the properties of the dust.  This mill was processing pine beetle wood, or wood that was ravaged by beetles.  This makes the wood drier, which results in a drier, finer, more combustible dust.  Thorough cleaning of any facility that creates potentially combustible dust is a necessity – inadequate cleaning (including dust that may gather on hard-to-access surfaces, such as the ceiling) increases the possibility of an explosion.  The union believes that cleaning has been reduced as a result of the economy.

Local government has begun inspections of sawmills and is also asking plants to examine potential dust and ignition sources. Reducing dust and ignition sources is the most effective way to reduce the risk of dust explosions. Other solutions being considered include adding water to the air to increase humidity and increasing ventilation, which can reduce the confinement of the dust and increase cleanliness.

To view the Outline and Cause Map, please click “Download PDF” above.

 

Siberian Plane Crash

By ThinkReliability Staff

Four minutes after take-off on April 1, 2012, an ATR-72 crashed just past Roshchino International Airport in Tyumen, Siberia. This type of plane has had previous issues with icing, and in the United States it has been banned from flying in conditions likely to result in icing. However, it has not yet been determined that ice was related to the crash.

To begin a Cause Map – an intuitive, visual root cause analysis – we look at the impacted goals. In this case, the fatalities and injuries are the primary impact, to the safety goal. Additionally, this incident, combined with previous air safety issues in Russia (such as the September 2011 crash that killed a Russian hockey team), has eroded public confidence in air safety in the country. This could be considered an impact to the customer service and production goals. The plane split into three pieces on impact, which affects the property goal. Searches and subsequent investigations will likely impact the labor goal.

Once the impacts to the goals have been determined, begin the Cause Map with these impacted goals, and ask “Why” questions.  More detail can be added as the investigation progresses.  In this case, the fatalities and injuries were likely caused by the plane’s impact with the ground.  Other mechanical issues are still a possibility; however, the crew did not report any malfunctions prior to the crash.  Disruption of air flow over the wings and jamming of ailerons can be caused by accumulation of ice on the plane.  It has been determined that there was inadequate de-icing agent on the plane, either because it was not applied (according to the deputy head of the airport where the plane took off) or was not applied properly (according to the head of the Russian air transportation agency).  It is known that the weather was cold (the plane landed in a snowy field) and that ATR-72s have trouble with icy conditions, to the point where they have been banned from flying in conditions likely to cause ice in the US.

Officials aren’t ready to name the icing issues as a cause of the crash.  Further investigation will determine which causes did contribute.  In the meantime, all the information that is known can be captured on a Cause Map.  Causes can then be added – or crossed off – as more information becomes available.
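One way to picture a Cause Map that grows and shrinks with the evidence is as a tree of causes, each carrying a status. The sketch below is a hypothetical illustration in Python, not ThinkReliability's own tooling: the `Cause` class, its `status` values and the `open_causes` method are assumptions made for this example. The causes entered are the ones named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    status: str = "possible"  # "possible", "confirmed" or "ruled_out"
    causes: list = field(default_factory=list)

    def add(self, description, status="possible"):
        """Attach a contributing cause (the answer to a 'Why' question)."""
        child = Cause(description, status)
        self.causes.append(child)
        return child

    def open_causes(self):
        """List causes not ruled out, depth-first.

        A ruled-out cause is skipped together with everything beneath it.
        """
        found = []
        for c in self.causes:
            if c.status != "ruled_out":
                found.append(c.description)
                found.extend(c.open_causes())
        return found


impact = Cause("Fatalities and injuries (safety goal)", "confirmed")
crash = impact.add("Plane's impact with the ground")
ice = crash.add("Accumulation of ice on the plane")
ice.add("Inadequate de-icing agent on the plane", "confirmed")
mech = crash.add("Other mechanical issue")

print(impact.open_causes())
```

If the investigation later excluded a branch, setting its `status` to `"ruled_out"` would cross it off the map without deleting the record that it was considered.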

To view the Outline and Cause Map, please click “Download PDF” above.

Combination of Gas Leak and Flare Could be Disastrous

By ThinkReliability Staff

A leak from the Elgin platform in the North Sea near Aberdeen has the potential to cause an explosion due to the proximity of the leak to the still-lit flare on the platform. However, the wind is currently blowing gas away from the flare. The potential for environmental damage is not as great as that of Deepwater Horizon because it is a surface, rather than underwater, leak.

Workers on the now-evacuated Elgin rig noticed the leak on March 25, 2012.  The rig was partially, then later fully, evacuated.  We can examine the causes of the environmental leak, as well as the potential for further damage, in a visual root cause analysis in the form of a Cause Map.  The Cause Map lays out the cause-and-effect relationships in a clear, intuitive way.

We begin with the impacts to the goals. The safety goal is impacted because of the potential for an explosion. The environmental goal is impacted due to the gas leak, estimated to be approximately 200 cubic metres per day. The customer service goal is impacted due to the drop in value of the owner corporation’s shares. Production is currently shut down on the rig, leading to an impact to the production goal. An explosion could also cause catastrophic damage to the platform, which is an impact to the property goal. Lastly, the evacuation of the platform is an impact to the labor goal.

In order for an explosion to occur, there must be fuel, oxygen, heat and confinement.  In this case, the oxygen is provided by the atmosphere, and the confinement is provided by the well itself.  The fuel is provided by the gas leak, believed to be entering from another non-producing well through a crack in the outer casing of the well, which was in the process of being plugged and abandoned.    The heat likely to cause the explosion is a flare on the platform.  The flare burns off excess gas from the platform and was not extinguished during the evacuation, as the priority was to remove the workers.

The flare is unable to be turned off remotely, but options for extinguishing the flare are being evaluated.  Other options being evaluated to stop the leak and reduce the potential for explosion include digging a relief well or killing the well that is currently leaking.  All options have the potential to be very expensive.

To view the Outline and Cause Map, please click “Download PDF” above.

Honduran Prison Fire

By ThinkReliability Staff

How do you know when your solutions haven’t been effective? When the same problem keeps happening. Another prison fire claimed 360 lives in Honduras. This is the third fatal prison fire in nine years, resulting from chronic overcrowding and understaffing of Honduran jails.

Just over three years after more than 100 prisoners were killed in a prison fire in San Pedro Sula (see previous blog), 360 prisoners (so far) have died as a result of a fire in Comayagua Prison. (A fire in 2003 claimed the lives of 68 prisoners.) An open flame has been determined to be the cause of the fire, but contributing to the deaths is that the prisoners were unable to get out.

With any incident resulting in deaths of this magnitude, we can analyze the causes of the incident using a visual root cause analysis, or Cause Map. We begin with the impacts to the goals. In this case, the prisoner deaths were an impact to the safety goal. In addition, prison overcrowding can be considered an impact to the production goal, and a delay in rescue can be considered an impact to the customer service goal. Any damage to the prison itself as a result of the fire is an impact to the property goal. Once we’ve determined the goals that were impacted, we can begin the analysis by asking “why” questions.

An investigation determined that an open flame (such as a cigarette or candle) and not arson, as was suggested prior to the investigation, caused the fire.  However, severe overcrowding (more than 800 prisoners were in a jail with a capacity of 500) and a delay in the rescue of the prisoners contributed to the massive death toll.

Honduras has a chronic overcrowding problem.  Honduras has a high rate of homicides and a high number of gang members.  Gang members receive strict sentences and, in many cases, are jailed prior to conviction.  However, an increased number of  inmates has not led to an increased number of guards.  On the night of the fire, there were 6 guards on duty.  Guards who were in the towers were not allowed to leave their posts to help with the fire-fighting and rescue efforts.  The guard who had the only set of keys fled prior to unlocking the doors.  (The guards are facing disciplinary actions.)  Firefighters were not allowed to enter the jail for 30 minutes after the fire call as the guards believed they were experiencing a riot or breakout.  An inmate who was not in his cell at the time of the fire was able to free many prisoners.

This incident has added more fuel to the international outcry over the state of Honduran prisons. However, not much appears to have been done to improve conditions since the previous fires in 2003 and 2009, so it’s unclear if anything will change as a result of this fire. It is certainly apparent that the safety of prisoners cannot be maintained with the current overcrowding and number of guards. Additionally, procedures in the case of a fire certainly need to be improved to ensure that prisoners can be evacuated safely and securely.

To view the Outline and Cause Map, please click “Download PDF” above.

Prison Fire Kills 103 in 2009

By Staff

On February 9, 2009, a fire and explosion in a seriously overcrowded prison in Honduras resulted in 103 deaths and 25 injuries. The fire was started by a short circuit in the overheated motor of a refrigerator used to store soft drinks for the inmates. The prison – which has a capacity of 800 – contained 1,960 inmates, their clothing, and their bedding materials. This provided plenty of fuel for the fire.

We can look at the causes that led to the prisoner deaths in a Cause Map, or visual root cause analysis.  We begin with the impacts to the goals.  The deaths and injuries of prisoners are an impact to the safety goal.  The environmental goal was impacted by the severe prison fire and explosion.  The customer service goal (considering the general population as the “customer” of a government-run prison) was unaffected, as there were no prisoner escapes.  Finally, the property goal was impacted due to damage to the prison.

We can continue the Cause Map by asking “why” questions. The impacts to the goals were due to a severe prison fire and explosion. In addition to the fire, the deaths and injuries were caused by the prisoners being unable to escape. Part of the reason the prisoners were unable to escape is that they were in prison, where precautions against escape are part of the deal. However, egress from a building that is on fire to a safe location should be part of the procedures of any prison. In this case, the procedures obviously didn’t work, considering the high number of deaths and injuries (of a total of 186 prisoners in this cell block). The egress was likely made more difficult by severe prison overcrowding. The prison has a capacity of 800 and contained 1,960 prisoners. The increase in the prison population is at least partially due to legislation passed the previous August, which mandated a minimum 12-year prison term for gang members. There are estimated to be more than 100,000 gang members in Honduras.

The heat for the fire was provided by an overheating refrigerator motor.  The fuel was provided by large amounts of clothing and bedding materials – more than usual, due to the prison overcrowding.

Once the causes for the impacted goals have been determined, solutions can be brainstormed.  In this case, prisoner advocates have been long calling for alternatives to jail sentences for gang members.  This would, of course, reduce the prison population.  Another option to reduce prison overcrowding would be to build more prisons.  To reduce the risk of fire, motorized equipment should be kept away from flammable objects, like clothing and bedding.  Last but not least, any facility has to have an effective egress plan in the case of fire or other emergencies.  These procedures are especially important in the case of a prison, where the potential of prisoner escape has to be considered as well as prisoner safety.

To view the root cause analysis investigation, please click “Download PDF” above.  Or click here to read more.

1960 Plane Collision over NYC killed 134

By ThinkReliability Staff

On December 16, 1960, two planes collided about a mile above Brooklyn, New York. One plane – United Airlines Flight 826 – was in a holding pattern preparing to descend into Idlewild (now John F. Kennedy International) Airport. The other plane – TWA Flight 266 – was preparing to descend into LaGuardia. Since both airports serve New York City, they are in fairly close proximity. The planes, too, were in close proximity – too close, leading to their collision. In addition to the 84 people killed on the United flight (though one would survive for a day) and the 44 people killed on the TWA flight, 6 people were killed in the neighborhood of Park Slope, where the United plane came down.

This incident can be outlined in a Cause Map or visual root cause analysis. We begin with determining the impacted goals. First, the 134 total deaths were an impact to the safety goal. The United flight crash resulted in a fire that affected more than 200 buildings, an impact to both the environmental and property goals. The liability for the crash was assigned to both airlines and the government, an impact to the customer service goal. There was another impact to the property goal because both planes were destroyed. Lastly, the labor goal was impacted due to the rescue efforts of the more than 2,500 personnel who responded to the two crash sites.

These impacts to the goals occurred when both planes crashed after colliding. The planes collided after their flight paths brought them too close together. The United flight was estimated to be 12 miles outside its holding pattern when the crash occurred, possibly because the ground beacon was not working. The controllers at Idlewild were unaware of the plane’s position because planes were not tracked in holding patterns, as it was too difficult to identify individual planes. The crews were unaware of each other’s planes. The visibility was extremely poor due to foggy, cloudy, sleety and snowy weather. The United crew had lost the ability to use some of their instruments due to the loss of a receiver. (The cause is unknown.) Additionally, the controllers at LaGuardia (who were guiding in the TWA flight) were unable to reach the TWA plane to warn its crew of the close proximity of the United plane.

Although comprehensive details are not known about the crash, much of the information used to put together the investigation was obtained from the flight recorder (or “black box”).  This is now a main source of data in aviation accident investigations.  The evidence in this case was used to divide up liability for the accident very exactly – 61% to United Airlines, 24% to the US government and the remainder to TWA.
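The text gives two of the three liability shares and leaves TWA's as "the remainder". Assuming the three shares account for the whole liability, the missing figure follows from simple subtraction, sketched here in Python (the variable name `shares` is illustrative):

```python
# Shares stated in the text, in percent. TWA's share is described only
# as "the remainder", so it is derived on the assumption that the three
# parties together carry 100% of the liability.
shares = {"United Airlines": 61, "US government": 24}
shares["TWA"] = 100 - sum(shares.values())

for party, pct in shares.items():
    print(f"{party}: {pct}%")
```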

To view the Outline and Cause Map, please click “Download PDF” above.

Fatal Cruise Ship Accident

By ThinkReliability Staff

At least 11 people have been killed – with 24 still missing – after the cruise ship Costa Concordia ran aground on rocks near the island of Giglio, Italy. The ship was taken manually up to 4 miles off course on a route not authorized by the company.

This incident can be thoroughly examined in a visual root cause analysis built as a Cause Map. First, we examine the impacts to the goals for this incident. The confirmed deaths and missing people are a significant impact to the safety goal. Additionally, the environmental goal is impacted because of the potential for a spill of the 500,000 gallons of fuel still onboard. The required evacuation of the ship can be considered an impact to the customer service goal. The loss of use of the ship – estimated to be $85 to $95 million for lost usage in the next year – and the decrease in bookings due to concern over the incident can be considered impacts to the production/schedule goal. The damage to the ship, which was recently built and insured for approximately $575 million, is an impact to the property goal, and the rescue and recovery efforts are an impact to the labor goal.

Once we have these impacts to the goals, we can begin an analysis by asking “why” questions. The impact to the safety goal – dead and missing passengers and crew – was caused by the ship running aground on rocks and by some issues with the evacuation process. The ship ran aground on rocks because it got too close to the island during a manually programmed, unauthorized deviation from the ship’s route, potentially to provide passengers with a better view. This type of deviation, sometimes called a “fly by”, had previously been authorized by the company. No crew members questioned the Captain’s change in route, noting that onboard he is solely responsible for the ship. (Note that with great power comes great responsibility, and the Captain has been charged with manslaughter.) Although the ship contains alarms meant to warn the crew when the ship goes off-course, these alarms are deactivated when the ship’s route is manually altered.

There were some issues with the evacuation of the ship, though as the company notes, not due to the evacuation procedure, which was externally reviewed in November.  Rather the issues were caused by the severe list of the ship (it was leaning almost completely to one side), which affects the ability to use the lifeboats.  Additionally, some of the passengers (who had just come aboard) had not yet completed a lifeboat drill.  The drill is required to be performed within 24 hours of boarding the ship and was scheduled for the morning after departure. The grounding occurred just 3.5 hours after departure.

Currently, rescue and recovery efforts continue. Attempts are being made to remove fuel from the ship, which is in a protected area. Concerns about cruise ships in the area have previously been raised, with some wanting to limit the ships that are allowed there. Additionally, both the cruise ship company and the government are reconsidering the timing of lifeboat drills in order to ensure the best results for passengers in incidents like this one.

To view the Root Cause Analysis investigation, please click “Download PDF” above.

Radioactive Release in the 1960s due to Inadvertent Dropping of Nuclear Weapons

By ThinkReliability Staff

In the history of nuclear weapons in the U.S., two accidents (or inadvertent drops) involving nuclear weapons have resulted in widespread dispersal of nuclear materials. These two incidents occurred two years apart, on calendar dates within a week of each other. The incidents had many similarities: in both cases, a B-52 bomber carrying nuclear weapons was damaged in the air during an airborne alert mission and released nuclear weapons, which released radioactive material over a large area. In both cases, there were significant impacts to the safety, environmental, customer service, property and labor goals.

Palomares: On January 17, 1966, a B-52 and a KC-135 collided during refueling above Palomares, Spain. The KC-135 exploded, killing the entire crew of four. The B-52 broke up mid-air, killing three crew members (four more were able to eject) and releasing four nuclear weapons. Two of the weapons’ parachutes failed, and the weapons were destroyed, releasing radioactive material that required an extensive cleanup of 1,400 tons of contaminated soil and debris. (Additionally, one of the intact bombs fell into the ocean and was not recovered for three months.) This was the third refueling of the mission and it’s unclear what exactly went wrong, though due to the close proximity required, mid-air refueling is extremely risky.

Thule: A fire began in a B-52 after flammable cushions were stuffed under a seat, covering the heat duct. Hot air from the engine manifold was redirected into the cabin in an attempt to warm it up, which ignited the cushions. The crew of the B-52 was unable to extinguish the fire and the pilot lost instrument visibility. The generators failed (for reasons that aren’t clear), cutting all engine power. The crew bailed out, the plane crashed, and the two weapons were destroyed along with the plane, again releasing radioactive material that led to a four-month cleanup mission.

The causes of these two incidents have one thing in common – both resulted from planes carrying nuclear weapons as part of an airborne alert mission.  Although many safeguards were taken due to the high risk of the missions, extremely serious impacts still resulted.  Thus the decision was made to cancel airborne alert missions.  When the risk is too high, sometimes the only solution is to end the situation resulting in the risk.

We can look at these two incidents together in a Cause Map, or visual root cause analysis.  To view the Outlines,  Timeline and Cause Maps in a three-page downloadable PDF, please click “Download PDF” above.  Or click here to read more.

Plane Crash Kills Hockey Team

By ThinkReliability Staff

Hockey fans were devastated when, on September 7, 2011, a Yak-42 plane carrying a Russian hockey team, including many former NHL players, crashed shortly after takeoff. A total of 44 people were killed, including 36 passengers and 8 crew members. One crew member survived the crash. This incident was the 7th fatal crash to occur in Russia since June, and resulted in the loss of the license of the company that operated the plane.

Now that the Russian air safety organization has released results from its investigation, we can map the details of the crash into a Cause Map, or visual root cause analysis. The Cause Map begins with the impacts to the goals.  The deaths of the crew and passengers are an impact to the safety goal.  The company losing its operating license can be considered an impact to the organizational goal.  The damage to the plane is an impact to the property goal.  All these impacts to the goals were caused by the plane crashing into a riverbank shortly after takeoff.

We ask “Why” questions to add more detail to the map. It has been determined that the plane crashed because it had insufficient speed during takeoff, and the takeoff was not aborted. It is also possible that the pilot was attempting to make an emergency landing in the river, and missed. The plane had insufficient speed during takeoff because the brake was pressed. Studies determined that a foot had to be placed on the brake pedal in order for the brake to be activated. Because of the force being used on the control column, it is likely that one of the pilots was attempting to push down using his foot as a brace. The pilots who were flying the plane were more familiar with (and were simultaneously being trained on) another type of plane. This plane – the Yak-40 – has a foot rest where the Yak-42’s brake pedal is located. Normally pilots are only trained on one type of plane at a time to minimize this sort of confusion.

In addition, at some point during takeoff, the engine was idled. This would normally indicate that takeoff is being aborted. Once the engine was brought back into service, it took some time to regain takeoff power – and the speed had already dropped. Aviation experts say that takeoff could have been aborted and the crash would have been avoided. However, it does not appear that an abort attempt was made. Flight recordings indicate confusion and a lack of effective communication in the cockpit. Prior to the engine being idled, one of the pilots pushed the control stick forward, after which it was pulled back to resume takeoff. The crew on this plane had never trained together before, which is fairly typical and may be part of the reason for the recent poor safety record of planes in Russia. Additionally, the pilot had phenobarbital in his system, which is known to slow reaction time. Recommendations to improve the safety of small planes of regional carriers in Russia have been under consideration since the recent rash of crashes. The loss of many popular hockey players may increase the pressure to implement these solutions.

To view the Outline and Cause Map, please click “Download PDF” above.  Or click here to read more.