Tag Archives: root cause analysis

Cruise Ship Loses Power

By Kim Smiley

Part of the excitement of passenger cruises is access to remote areas of the world.  However, when a ship runs into trouble, that remoteness can result in extremely difficult conditions.  This was the case on the Costa Allegra, which suffered an engine room fire in the Indian Ocean.

Passengers aboard the Costa Allegra experienced sub-standard conditions when the ship lost power and propulsion due to an engine room fire.  During the three days while the ship was being towed to land, there was no air conditioning, lighting, or running water.  Food and drinking water were provided by helicopter.

We can examine the causes and effects of this issue in a Cause Map, or visual root cause analysis.  With the Cause Mapping process, we begin by examining the impacts to the goals.  Namely, when the incident occurred, which of the organization’s goals were not met?  In this case, the safety goal was impacted: although there appeared to be no injuries from the fire itself (some passengers may have become ill during the conditions that followed), both the fire and the resulting loss of power had the potential to cause severe injury.  Additionally, the customer service goal was impacted by the lack of running water, air conditioning, and lighting.  The schedule goal was impacted because the ship needed to be towed for 3 days.  The property goal was impacted due to the damage to the ship from the fire, and the labor goal was impacted due to the need for the ship’s crew to stand guard against pirate attack.

Once we’ve determined the goals that were impacted, we can use them as a basis for our map, and ask “Why” questions to add more detail.  Here, an engine room fire on the ship resulted in the loss of ship power, causing the loss of air conditioning, lighting and running water, and the loss of the ship’s ability to propel itself, necessitating a tow.  The length of the tow is also affected by the type of ship doing the towing.  In this case, the first ship to arrive to the aid of the Costa Allegra was a fishing vessel.  Tugboats arrived later, but the Costa Allegra requested that the fishing vessel continue the tow, even though it is believed that the tugboats could have sped up the tow, possibly bringing the ship in as much as 12 hours earlier.  The cruise ship company has stated that the tow was not changed in the interest of a consistent voyage for the passengers, but there may also have been financial considerations.  Assistance to people at sea is not paid, but assistance to ships is.  Thus, the fishing vessel actually entered into a contract with the cruise ship for the tow.
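The “Why” questioning described above can be sketched as a small data structure.  This is only an illustration in Python, not the Cause Mapping tool itself: the dictionary `cause_map` and the helper `ask_why` are hypothetical names, and the entries are simply the effects and causes named in this post.

```python
# Hypothetical sketch: each effect maps to the causes that answer "Why?".
cause_map = {
    "customer service goal impacted": ["loss of air conditioning, lighting and running water"],
    "schedule goal impacted": ["ship towed for 3 days"],
    "loss of air conditioning, lighting and running water": ["loss of ship power"],
    "ship towed for 3 days": ["loss of propulsion"],
    "loss of propulsion": ["loss of ship power"],
    "loss of ship power": ["engine room fire"],
    "engine room fire": [],  # cause of the fire is still under investigation
}


def ask_why(effect, cause_map):
    """Follow the first recorded cause at each step until none is left."""
    chain = [effect]
    while cause_map.get(chain[-1]):
        chain.append(cause_map[chain[-1]][0])
    return chain


chain = ask_why("schedule goal impacted", cause_map)
print(" <- ".join(chain))
```

Each step in the printed chain answers one “Why” question, and both impacted goals traced here lead back to the engine room fire.  New causes can be added to the dictionary as the investigation releases more information.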

Part of the reason that a fishing vessel was the first to arrive is that there is little maritime traffic in the area.  This is due to the remoteness of the area in which the cruise ship was traveling, as well as the risk of piracy.  That risk, of course, also led to a constant armed guard on the disabled ship to protect against potential pirate attack.

The location to which the ship was towed also impacts the length of the tow.  It was determined that smaller ports closer to the location of the disabled Costa Allegra could not accommodate the large number of passengers on the ship, so the ship was towed to an island in the Seychelles.

The cause of the fire itself is still under investigation, although it is believed that an electrical fault is a likely cause and that arson is not likely.  As more information becomes available, we can add that information to the Cause Map as well.

To view the Outline and Cause Map, please click “Download PDF” above.

Honduran Prison Fire

By ThinkReliability Staff

How do you know when your solutions haven’t been effective?  When the same problem keeps happening.  Another prison fire claimed 360 lives in Honduras.  This is the third fatal prison fire in nine years, a result of chronic overcrowding and understaffing of Honduran jails.

Just over 3 years after more than 100 prisoners were killed in a prison fire in San Pedro Sula (see previous blog), 360 prisoners (so far) have died as a result of a fire in Comayagua Prison.  (A fire in 2003 claimed the lives of 68 prisoners.)  An open flame has been determined to be the cause of the fire, but the prisoners’ inability to get out contributed to the deaths.

With any incident resulting in deaths of this magnitude, we can analyze the causes of the incident using a visual root cause analysis, or Cause Map.  We begin with the impacts to the goals.  In this case, the prisoner deaths were an impact to the safety goal.  In addition, prison overcrowding can be considered an impact to the production goal, and a delay in rescue can be considered an impact to the customer service goal.  Any damage to the prison itself as a result of the fire is an impact to the property goal.  Once we’ve determined the goals that were impacted, we can begin the analysis by asking “why” questions.

An investigation determined that an open flame (such as a cigarette or candle) and not arson, as was suggested prior to the investigation, caused the fire.  However, severe overcrowding (more than 800 prisoners were in a jail with a capacity of 500) and a delay in the rescue of the prisoners contributed to the massive death toll.

Honduras has a chronic overcrowding problem.  Honduras has a high rate of homicides and a high number of gang members.  Gang members receive strict sentences and, in many cases, are jailed prior to conviction.  However, an increased number of  inmates has not led to an increased number of guards.  On the night of the fire, there were 6 guards on duty.  Guards who were in the towers were not allowed to leave their posts to help with the fire-fighting and rescue efforts.  The guard who had the only set of keys fled prior to unlocking the doors.  (The guards are facing disciplinary actions.)  Firefighters were not allowed to enter the jail for 30 minutes after the fire call as the guards believed they were experiencing a riot or breakout.  An inmate who was not in his cell at the time of the fire was able to free many prisoners.

This incident has added more fuel to the international outcry over the state of Honduran prisons.  However, not much appears to have been done to improve conditions since the previous fires in 2003 and 2009, so it’s unclear if anything will change as a result of this fire.  It is certainly apparent that the safety of prisoners cannot be maintained with the current overcrowding and number of guards.  Additionally, procedures in the case of a fire certainly need to be improved to ensure that prisoners can be evacuated safely and securely.

To view the Outline and Cause Map, please click “Download PDF” above.

Collapse of Gulf of Maine Cod Population Feared

By Kim Smiley

Recent estimates of the Gulf of Maine cod population show that cod is being overfished to the point that the population is at risk of collapse, meaning the numbers become so low the population cannot recover.  Federal regulators are trying to determine the best course of action to protect the fish population, which may include severely restricting cod fishing in the Gulf of Maine.

The declining cod population problem can be analyzed by building a Cause Map, an intuitive, visual root cause analysis.  The first step in building a Cause Map is to determine what goals are impacted by the issue.  In this case, the environmental goal is impacted because the cod population may collapse, but the economic impacts of this issue are also a major concern.

Cod has long been a major source of income for New England fishermen, bringing in $15.8 million in 2010.  Restricting cod fishing would also impact the ability to catch other fish because cod is often also brought up in nets when other fish are targeted.  Cod are bottom swimmers along with other popular fish such as flounder and haddock, and it’s impossible to catch one type of fish without catching the others.

The cod population is declining because the fish are not reproducing fast enough to keep up with fishing.  Cod fishing occurs for several reasons.  First, cod is caught and sold because it is profitable.  Cod meat is high in protein, low in fat and easily filleted.  Additionally, federal regulations allow fishermen to catch a set quota of cod.  One of the potential causes of the declining cod population may be that these quotas are set too high for the cod population to continue to grow.

The federal limits on cod fishing over the past few years were set based on information from 2008 that showed a significantly higher cod population than the estimates determined by the recent population assessment.  It’s not clear why the numbers of cod varied so dramatically between the current estimates and the ones from 2008, but the dramatic swing in fish population estimates has been a source of many complaints by fishermen.  It may also be worth considering whether any environmental factors have impacted the fish population.  Cod population can be affected by many factors besides fishing, such as varying ocean temperatures or changes in their food supply.

After considering severe cuts of up to 82 percent, federal regulators appear to be willing to reduce the amount of cod allowed to be caught by only 22 percent for the 2012 fishing season.  This is only a one year agreement and fishermen will likely face severe cuts on cod fishing limits again in 2013.   At this time it’s not clear whether there is a way to save the historic fishing industry in the Gulf of Maine and ensure a healthy population of cod in the region.

To view a high level Cause Map of this issue, click “Download PDF” above.

Several Incidents at CA Nuclear Plant Raise Concerns

By Kim Smiley

Within a week, three separate incidents occurred at the San Onofre Nuclear Generating Station, located near heavily populated areas, raising new concerns about the safety of the nuclear power plant.

This issue can be investigated by building a Cause Map, an intuitive, visual root cause analysis.  The first step in building a Cause Map is to determine what goals are impacted by the issue being considered.  In this case, the main goal being considered is safety.  If the Cause Map were being built from the perspective of the power plant company, then the production and schedule impacts would also need to be considered, but in this example we will focus on the safety impacts.

The safety goal is impacted because some people are concerned about the safety of the power plant because it is near heavily populated areas and three separate incidents occurred within days of each other.  The three incidents in question were the release of a small amount of radiation, discovery of unexpected amounts of wear on steam generator tubes, and the potential contamination of a worker.

A small amount of radiation was released because a steam generator tube, which carries radioactive water, was leaking.  Luckily, the leak was small and the plant was quickly shut down after the leak was discovered, so no significant amounts of radiation were released.  A second reactor unit is currently shut down for maintenance, and inspection of its steam generator tubes found significantly more wear than expected on some of the tubes.  The wear was unexpected because the tubes have been in service for only 22 months, yet two tubes had 30% wall thinning, 69 tubes had 20% wall thinning and 800 had 10% wall thinning.  The situation is being investigated, but neither the cause of the wear nor the best course of action has yet been determined.  The final incident was the potential contamination of a worker who fell into a reactor pool.  According to media reports, the worker was trying to retrieve a flashlight and lost his footing.

To view a high level Cause Map of this incident, click “Download PDF” above.  The Cause Map can be expanded as more information becomes available so that it can document and illustrate as much detail as needed to evaluate the issues.

As it stands, both the reactor units with the steam generator tubes are shut down.  The unit that experienced the leak is shut down pending investigation and any necessary repairs.  The second unit, which had the unexpected wall thinning in the steam generator tubes, is in a planned shutdown of several months while it is refueled and upgraded.  The units will be brought back online once it’s determined safe to do so.

Prison Fire Kills 103 in 2009

By Staff

On February 9, 2009, a fire and explosion in a seriously overcrowded prison in Honduras resulted in 103 deaths and 25 injuries.  The fire was started by a short circuit in the overheated motor of a refrigerator used to store soft drinks for the inmates.  The prison – which has a capacity of 800 – contained 1,960 inmates, their clothing, and their bedding materials.  This provided plenty of fuel for the fire.

We can look at the causes that led to the prisoner deaths in a Cause Map, or visual root cause analysis.  We begin with the impacts to the goals.  The deaths and injuries of prisoners are an impact to the safety goal.  The environmental goal was impacted by the severe prison fire and explosion.  The customer service goal (considering the general population as the “customer” of a government-run prison) was unaffected, as there were no prisoner escapes.  Finally, the property goal was impacted due to damage to the prison.

We can continue the Cause Map by asking “why” questions.  The impacts to the goals were due to a severe prison fire and explosion.  In addition to the fire, the deaths and injuries were caused by the prisoners being unable to escape.  Part of the reason the prisoners were unable to escape is that they were in prison, where precautions against escape are part of the deal.  However, egress from a building that is on fire to a safe location should be part of the procedures of any prison.  In this case, the procedures obviously didn’t work, considering the high number of deaths and injuries (out of a total of 186 prisoners in this cell block).  The egress was likely made more difficult by severe prison overcrowding.  The prison has a capacity of 800 and contained 1,960 prisoners.  The increase in the prison population is at least partially due to legislation passed the previous August which mandated a minimum 12-year prison term for gang members.  There are estimated to be more than 100,000 gang members in Honduras.
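To put the overcrowding in perspective, the two figures quoted above (a capacity of 800 and a population of 1,960) can be turned into an occupancy rate.  The short Python sketch below is only a back-of-the-envelope check; the variable names are illustrative.

```python
capacity = 800      # stated prison capacity
population = 1960   # stated number of prisoners at the time of the fire

occupancy = population / capacity        # fraction of capacity in use
over_capacity = population - capacity    # prisoners beyond the stated capacity

print(f"Occupancy: {occupancy:.0%}")
print(f"Prisoners beyond capacity: {over_capacity}")
```

In other words, the prison held nearly two and a half times the number of prisoners it was built for.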

The heat for the fire was provided by an overheating refrigerator motor.  The fuel was provided by large amounts of clothing and bedding materials – more than usual, due to the prison overcrowding.

Once the causes for the impacted goals have been determined, solutions can be brainstormed.  In this case, prisoner advocates have long been calling for alternatives to jail sentences for gang members.  This would, of course, reduce the prison population.  Another option to reduce prison overcrowding would be to build more prisons.  To reduce the risk of fire, motorized equipment should be kept away from flammable objects, like clothing and bedding.  Last but not least, any facility has to have an effective egress plan in the case of fire or other emergencies.  These procedures are especially important in the case of a prison, where the potential for prisoner escape has to be considered as well as prisoner safety.

To view the root cause analysis investigation, please click “Download PDF” above.

1960 Plane Collision over NYC killed 134

By ThinkReliability Staff

On December 16, 1960, two planes collided about a mile above Brooklyn, New York.  One plane – United Airlines Flight 826 – was in a holding pattern preparing to descend into Idlewild (now John F. Kennedy International) Airport.  The other plane – TWA Flight 266 – was preparing to descend into LaGuardia.  Since both airports serve New York City, they are in fairly close proximity.  The planes, too, were in close proximity – too close, leading to their collision.  In addition to the 84 passengers killed on the United flight (though one would survive for a day) and the 44 passengers killed on the TWA flight, 6 people were killed in the neighborhood of Park Slope, where the United plane landed.

This incident can be outlined in a Cause Map or visual root cause analysis.  We begin with determining the impacted goals.  First, the 134 total deaths were an impact to the safety goal.  The United flight crash resulted in a fire that affected more than 200 buildings, an impact to both the environmental and property goals.  The liability for the crash was assigned to both airlines and the government, an impact to the customer service goal.  There was another impact to the property goal because both planes were destroyed.  Lastly, the labor goal was impacted due to the rescue efforts of the more than 2,500 personnel who responded to the two crash sites.

These impacts to the goals occurred when both planes crashed after colliding.  The planes collided after their flight paths brought them too close together.  The United flight was estimated to be 12 miles outside its holding pattern when the collision occurred, possibly because the ground beacon was not working.  The controllers at Idlewild were unaware of the plane’s position because planes were not tracked in holding patterns; it was too difficult to identify individual planes.  The crews of the two planes were unaware of each other.  The visibility was extremely poor due to foggy, cloudy, sleety and snowy weather.  The United plane had lost the ability to use its instruments due to the loss of a receiver.  (The cause is unknown.)  Additionally, the controllers at LaGuardia (who were guiding in the TWA flight) were unable to reach the TWA plane to warn its crew of the close proximity of the United plane.

Although comprehensive details are not known about the crash, much of the information used to put together the investigation was obtained from the flight recorder (or “black box”).  This is now a main source of data in aviation accident investigations.  The evidence in this case was used to divide up liability for the accident very exactly – 61% to United Airlines, 24% to the US government and the remainder to TWA.
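The “remainder” assigned to TWA follows directly from the two percentages quoted above.  As a quick check, assuming the three shares add up to 100% (the dictionary name below is illustrative):

```python
# Shares of liability quoted in the post, in percent.
liability = {"United Airlines": 61, "US government": 24}

# TWA received whatever was left of the full 100%.
liability["TWA"] = 100 - sum(liability.values())

for party, share in liability.items():
    print(f"{party}: {share}%")
```

Under that assumption, TWA’s share works out to 15%.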

To view the Outline and Cause Map, please click “Download PDF” above.

Fatal Cruise Ship Accident

By ThinkReliability Staff

At least 11 people have been killed – with 24 still missing – after the cruise ship Costa Concordia ran aground on rocks near the island of Giglio, Italy.  The ship was manually steered up to 4 miles off course on a route not authorized by the company.

This incident can be thoroughly examined in a visual root cause analysis built as a Cause Map.  First, we examine the impacts to the goals for this incident.  The confirmed deaths and missing people are a significant impact to the safety goal.  Additionally, the environmental goal is impacted because of the potential for a spill of the 500,000 gallons of fuel still onboard.  The required evacuation of the ship can be considered an impact to the customer service goal.  The loss of use of the ship – estimated at $85 to $95 million for lost usage in the next year – and the decrease in bookings due to concern over the incident can be considered impacts to the production/schedule goal.  The damage to the ship, which was recently built and insured for approximately $575 million, is an impact to the property goal, and the rescue and recovery efforts are an impact to the labor goal.

Once we have these impacts to the goals, we can begin an analysis by asking “why” questions.  The impact to the safety goal – dead and missing passengers and crew – was caused by the ship running aground on rocks and by some issues with the evacuation process.  The ship ran aground on rocks because it got too close to the island in a manually programmed, unauthorized deviation from the ship’s route, potentially to provide passengers with a better view.  This type of deviation, sometimes called a “fly by”, had been authorized by the company on a previous occasion.  No crew members questioned the change in route by the Captain, noting that onboard he is solely responsible for the ship.  (Note that with great power comes great responsibility, and the Captain has been charged with manslaughter.)  Although the ship contains alarms meant to warn the crew when the ship goes off-course, these alarms are deactivated when the ship’s route is manually altered.

There were some issues with the evacuation of the ship, though as the company notes, not due to the evacuation procedure, which was externally reviewed in November.  Rather the issues were caused by the severe list of the ship (it was leaning almost completely to one side), which affects the ability to use the lifeboats.  Additionally, some of the passengers (who had just come aboard) had not yet completed a lifeboat drill.  The drill is required to be performed within 24 hours of boarding the ship and was scheduled for the morning after departure. The grounding occurred just 3.5 hours after departure.

Currently, rescue and recovery efforts continue.  Attempts are being made to remove fuel from the ship, which is in a protected area.  Concerns about cruise ships in the area have previously been raised, with some wanting to limit the ships that are allowed in the area.  Additionally, both the cruise ship company and the government are reconsidering the timing of lifeboat drills in order to ensure the best results for passengers in incidents like these.

To view the Root Cause Analysis investigation, please click “Download PDF” above.

Number of Poached Rhinos Hits All Time High

By Kim Smiley

Rhinoceroses, commonly called rhinos, have long been hunted for their horns.  Three of the five species of rhinos are considered critically endangered.  According to the National Geographic News Watch, at least 443 rhinos were killed in South Africa in 2011, a significant increase from 333 the previous year.  South Africa is home to more than 20,000 rhinos, which is over 90% of the rhinos in Africa.  For a little perspective on how significantly the problem has grown, South Africa lost only about 15 rhinos a year a decade ago.
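The scale of the increase is easier to see as a rate.  The following Python sketch uses only the figures quoted above (the variable names are illustrative, and the decade-old figure is the approximate “about 15” from the text):

```python
killed_2010 = 333
killed_2011 = 443
killed_decade_ago = 15  # approximate annual losses a decade earlier

# Year-over-year growth from 2010 to 2011.
increase = (killed_2011 - killed_2010) / killed_2010

# How many times larger the 2011 losses are than those a decade earlier.
multiple = killed_2011 / killed_decade_ago

print(f"Increase from 2010 to 2011: {increase:.0%}")
print(f"2011 losses vs. a decade ago: about {multiple:.0f} times")
```

That is roughly a one-third increase in a single year, and on the order of thirty times the annual losses of a decade ago.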

Experts in the field have concluded that the number of rhinos lost through unnatural means, both illegal poaching and the less common legal hunts allowed by the government, will result in a decline in the population of rhinos.

This problem can be investigated by building a Cause Map, an intuitive, visual root cause analysis method.  To begin a Cause Map, the impact to the organizational goals is first determined and then “why” questions are asked to add Causes to the map.  In this example, the major organizational goal being considered is the environmental goal.  The environmental goal is impacted because the poaching of rhinos hit an all time high.  This happened for two reasons: poachers want to hunt rhinos, and the methods in place to prevent poaching are ineffective.

Poachers want to hunt rhinos because the black market value of their horns is extremely high.  They are worth more than gold by weight.  Poachers are able to sell the horns for high prices because consumers are both willing and able to pay huge sums.  There is a strong market for rhino horn because of long standing beliefs that rhino horn has medicinal uses, primarily in Asian cultures.  The number of people able to come up with large amounts of money has also increased with the rise of an affluent middle class in many Asian countries.

The poaching is also increasing because it’s very difficult to prevent it.  The rhinos live in a large, wild habitat.  It’s simply difficult and expensive to patrol and defend such a large region.  The poachers are very well armed because they are backed by international crime syndicates with deep pockets.  It’s a huge challenge for the governments involved to prevent the poaching from occurring.

This problem will likely continue to increase until the demand for the rhino horns starts to decrease.  Modern medical research has concluded that rhino horn has no medicinal value, but as long as people are willing to pay big money for them, someone will find a way to meet that demand.

As an interesting aside, theft of rhino horns from museums has also risen dramatically.  At least 30 horns were stolen from museums this past year.

Roofing Asphalt Spilled on PA Turnpike

By Kim Smiley

On November 22, 2011, a tanker truck spilled a large quantity of roofing asphalt along nearly 40 miles of the Pennsylvania Turnpike.  The spill damaged many vehicles and caused a traffic nightmare as crews worked for hours to clean the mess up.  The timing of this incident was also unfortunate because it occurred in the evening just two days before Thanksgiving, traditionally a very high traffic time.

This incident can be analyzed by building a Cause Map, which is an intuitive, visual method for performing a root cause analysis.  The first step when building a Cause Map is to determine how the incident impacted the goals of the organization.  In this example, the safety goal was impacted because there was potential for car accidents and injuries.  Thankfully, no one was actually hurt, but it is important to note the potential impact in order to fully understand the ramifications from an event.  Additionally, the traffic delays are an impact to the schedule goal.  The customer service goal was also impacted because over 150 cars were damaged by the spill.

Now the Cause Map is expanded by asking “why” questions and adding Causes that contributed to the incident in order to show the cause and effect relationships.  In this example, there was a potential for injuries because more than 150 cars were damaged while driving.  The cars were damaged because they drove onto a spill of wet roofing asphalt.  The asphalt covered the cars and their wheels with thick, sticky goo and left many of them undrivable.  The cars drove over the roofing asphalt because a tanker truck had leaked it onto the road over a long distance.

The tanker truck was carrying a large load of the roofing asphalt, between 4,000 and 5,000 gallons, so there was a large quantity that could potentially be spilled.  Initial findings indicate that the tanker truck spilled the asphalt because of a leaking valve.  Details on why the valve leaked aren’t yet available, but they can be added to the Cause Map as they are known.

Another Cause of this incident is the fact that the driver of the truck was unaware that his truck was leaking so he drove almost 40 miles before he stopped and realized that there was a problem.    It was evening when the leak occurred so the driver wasn’t able to see evidence of a leak easily.

Media reports have stated that the driver of the tanker truck will be charged in the incident.  He is facing charges of failing to secure his load and failing to obey a trooper.  The trucking company has posted a statement on its website encouraging affected vehicle owners to file claims through their insurance.

Click on “Download PDF” above to view a high level Cause Map of this incident.

Plane Crash Kills Hockey Team

By ThinkReliability Staff

Hockey fans were devastated when, on September 7, 2011, a Yak-42 plane carrying a Russian hockey team, including many former NHL players, crashed shortly after takeoff.  A total of 44 people were killed, including 36 passengers and 8 crew members.  One crew member survived the crash.  This incident was the 7th fatal crash to occur in Russia since June, and resulted in the loss of the license of the company that operated the plane.

Now that the Russian air safety organization has released results from its investigation, we can map the details of the crash into a Cause Map, or visual root cause analysis. The Cause Map begins with the impacts to the goals.  The deaths of the crew and passengers are an impact to the safety goal.  The company losing its operating license can be considered an impact to the organizational goal.  The damage to the plane is an impact to the property goal.  All these impacts to the goals were caused by the plane crashing into a riverbank shortly after takeoff.

We ask “Why” questions to add more detail to the map.  It has been determined that the plane crashed because it had insufficient speed during takeoff, and the takeoff was not aborted.  It is also possible that the pilot was attempting an emergency landing in the river, and missed.  The plane had insufficient speed during takeoff because the brake was pressed.  Studies determined that a foot had to be placed on the brake pedal in order for the brake to be activated.  Because of the force being used on the control column, it is likely that one of the pilots was attempting to push down using his foot as a brace.  The pilots who were flying the plane were more familiar with (and were simultaneously being trained on) another type of plane.  This plane – the Yak-40 – has a foot rest where the Yak-42’s brake pedal is located.  Normally pilots are only trained on one type of plane at a time to minimize this sort of confusion.

In addition, at some point during takeoff, the engine was idled.  This would normally indicate that takeoff is being aborted.  Once the engine was brought back into service, it took some time to regain takeoff power – and the speed had already dropped.  Aviation experts say that takeoff could have been aborted and the crash would have been avoided.  However, it does not appear that an abort attempt was made.  Flight recordings indicate confusion and a lack of effective communication in the cockpit.  Prior to the engine being idled, one of the pilots pushed the control stick forward, after which it was pulled back to resume takeoff.  The crew on this plane had never trained together before, which is fairly typical and may be part of the reason for the recent poor safety record of planes in Russia.  Additionally, the pilot had Phenobarbital in his system, which is known to slow reaction time.  Recommendations to improve the safety of small planes of regional carriers in Russia have been under consideration since the recent rash of crashes, and the loss of many popular hockey players may increase the pressure to implement these solutions.

To view the Outline and Cause Map, please click “Download PDF” above.