Power Outage Stretches from Arizona to California

By ThinkReliability Staff

On September 8, 2011, work on a faulty capacitor in Arizona began a series of events that resulted in the worst power outage in the Southwest in 15 years.  Although no injuries were reported as a result of the power outage, there was a high potential for injuries and/or deaths, as hospitals shut down and at least one airport lost runway lighting.  Raw sewage leaked onto beaches and millions found themselves without power.  The economic losses from this incident are reported to be as high as $118 million.  The Federal Energy Regulatory Commission (FERC) will be conducting an investigation to determine how simple capacitor work resulted in an incident with such extreme effects.

The issues related to this power outage are complicated, and can be more clearly understood in a visual format, such as a Cause Map.  We can examine the cause-and-effect relationships that resulted in the impacted goals discussed above.  The potential for injury was caused by a loss of electrical power to hospitals and airports.  The loss of power was caused by a grid crash, resulting from insufficient power and high demand (at least partially due to a heat wave).  Power stations that normally provide electricity were automatically shut down when the capacitor work caused the loss of a transmission line, which in turn reversed the flow of current (normally the current runs from Arizona to California).  Although “operator error” has been mentioned as a potential cause, it’s undesirable that one operator’s error could cause such an extreme power outage.  The system should be designed to prevent this, and the investigation will hopefully address issues in the system that contributed to the extent of the outage.
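The cause-and-effect chain described above can also be written down as a small data structure. The following is only a rough sketch in Python, not ThinkReliability's own tooling: the dictionary layout and the function names `ask_why` and `open_ends` are assumptions made for illustration, and the cause labels are paraphrased from the paragraph above.

```python
# A minimal sketch of a Cause Map as a dictionary (an assumed representation).
# Each effect maps to the causes named for it, i.e. the answers to "Why?".
cause_map = {
    "Potential for injury": ["Loss of power to hospitals and airports"],
    "Loss of power to hospitals and airports": ["Grid crash"],
    "Grid crash": ["Insufficient power", "High demand"],
    "High demand": ["Heat wave"],
    "Insufficient power": ["Power stations automatically shut down"],
    "Power stations automatically shut down": ["Current reversal"],
    "Current reversal": ["Loss of transmission line"],
    "Loss of transmission line": ["Capacitor work"],
}


def ask_why(effect, depth=0):
    """Return indented lines tracing an effect back through its causes."""
    lines = ["  " * depth + effect]
    for cause in cause_map.get(effect, []):
        lines.extend(ask_why(cause, depth + 1))
    return lines


def open_ends(effect):
    """Return the causes that have no recorded 'Why?' answer yet."""
    causes = cause_map.get(effect, [])
    if not causes:
        return [effect]
    ends = []
    for cause in causes:
        ends.extend(open_ends(cause))
    return ends


if __name__ == "__main__":
    print("\n".join(ask_why("Potential for injury")))
    print("Still to investigate:", open_ends("Potential for injury"))
```

The open ends are the places where the investigation could keep asking “Why” as more information becomes available.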

In addition to the loss of power stations, insufficient base-load capacity in the area (long a source of concern) meant that standby plants could not be brought up fast enough to prevent the grid from crashing.  Also, renewable wind and solar energy sources weren’t much help due to less-than-ideal weather conditions for production (cloudy with low wind).

The FERC’s investigation will determine causes that contributed to this power outage and will provide recommendations to limit these types of incidents in the future.  Specifically, it will determine what allowed a simple capacitor issue to result in an extensive power outage and will also consider the grid stability in the area.  In the meantime, some individual businesses discovered a boon in having their own generators.  Additionally, U.S. Navy ships in port in San Diego used their generators to supply power to the grid.  While these actions certainly helped lessen the effect of the outage (and brought a lot of business to locations that did have generators), broader improvements are needed to prevent outages of this scale.

To view the Outline and Cause Map, please click “Download PDF” above.

Crash Causes Deaths at Air Race

By ThinkReliability Staff

Sad news is nothing new for the National Championship Air Races: there have been 29 deaths associated with the races in their 47-year history.  However, the ten deaths and dozens of injuries (some extremely serious) resulting from a plane crash and explosion on September 16, 2011 have brought attention to the safety of air racing.

Although full details of the causes of the crash and explosion have not been determined by the National Transportation Safety Board, we can begin a comprehensive root cause analysis with the information available so far by building a Cause Map.  First, we capture the basic details (such as the date and time of the incident) in the Outline.  Then we record the impacts to the goals.  In this case, there was a significant impact to the safety goal, considering the high number of deaths and significant injuries.  The customer service goal can be considered to be impacted because the spectators at the show were not sufficiently protected from injury.  (The FAA grants approval to air shows based on safety of the spectators from a crash.)   The remaining days of the race were cancelled – an impact to the schedule goal.  The plane was destroyed, an impact to the property goal, and the resulting NTSB investigation will cause an impact to the labor goal because of the resources required to complete the investigation.
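As a rough illustration, the Outline and goal impacts listed above could be recorded as a simple structure before any causes are added. This is only a sketch in Python: the `ProblemOutline` class, its field names, and the `starting_points` helper are hypothetical and not part of the Cause Mapping method itself, and the entries are paraphrased from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemOutline:
    """Hypothetical record of an incident's basic details and goal impacts."""
    what: str
    when: str
    impacts: dict = field(default_factory=dict)  # goal name -> impact


outline = ProblemOutline(
    what="Plane crash and explosion at the National Championship Air Races",
    when="September 16, 2011",
    impacts={
        "safety": "Ten deaths and dozens of injuries",
        "customer service": "Spectators not sufficiently protected from injury",
        "schedule": "Remaining days of the race cancelled",
        "property": "Plane destroyed",
        "labor": "Resources required for the NTSB investigation",
    },
)


def starting_points(problem):
    """Each impacted goal becomes a starting box for the Cause Map."""
    return [
        f"{goal} goal impacted: {impact}"
        for goal, impact in problem.impacts.items()
    ]


if __name__ == "__main__":
    for line in starting_points(outline):
        print(line)
```

Each line printed is a place to begin asking “Why” questions once more is known about the crash.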

Once we have captured these impacts to the goals, we can use them to begin the analysis.  The injuries and deaths occurred from the plane crashing into the VIP section and the subsequent explosion which resulted in shrapnel injuries.  The pilot lost control of the plane and did not have sufficient time to recover (as evidenced by there being no indication that he made a distress call).  It’s unclear what exactly caused the loss of control; however, the plane had been modified to increase its speed, which would have impacted its stability in flight.  Additionally, photos taken just before the crash appear to indicate that a portion of the tail fell off, but the reason why has not yet been discovered.  What happened to the tail section, and how the modifications affected control of the plane, are questions the NTSB will examine in their report.

Because of the goal of an air race – traveling around a course at low altitudes and high speeds – it’s no surprise that the pilot did not have sufficient time to recover control before crashing.  Given that these conditions are expected during air races – and appear to be an acceptable risk to pilots, who continue to race even with the high number of crashes and fatalities that result – it appears that there needs to be more consideration of how spectators are protected from crashes and the shrapnel that can result from the destruction of a plane.

When more evidence is gathered, more information can be added to the Cause Map.  Once that occurs, the NTSB can examine the causes contributing to the deaths at the air race, and make recommendations on how future deaths can be avoided.

To view the Outline and Cause Map, please click “Download PDF” above.

Explosion at Nuclear Waste Site Kills One

By Kim Smiley

An explosion at a nuclear waste processing site in France killed one worker and injured four others on September 12, 2011.  The investigation is ongoing, but it is still possible to create a Cause Map, a visual root cause analysis, that contains all known information on the incident.  As more information becomes available, the Cause Map can easily be expanded to incorporate all relevant details.  One advantage of Cause Mapping is that it can be used to document all information at each step of the investigation process in an intuitive way, in a single location.

When the word “nuclear” is involved, emotions and fears can run high, especially following the recent events at the Fukushima nuclear plant in Japan.  This incident is a good example of a case where providing clear information can help calm the situation.  The explosion in France happened when a furnace used to burn nuclear waste failed.  The cause of the explosion itself isn’t known at this time, but there is some relevant background information available that helps explain the potential ramifications of the explosion.

The key to understanding the impact of this incident is the type of nuclear waste that was being burned.  According to statements by the French government, the furnace involved was only used to burn waste with very low-level contamination.  It burned things such as gloves and overalls as well as metal waste like tools and pumps.  No objects that were part of a reactor were treated in the furnace.  There are also no reactors at the site that could potentially be damaged by an explosion.

No radiation leakage was detected and, given the type of material being processed, there was no potential for a large release of radiation.  It was a horrible accident that resulted in a death and severe injuries, but there was no risk to public health.

How France views nuclear power is also a bit of background worth knowing.  France is the world’s most nuclear-power-dependent country.  Fifty-eight reactors generate nearly three-fourths of France’s power.  France is also a major exporter of nuclear technology.  The public relations issues associated with a nuclear disaster in France would be very complicated.

Once the investigation into this incident is complete, solutions can be determined and implemented to help prevent any future occurrences.

Attempted Bombing of Flight 253

By ThinkReliability Staff

Despite constantly increasing airport security, a man suspected of terrorism was able to board a flight from Amsterdam to Detroit with ~80 grams of explosive and a liquid detonator. However, the device did not detonate, likely saving the plane.

Had the explosive detonated, it might have caused the loss of the plane, resulting in the deaths of all on board. Even though the loss of lives and of the plane did not occur, the potential for it to happen is an impact to the safety goal.

The suspect was able to board the plane because despite warnings from his father, there was insufficient information to add him to the no-fly list (see process map) and his visa was not revoked.

Officials in the U.S. were unaware a visa had been issued by the U.S. embassy in London. Additionally, while the information from the suspect’s father was entered into TIDE (a terrorist intelligence database), there was no follow-up on the information. It’s unclear if there was no follow-up required, or if the follow-up was just not performed.

In an admitted failure of safety procedures, the explosives were not detected by airport security. The information about the suspect was considered not specific enough for the suspect to be put on the “selectee list,” which would have led to additional screening. The suspect was not passed through a body scanner, which may have detected the explosives, because body scanners are not used on passengers traveling to the U.S. due to privacy issues. The ingredients were hidden in the suspect’s undergarments and so were not detected by security.

Want to learn more? Read a more detailed root cause analysis of the attempted bombing.

International Space Station Supply Ship Crash

By ThinkReliability Staff

On August 24, 2011, a supply ship heading to the International Space Station (ISS) crashed in Siberia, losing two tons of cargo.  However, the impact of this loss was much more than the two tons of cargo – it may lead to an evacuation of the ISS, which would become unmanned for some unknown period of time.

The crash of the unmanned Progress 44 supply ship, which was on its way to resupply the ISS, was caused by the emergency deactivation of the Soyuz rocket when a gas generator malfunctioned.   Until the specific causes of the malfunction are determined, manned Soyuz flights are grounded.  That means that a new crew cannot get to the Space Station to relieve the current crew.  Although the current crew has enough supplies for the time being, they cannot remain on the space station past December.  The spacecraft already at the station (their “guaranteed ride home”) are only allowed in space for 200 days – due to limited battery life and concern for degradation of rubberized seals from contact with thruster fuel.

Because of a lack of funding, American shuttles are now all mothballed, leaving the Russian Soyuz rockets as the only way to and from the space station.  Finding another way to get there by December is unlikely, leaving the attempt to determine and fix the problems with Soyuz as the only hope for continued manning of the ISS.

We can examine this incident in a Cause Map, beginning with the impacts to the goals.  For example, although there were no safety goal impacts resulting from the crash of the unmanned ship, the customer service goal is impacted due to the potential evacuation of the ISS.  The production goal is impacted because of the grounding of manned Soyuz flights, and the property goal is impacted due to the two tons of lost cargo meant for the space station.  We begin our Cause Map with these impacts to the goals, asking “Why” questions to complete the analysis.  The amount of detail in the map is determined by the impact to the goals.  Because the crash may lead to the evacuation and continued unmanned operation of the space station, this Cause Map would become quite detailed once specific causes are determined.  For now, because the causes have not yet been determined, we begin with a simple map, which captures the impacts to the goals and the basic information now known.

To view the Outline and Cause Map, please click “Download PDF” above.

Spill Kills Hundreds of Thousands of Marine Animals

By ThinkReliability Staff

A recent spill is estimated to have killed hundreds of thousands of marine animals (fish, mollusks, and even endangered turtles), and the company responsible is facing lawsuits from nearby residents and businesses affected by the spill that caused the kill.  A paper mill experienced problems with its wastewater treatment facility (the problems have not been described in the media), resulting in untreated waste, known as “black liquor,” being dumped into the river.  The waste has been described as being “biological,” not chemical, in nature; however, the waste reduced the oxygen levels in the river, which resulted in the kill.

Although it’s likely that a spill of any duration would have resulted in some marine life deaths, the large number of deaths in this case is related to the length of time of the spill.  It has been reported that the spill went on for four days before action was taken or the state was notified.  The company involved says that action, and reporting to the state, are based on test results, which take several days.

Obviously, something needs to be changed so that the company involved is able to determine that a spill is occurring before four days have passed.  However, the actions that will be taken are as yet unclear.  The plant will not be allowed to reopen until it meets certain conditions meant to protect the river.  Presumably one of those conditions will be finding a method to more quickly discover, mitigate, and report problems with the wastewater treatment facility.

In the meantime, the state has increased discharge from a nearby reservoir, which is raising the water levels in the river and improving the oxygen levels.  The company is assisting in the cleanup, which has involved removing lots of stinky dead fish from the river.  The cleanup will continue, and the river will be stocked with fish, to attempt to return the area to its conditions prior to the spill.

This incident can be recorded in a Cause Map, or visual root cause analysis.  Basic information about the incident, as well as the impact to the organization’s goals, is captured in a Problem Outline.  The impacts to the goals (for example, the environmental goal was impacted by the large number of marine animals killed) are used to begin the Cause Map.  Then, by asking “Why” questions, causes can be added to the right.  As with any incident, the level of detail is dependent on the impact to the goals.

To view the Outline and Cause Map, click “Download PDF” above.