By Kim Smiley
About 1:40 am on May 17, six rail cars derailed and overturned near Lafayette, Louisiana. One of the cars was damaged and leaked about 11,000 gallons of hydrochloric acid. Five people, including two rail workers, were sent to a hospital and treated for eye and skin irritation.
Authorities evacuated people within 1 mile of the accident. Approximately 3,000 people were affected, including a few small businesses and a nursing home. All affected people are being reimbursed for food and hotel costs by the railway company that operated the train.
There was potential for further release of chemicals because one of the rail cars involved in the accident carried ethylene oxide, a flammable and dangerous chemical, and two of the remaining cars also carried hydrochloric acid.
The Louisiana State Police’s hazardous materials unit is overseeing clean-up of the accident site. The spill is being neutralized with lime and the contaminated material will be removed and disposed of. The rail car containing ethylene oxide was removed from the site quickly to reduce the potential for additional problems.
The cause of the derailment is not known at this time. The Federal Railroad Administration will conduct an investigation of the accident.
The attached PDF file contains an intermediate level root cause analysis of the train derailment built using Cause Mapping, a visual form of root cause analysis. It was built using the facts that were available in media reports on the accident. As more details are known, the Cause Map can be expanded.
By Kim Smiley
Early in March 2008, NASA announced that the shuttle mission to the Hubble telescope would take place in the fall rather than in August as originally scheduled. A trip to Hubble is necessary to replace gyroscopes and batteries that are failing. The mission will also be used to install instruments that will increase the range of the telescope. The changing schedule itself is not a cause for alarm, but the reasons behind the slip are interesting. They show that NASA is still struggling in many ways to recover from the tragic loss of the Columbia.
The shuttle mission is delayed because newly designed fuel tanks will not be manufactured in time to support the original schedule. In 2003, Columbia and her crew were lost when external foam fell off the fuel tank during ascent and struck the wing of the orbiter, creating a plate-sized hole. Initially, NASA managed the foam issue by modifying existing fuel tanks. The last of these pre-existing fuel tanks will fly with Discovery when the shuttle launches for a space station assembly mission May 31. The fuel tanks for future launches are being built with design modifications to prevent foam loss. This manufacturing process is taking four to five weeks longer than originally planned. No information is available in media reports explaining why the manufacturing schedule is longer than expected.
The mission to the Hubble telescope is also the only shuttle mission planned that will not go to the international space station. This fact is relevant because it means that two shuttles have to be prepared for launch, not just one. Two shuttles means double the work needed to get the new fuel tanks ready for launch. A second shuttle will be prepared in the event a rescue mission is needed. Trips to the space station are less risky because the astronauts could seek shelter in the space station if the orbiter was damaged, providing a much longer window for potential rescue.
The attached PDF file contains an intermediate level root cause analysis of the delay of the Hubble shuttle mission. It was built using the facts that were available in media reports. As more details are known, the Cause Map can be expanded.
By Kim Smiley
An Associated Press article, published on April 25, highlighted a common, often ignored problem: customers getting a different amount of gas than what they paid for. Gas pumps contain a check valve that allows gas to start flowing at the same time the price meter starts. As the check valves age, they can begin to hesitate and wait a period of time before gas flow begins. This results in the consumer being overcharged because the price meter is turning before gas is flowing. Worn check valves usually only cost consumers pennies per fill-up, but there have been instances of overcharges of 30 to 40 cents a gallon. This issue doesn’t cost the consumer large amounts of money, but it adds frustration to a public already aggravated by record high gas prices.
To be fair, it should be mentioned that worn check valves sometimes help the consumer as well. When a check valve hesitates at the end of a fill-up, the price meter is stopped and a small amount of gas will continue to flow. Also, to clarify, this isn’t a case of gas stations purposely gouging consumers. It’s a situation where a common piece of machinery is wearing out and not functioning properly.
To help prevent these types of errors, gas pumps are regularly inspected to ensure that consumers are charged for the correct amount of gas. Regulations allow gas pumps to pass inspection if they overcharge by no more than 6 cents for every five gallons delivered. Most states require gas pumps to be inspected every year to ensure accurate measurement of gas delivered. Many counties try to inspect more frequently, but have difficulty because of staffing shortages and financial pressure.
The attached PDF file contains an intermediate level root cause analysis of the worn check valves in gas pumps. It was built using the facts that were available in media reports. As more details are known, the Cause Map can be expanded.
By Kim Smiley
The Mars Climate Orbiter (MCO) was launched atop a Delta II launch vehicle on December 11, 1998. Nine and a half months after launch, the MCO was scheduled to begin the process of establishing an orbit around Mars. The plan was to use a technique called aerobraking to reduce the MCO velocity and slowly move the MCO from a 14-hour orbit to a 2-hour orbit. On September 23, the $125 million MCO was lost during the attempt to establish orbit around Mars. Investigation into the accident revealed that the orbiter had entered the Martian atmosphere traveling too quickly with too low a trajectory. The heat produced by friction from hitting the thicker atmosphere present at the lower trajectory at high velocity destroyed the orbiter. The loss of the MCO cost NASA more than the $125 million spent building the MCO. In addition, NASA lost a substantial amount of time, lost all potentially gathered data, and lost some of the public support for the NASA program.
The NASA investigation revealed many causes of the loss of the orbiter. One of the most obvious causes is a unit error in the software used to help predict the velocity of the MCO, which in turn is used to predict the trajectory on which the MCO would enter the Martian atmosphere. A little background is needed to understand how an error in the software causes errors in the predicted velocity. Software called “Small Forces” is used to predict how the MCO’s velocity changed after an angular momentum desaturation maneuver. An angular momentum desaturation maneuver is performed when one of the momentum wheels used to help the orbiter maintain orientation in space starts spinning too quickly. During an angular momentum desaturation maneuver, a wheel is deliberately slowed down (which would normally turn the spacecraft) while at the same time a jet is fired to counteract this force and keep the orientation relatively constant. This whole process affects the speed the spacecraft is traveling and affects the trajectory of entry into the Martian atmosphere. The error in Small Forces was a simple one. The results were in pound-force and the program that predicted velocity expected them to be in newtons.
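The size of this kind of unit error can be illustrated with a short sketch. The conversion factor is the standard definition of the pound-force. The function names and the sample value of 10.0 are hypothetical and are not taken from the Small Forces software or the NASA reports.

```python
# Illustrative sketch only: names and sample numbers are assumptions,
# not the actual Small Forces code.

LBF_TO_NEWTONS = 4.4482216152605  # 1 pound-force in newtons (exact by definition)


def lbf_to_newtons(value_lbf: float) -> float:
    """Convert a value expressed in pound-force to newtons."""
    return value_lbf * LBF_TO_NEWTONS


def misread_ratio() -> float:
    """Ratio of the true value (in newtons) to a pound-force number
    that is mistakenly read as if it were already in newtons."""
    return lbf_to_newtons(1.0) / 1.0


if __name__ == "__main__":
    reported = 10.0  # hypothetical thruster output, in pound-force
    print(f"Correct value: {lbf_to_newtons(reported):.2f} N")
    print(f"Value if units are ignored: {reported:.2f} N")
    print(f"Underestimate factor: {misread_ratio():.2f}")
```

Under these assumptions, any number produced in pound-force but read as newtons understates the real effect by a factor of roughly 4.45, which is why a small per-maneuver error could accumulate into a large trajectory error.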
The attached PDF file contains an intermediate level root cause analysis of the loss of the MCO. It was built using facts from media reports and the NASA investigation reports. The map can be expanded using all the known data to create a detailed Cause Map.
Learn more about the Mars Climate Orbiter.
By Kim Smiley
American Airlines resumed a normal flight schedule Saturday afternoon, ending a period of widespread flight cancellations. Between April 8 and 12, 3,300 flights were canceled when all MD-80 jetliners in the American Airlines fleet were grounded. More than a quarter of a million passengers were affected by the widespread flight cancellations. As discussed in a previous blog, these drastic measures were taken when a large percentage of inspected MD-80s failed to meet FAA regulations on wiring from the airframe to a pump in the wheel well. The wiring can be a fire hazard and affect power distribution. An intermediate level Cause Map showing the causes of the cancellations can be seen in the previous blog posted on April 10.
The cancellations may be over, but the effects will continue to linger. The cost to American Airlines is estimated to be in the tens of millions of dollars. In addition to lost revenue, American Airlines gave many inconvenienced passengers $500 travel vouchers and paid to put stranded travelers in hotels. It is also difficult to put a financial cost on the huge amount of negative publicity that the airline has received as a result of these cancellations, but it is guaranteed to affect their business. In addition to the financial burden of these cancellations, the entire airline industry is faced with rising fuel costs and this is going to put even more pressure on American Airlines. Already, American Airlines announced on Friday (ironically on a day when nearly 600 flights were canceled) that it will be raising prices by as much as $30 per round-trip ticket to help compensate for high fuel costs. These dual blows to the bottom line are going to affect the health of American Airlines for the foreseeable future.
It is also likely that many other airlines will be similarly affected. Doing a root cause analysis, it is clear that one of the causes of these cancellations is a new focus by the FAA on “zero tolerance” for any deviations from their detailed regulations. As airlines struggle to understand the new inspection criteria, it is likely that other airlines will face cancellations. The airline industry as a whole is facing some high hurdles in the upcoming months. Four discount carriers have already declared bankruptcy in the last month and it is likely others will follow suit. Even the established, traditional carriers are seeking changes to stay competitive. For example, rumors are circulating about a possible Northwest and Delta merger. This is going to be a turbulent time for airlines and passengers.
By Kim Smiley
Starting April 8, 2008, American Airlines grounded nearly half of its fleet when it pulled all 300 McDonnell Douglas jets (MD-80s) from service. At least 2,400 flights were canceled. It is estimated that 100 passengers would have been on each of the canceled flights, bringing the total of affected passengers to nearly a quarter of a million people. The MD-80s were grounded because 15 of 19 inspected aircraft failed FAA inspection this week. The issue is with the installation of wiring connecting the airframe to a hydraulic pump in the wheel well. The regulations are written to prevent rubbing and chafing of the wiring, which can lead to exposed wiring. Exposed wiring is a concern because it can lead to power issues and shorts, and it is a potential fire hazard.
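The passenger estimate above is simple arithmetic, sketched here using only the two figures quoted in this post. The variable and function names are illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
canceled_flights = 2_400       # "at least 2,400 flights"
passengers_per_flight = 100    # estimated average per canceled flight


def affected_passengers(flights: int, per_flight: int) -> int:
    """Estimated number of passengers booked on canceled flights."""
    return flights * per_flight


if __name__ == "__main__":
    total = affected_passengers(canceled_flights, passengers_per_flight)
    print(f"Estimated affected passengers: {total:,}")
```

The result, 240,000, is the basis for the "nearly a quarter of a million" figure. Because the flight count is a lower bound, the estimate is as well.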
The most alarming part of the story is that American Airlines grounded these same planes for the exact same issue on March 26 and 27. Over 350 flights were canceled while the planes were inspected and repaired if necessary to comply with the FAA wiring regulations. All planes were back in service on March 28 after American Airlines asserted they satisfied the regulation. Little information is available on what went wrong two weeks ago. There are a number of questions that would need to be answered to perform a thorough investigation. Are the FAA regulations confusing? Do the AA mechanics need additional training? Did the airline fail to internally check the wiring prior to putting the planes back into service? If an inspection did occur, did the inspectors understand what they were looking for? It may not be clear exactly what went wrong, but it is clear that something failed in the system to cause this second round of cancellations.
The attached PDF file contains an intermediate level root cause analysis of the cancellation of American Airlines flights on April 8-9. It was built using the facts that were available in media reports. Many details are still missing and can be added as they become known.
By Kim Smiley
Just before 11 am on January 25, 2008, a fire started on the roof of the 32-story Monte Carlo Hotel in Las Vegas. The fire spread quickly along the outside of the building, fueled by the highly flammable foam-like material, Exterior Insulation and Finish System (EIFS), used to construct the hotel façade. A spark from a handheld cutting torch being used on the roof of the hotel hit the EIFS and started the fire. About 6,000 guests and workers were evacuated from the hotel. The hotel remained closed until February 15. Considering both the damage to the hotel and lost business, the total cost of the fire is approximately $100 million. Luckily, no major injuries resulted from the fire.
A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. The Cause Map shows that the fire started because a spark from a handheld torch hit a flammable material. The Cause Map can also be used to identify possible solutions that would prevent another fire. In this case, two areas that would merit further investigation would be the use of highly flammable material on buildings and the lack of measures taken to protect the EIFS from the sparks. For example, there were no mats in place to protect the EIFS from being hit by sparks. From the information available, it isn’t clear why no protective measures were taken, but it is known that the contractor failed to obtain the correct permit (which involves getting information on appropriate safety procedures). An Associated Press article on the fire reported that Las Vegas city officials are currently evaluating whether restrictions should be placed on the use of EIFS.
The attached PDF file contains an intermediate level root cause analysis of the hotel fire. It was built using the facts that were available in media reports on the fire. As more details are known, the Cause Map can be expanded.
By Kim Smiley
Root cause analysis can be a very effective technique to analyze a problem. But what if the evidence trail goes cold? Is creating a Cause Map still useful when unanswered questions remain after a thorough investigation? The crash of a Comair jet in Lexington, Kentucky on August 27, 2006 is a good example of this situation. The plane crashed during takeoff, killing 49 people. The flight crew mistakenly attempted to take off on the wrong runway, which was too short for the plane to reach the necessary speed for liftoff. Even after a detailed investigation by the National Transportation Safety Board, it still is not clear why the flight crew used the wrong runway. As an aside, the pilot and the first officer were competent professionals by all accounts and there is no history of either making errors of this magnitude.
Plane crashes are unique in the fact that there is a lot of data available to investigators. The cockpit voice recorder (CVR) records all conversations in the cockpit and the flight data recorder (FDR) records instrument readings. Usually the reason behind plane crashes can be determined using all this data. In this case, the information did provide some useful insight, but no clear reasons why the mistake occurred.
Building a Cause Map of this accident does make one thing very clear. There are many events that had to occur for this mistake to happen. One of the causes of the plane crash is clearly the error on the part of the flight crew, but another cause is the failure of the traffic controller to catch and correct the error. There were two separate windows of time where the controller had an opportunity to prevent the plane crash, but didn’t for a variety of reasons.
It’s tempting to say the plane crashed because the crew used the wrong runway and leave it at that. The main problem with this line of reasoning is that this conclusion doesn’t help prevent future crashes, especially since the error isn’t well understood. If all the focus is placed on why the wrong runway was used, an opportunity to improve the process and prevent future accidents is lost. In a case where there is missing information, building a Cause Map can be useful because it helps the investigation to explore all the causes and potential solutions. Only one cause needs to be eliminated to prevent the accident. For instance, the crew could have lined up on the wrong runway and the accident could still have been prevented if the controller had caught the mistake. Focusing on a solution to eliminate the better understood causes provides a useful place to start.
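The point that only one cause needs to be eliminated can be sketched as a simple "all causes required" check. This is a minimal illustration, not the actual Cause Map. The two cause labels paraphrase this post, and the function and constant names are hypothetical.

```python
# Hypothetical model: the outcome occurs only if every listed cause is present.
CRASH_CAUSES = frozenset({
    "crew lined up on wrong runway",
    "controller did not catch the error",
})


def outcome_occurs(required: frozenset, present: set) -> bool:
    """Return True only when all required causes are present (AND relationship)."""
    return required <= present


if __name__ == "__main__":
    all_present = set(CRASH_CAUSES)
    print("All causes present:", outcome_occurs(CRASH_CAUSES, all_present))
    # Remove one cause at a time and check whether the outcome still occurs.
    for cause in sorted(CRASH_CAUSES):
        remaining = all_present - {cause}
        print(f"Without '{cause}':", outcome_occurs(CRASH_CAUSES, remaining))
```

In this simplified model, removing either cause is enough to prevent the outcome, which is why a solution aimed at the better understood cause is still worthwhile.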
Learn more about the Lexington Plane Crash.
By Kim Smiley
Just after 4 a.m. on January 5th, 2008, about 600 homes began flooding in Fernley, Nevada, about 25 miles east of Reno. A 50-foot section of a canal embankment failed, flooding the adjacent area. The 32-mile canal carried water from the Truckee River south to Fallon area farms. There were no injuries in the flooding, but it easily could have been very serious. The complete estimates for repairing the canal and the homes are not available at this time.
A U.S. Bureau of Reclamation report released March 20th concluded that the century-old irrigation canal failed due to burrowing rodents. A simple root cause analysis for this incident using the Cause Mapping method captures the tunneled holes in the embankment as one of the causes. Another one of the causes is the increased water flow in the canal caused by the nearly 2 inches of rain that fell the day before. The annual rainfall for the area is about 5 inches.
The Cause Map shows that the canal obviously failed because the stress on the embankment was greater than the strength of the embankment. The increased water flow added to the stress on the embankment while the holes tunneled by the rodents reduced the strength. A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.
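The cause-and-effect chain described above can be sketched as a small data structure. This is one possible encoding, not the attached Cause Map. The node labels paraphrase this post, and the names `CAUSE_MAP` and `leaf_causes` are hypothetical.

```python
# Hypothetical encoding of the canal Cause Map: each effect maps to its direct causes.
CAUSE_MAP = {
    "homes flooded": ["canal embankment failed"],
    "canal embankment failed": [
        "stress on embankment increased",
        "strength of embankment reduced",
    ],
    "stress on embankment increased": ["increased water flow in canal"],
    "increased water flow in canal": ["nearly 2 inches of rain the day before"],
    "strength of embankment reduced": ["holes tunneled by rodents"],
}


def leaf_causes(effect: str, cause_map: dict) -> list:
    """Walk from an effect down to the causes that have no further causes recorded."""
    causes = cause_map.get(effect, [])
    if not causes:
        return [effect]
    leaves = []
    for cause in causes:
        leaves.extend(leaf_causes(cause, cause_map))
    return leaves


if __name__ == "__main__":
    for cause in leaf_causes("homes flooded", CAUSE_MAP):
        print(cause)
```

Each leaf is a place where the map could be expanded as more evidence becomes available, for example by asking why the holes were not found during inspection.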
Since the canal is almost 100 years old, tunneling muskrats are not a surprise. If the holes had been identified earlier and filled, the risk of the breach would have been reduced significantly. The evidence that the inspection and maintenance of the canals was ineffective is the fact that the canal failed due to holes. An effective inspection program would have found the holes and addressed them; that is the purpose of inspection and maintenance. Past inspections may have been conducted exactly as required, which simply means the previous inspection requirements were inadequate. Ineffective inspection is one of the causes of the canal failure that would need to be investigated further.
The attached PDF file contains an intermediate level root cause analysis of the canal failure. It includes causes that were considered in the Bureau of Reclamation report as well as some of the evidence and solutions. A more detailed Cause Map can be created from the specific information in the bureau’s report.
By Kim Smiley
On Sunday, March 23, a Ukrainian tugboat collided with a Chinese-registered cargo ship. The tugboat capsized and sank in 115 feet of water, trapping 18 sailors inside the hull. All 25 passengers on the cargo ship and seven passengers on the tugboat were rescued. Experts believe the trapped sailors could still be alive if they were able to find air pockets inside the boat. Unfortunately, no signal or sound coming from within the capsized ship has been detected during the 9 rescue attempts that have occurred so far. Rescue efforts continue, but are hindered by low visibility and strong currents.
There is very little information currently available on how the collision happened. Even though the details are vague, it can be very useful to apply the root cause analysis method during this stage of an investigation. Knowing some of the basic causes that have to be present for each type of incident can help direct the investigation efforts. For example, if a fire occurs you already know that there was a spark, oxygen and fuel present and you can start the investigation by considering each of these causes.
In the case of the tugboat collision, there are a number of causes that had to be present for the collision to occur and they could be used as starting places for the investigation. Beyond the very basic ones, such as the fact that two ships had to be present, there are a few facts that can be assumed from the beginning. First, there are strict rules of the road that govern the path of ships, especially near land, similar to the laws that govern vehicle traffic. Somebody didn’t follow the rules, and if you can figure out who didn’t and why, that will go a long way to explaining why the ships collided. Second, every ship should have situational awareness and avoid other ships (even if that other ship is doing something strange), and both ships failed to keep their distance from the other ship. Either this was a failure to properly monitor position or the methods used were inadequate. In this specific case, from the damage that both ships sustained, I’d also be willing to bet that somebody was going too fast too close to shore.
Each type of accident has fundamental causes that had to be present for it to occur. While many investigations lead far beyond the causes that can initially be assumed, they can be a helpful place to start. Performing a root cause analysis can help guide an investigation and ensure all the pertinent questions are asked and answered.