Hubble Focusing Issues [ August 4th, 2008 ] Posted in » Root Cause Analysis - Incident Investigation

The Hubble Space Telescope was launched on April 24, 1990. Soon after it reached orbit, engineers discovered that Hubble's images were blurred. An investigation revealed that Hubble's primary mirror had not been built to specification and could not focus light properly. The mirror was ground too flat toward its edge. As a result, light reflected from the edge came to a focus at a slightly different point than light reflected from the center. The primary mirror was off specification by only 2.3 micrometers, but the result for the $1.5 billion project was disastrous.

Solving Hubble's focus issues was no small feat. How do you repair a mirror that cannot be replaced in orbit when bringing it back to Earth for repair is too expensive? The answer was to leave the mirror alone and add corrective optics that compensate for its error. COSTAR (Corrective Optics Space Telescope Axial Replacement) was installed on Hubble during the first servicing mission in December 1993. COSTAR is essentially a pair of eyeglasses for Hubble. It consists of additional optics built with the same error as the mirror, but in the opposite direction, so the effects of the off-specification mirror shape cancel out. With COSTAR installed, Hubble met its original design goals.

The primary mirror was built with a flaw because the tool used to guide its shaping was itself flawed. This tool, called a null corrector, uses precisely positioned mirrors and lenses to determine the shape of a mirror. When a null corrector is assembled, reflected light is used to measure the distance between the mirror and the lens inside the tool. A measurement error was made during assembly of the null corrector used to shape Hubble's primary mirror. A small amount of reflective coating had fallen off an internal part of the instrument, so the laser used for the measurement reflected off the wrong location. The lens ended up 1.3 mm too far from the mirror. Null correctors are extremely precise and do not change once assembled, so the Hubble team relied on a single instrument to guide the mirror's shape. A single flawed tool and inadequate quality controls resulted in a flawed mirror.

A visual representation of the root cause analysis has been created as a Cause Map that can be downloaded.

Hindenburg Crash - Competing Theories

On May 6th, 1937, the Hindenburg burst into flames over the Lakehurst, NJ Naval Base after completing a successful trip across the Atlantic. Thirty-five of the 97 people on board were killed, along with one member of the ground crew. The Hindenburg itself was a total loss, and airships never regained their popularity after the accident.

The fire on board caused both the loss of 36 lives and the loss of the Hindenburg. The decline in airship popularity had two causes: the loss of the Hindenburg and the loss of lives. The next question to ask is "Why did the fire occur?"

For the Hindenburg, this is where things start to get interesting. There are three separate theories about why the fire started, and each has strong believers. Fortunately, a Cause Map form of root cause analysis can still be used even when we have not determined which theory is correct.

The first theory is that the fire was started by sabotage. The Hindenburg was frequently used as a Nazi propaganda tool, so some thought it was an almost too easy target for anti-Nazi activists. Those activists included Dr. Hugo Eckener, head of the Zeppelin company that built the Hindenburg. There was even a "suspicious" survivor of the crash, a German acrobat living in America. However, the FBI eventually dismissed the idea of sabotage as a "red herring."

A second theory is that static electricity ignited the airship's flammable cover. The main proponent of this theory, Dr. Addison Bain, has tested pieces of the Hindenburg's cover preserved from the wreck site, although this work was not done until 1994. He has also found supporting evidence in historical records of the Zeppelin company.

The third theory is that static electricity ignited a flammable hydrogen-oxygen mixture. The U.S. Department of Commerce's root cause analysis investigation originally attributed the disaster to this cause. Some people argue that Dr. Bain's theory is physically impossible. They do not champion a specific cause, but they treat this one as the most likely.

Note that we are not espousing a theory; we are simply recording all of the possibilities. Once every theory is mapped, the Cause Map lets us identify solutions for each potential cause. We can then use it to decide which solutions are most helpful. Alternatively, we can continue the root cause analysis investigation to determine which causes are most likely.

The attached PDF document gives an intermediate-level Cause Map of the incident.

April 21st, 2008

Loss of Mars Climate Orbiter

The Mars Climate Orbiter (MCO) was launched on a Delta II launch vehicle on December 11, 1998. Nine and a half months after launch, the MCO was scheduled to begin establishing an orbit around Mars. The plan was to use a technique called aerobraking to reduce the MCO's velocity and gradually move it from a 14-hour orbit to a 2-hour orbit. On September 23, 1999, the $125 million MCO was lost during the attempt to establish orbit. Investigators found that the orbiter had entered the Martian atmosphere too fast and on too low a trajectory. The atmosphere is thicker at that lower altitude, and the frictional heat from hitting it at high velocity destroyed the orbiter. The loss cost NASA more than the $125 million spent building the MCO. NASA also lost a substantial amount of time, all of the data the MCO would have gathered, and some public support for its programs.

NASA's investigation revealed many causes for the loss of the orbiter. One of the most obvious was a unit error in the software used to help predict the MCO's velocity. That prediction was then used to predict the trajectory on which the MCO would enter the Martian atmosphere. A little background is needed to understand how a software error caused errors in the predicted velocity. Software called "Small Forces" was used to predict how the MCO's velocity changed after an angular momentum desaturation maneuver. This maneuver is performed when one of the momentum wheels that help the orbiter maintain its orientation in space starts spinning too quickly. During the maneuver, the wheel is deliberately slowed down, which would normally turn the spacecraft. At the same time, a jet is fired to counteract this force and keep the orientation roughly constant. The process changes the spacecraft's velocity slightly, and therefore its trajectory into the Martian atmosphere. The error in Small Forces was a simple one. Small Forces reported its results in English units (pound-force seconds), but the program that predicted velocity expected metric units (newton seconds). Each value was therefore understated by a factor of about 4.45.
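To make the size of a unit mismatch like this concrete, here is a minimal Python sketch. It assumes the Small Forces values were impulses in pound-force seconds that the receiving program read as newton seconds. The function names and the spacecraft mass are hypothetical and chosen only for illustration. This is not NASA's software.

```python
# Minimal sketch of an lbf*s vs N*s mismatch; names and numbers are hypothetical.

# 1 pound-force is defined as exactly 4.4482216152605 newtons.
LBF_TO_N = 4.4482216152605


def lbf_s_to_n_s(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton seconds."""
    return impulse_lbf_s * LBF_TO_N


def delta_v(impulse_n_s: float, mass_kg: float) -> float:
    """Velocity change in m/s produced by an impulse: delta_v = J / m."""
    return impulse_n_s / mass_kg


# Hypothetical values for illustration only (not MCO flight data).
reported_impulse = 1.0  # number written by the producer, actually in lbf*s
mass_kg = 600.0         # hypothetical spacecraft mass in kg

# The consumer assumes the number is already in N*s (the bug).
wrong_dv = delta_v(reported_impulse, mass_kg)
# Converting first gives the velocity change the jets actually produced.
right_dv = delta_v(lbf_s_to_n_s(reported_impulse), mass_kg)

print(f"underestimated by a factor of {right_dv / wrong_dv:.2f}")  # prints 4.45
```

The same mismatch occurs on every desaturation maneuver. Over a long cruise, small per-maneuver errors in predicted velocity can add up.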

The attached PDF file contains an intermediate-level root cause analysis of the loss of the MCO. It was built using facts from media reports and the NASA investigation reports. The map can be expanded with all of the known data to create a detailed Cause Map.

April 18th, 2008

UPDATE: FDA releases revised death count from heparin contamination

The Food and Drug Administration (FDA) recently reviewed adverse events related to heparin. The drug has been under heavy scrutiny since 19 deaths were reported from allergic reactions to contaminated vials. Since January 2007, the FDA has received reports of 103 deaths among people taking heparin, and 62 of these involved allergic reactions or hypotension (dangerously low blood pressure). These deaths include people taking all brands of heparin, not just the brand affected by the contamination and recall. The manufacturer of the contaminated and recalled brand says it knows of only 4 deaths associated with its product. The FDA has stated that these reports do not necessarily mean the deaths were caused by allergic reactions and low blood pressure. Allergic reactions and low blood pressure did cause the deaths linked to the contaminated vials. However, it is not clear that all 62 deaths are associated with contaminated heparin. In fact, heparin already carries a warning about the risk of low blood pressure. Still, in 2006 only 55 deaths were reported from heparin, and only 3 were due to allergic reactions. So something is clearly increasing the number of allergic reactions to heparin. Hopefully the increase in deaths is due to the contaminated heparin that has already been recalled. It is also possible that there are other issues, or that other brands are contaminated as well. The FDA continues to investigate. Hopefully it can provide answers soon, especially to the people who depend on heparin for their well-being.

The previous blog entry includes an intermediate-level Cause Map (root cause analysis) as a downloadable PDF.

April 16th, 2008

UPDATE: Grounded Flights for American Airlines

American Airlines resumed a normal flight schedule Saturday afternoon, ending a period of widespread flight cancellations. Between April 8 and 12, 3,300 flights were canceled when all MD-80 jetliners in the American Airlines fleet were grounded. More than a quarter of a million passengers were affected. As discussed in a previous blog post, American took these drastic measures after a large percentage of inspected MD-80s failed to meet FAA regulations on wiring that runs from the airframe to a pump in the wheel well. Improperly installed wiring can be a fire hazard and can disrupt power distribution. An intermediate-level Cause Map showing the causes of the cancellations can be seen in the previous blog post, published on April 10.

The cancellations may be over, but their effects will linger. The cost to American Airlines is estimated at tens of millions of dollars. Beyond lost revenue, the airline gave many inconvenienced passengers $500 travel vouchers and paid for hotel rooms for stranded travelers. The huge amount of negative publicity is hard to put a price on, but it is certain to affect the airline's business. On top of these costs, the entire airline industry faces rising fuel costs, which will put even more pressure on American Airlines. On Friday, a day when nearly 600 flights were canceled, American Airlines announced that it will raise prices by as much as $30 per round-trip ticket to help offset high fuel costs. These twin blows to the bottom line will affect the health of American Airlines for the foreseeable future.

Many other airlines are likely to be affected in a similar way. A root cause analysis makes it clear that one cause of these cancellations is the FAA's new "zero tolerance" for any deviation from its detailed regulations. As airlines struggle to understand the new inspection criteria, other carriers will likely face cancellations as well. The airline industry as a whole faces high hurdles in the coming months. Four discount carriers have declared bankruptcy in the last month, and others are likely to follow. Even established, traditional carriers are seeking changes to stay competitive. For example, rumors are circulating about a possible Northwest and Delta merger. This is going to be a turbulent time for airlines and passengers alike.

April 14th, 2008

Grounded Flights for American Airlines

Starting April 8, 2008, American Airlines grounded nearly half of its fleet when it pulled all 300 of its McDonnell Douglas MD-80 jets from service. At least 2,400 flights were canceled. If about 100 passengers would have been on each canceled flight, nearly a quarter of a million people were affected. The MD-80s were grounded because 15 of the 19 aircraft inspected this week failed FAA inspection. The problem lies in how the wiring that connects the airframe to a hydraulic pump in the wheel well was installed. The regulations are written to prevent rubbing and chafing, which can expose the wiring. Exposed wiring is a concern because it can cause power problems and short circuits, and it is a potential fire hazard.

The most alarming part of the story is that American Airlines grounded these same planes for exactly the same issue on March 26 and 27. More than 350 flights were canceled while the planes were inspected and, where necessary, repaired to comply with the FAA wiring regulations. All planes were back in service on March 28 after American Airlines asserted that they met the regulation. Little information is available on what went wrong two weeks ago. A thorough investigation would need to answer a number of questions. Are the FAA regulations confusing? Do the AA mechanics need additional training? Did the airline fail to check the wiring internally before putting the planes back into service? If an inspection did occur, did the inspectors understand what they were looking for? It may not be clear exactly what went wrong, but something in the system clearly failed and caused this second round of cancellations.

The attached PDF file contains an intermediate-level root cause analysis of the cancellation of American Airlines flights on April 8-9. It was built using the facts available in media reports. Many details are still missing and can be added as they become known.

April 10th, 2008

Root Cause Analysis: Monte Carlo Hotel Fire - Las Vegas, NV

Just before 11 am on January 25, 2008, a fire started on the roof of the 32-story Monte Carlo Hotel in Las Vegas. The fire spread quickly along the outside of the building. It was fueled by the highly flammable foam-like material used to construct the hotel façade, an Exterior Insulation Finishing System (EIFS). A spark from a hand-held cutting torch being used on the roof hit the EIFS and started the fire. A total of 6,000 guests and workers were evacuated from the hotel, which remained closed until February 15. Counting both the damage to the hotel and the lost business, the total cost of the fire was approximately $100 million. Fortunately, the fire caused no major injuries.

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. The Cause Map shows that the fire started because a spark from a hand-held torch hit a flammable material. The Cause Map can also be used to identify possible solutions that would prevent another fire. In this case, two areas would merit further investigation: the use of highly flammable material on buildings, and the lack of measures to protect the EIFS from sparks. For example, no mats were in place to shield the EIFS from sparks. From the information available, it is not clear why the EIFS was left unprotected. It is known, however, that the contractor failed to obtain the correct permit, a process that includes getting information on appropriate safety procedures. According to an Associated Press article on the fire, Las Vegas city officials are evaluating whether restrictions should be placed on the use of EIFS.

The attached PDF file contains an intermediate-level root cause analysis of the hotel fire. It was built using the facts available in media reports on the fire. As more details become known, the Cause Map can be expanded.

April 8th, 2008

Heparin Contamination - 19 Lives Lost

Heparin, which is widely used as an anticoagulant (blood thinner), has been in the news lately, and the news is scary. Nineteen people have died, and 785 have experienced adverse reactions due to contaminated heparin. The heparin in question has been found to contain up to 50% oversulfated chondroitin sulfate. This compound mimics heparin so closely that basic tests cannot distinguish it, but it provides no anticoagulant activity. The adverse effects are severe allergic reactions, including low blood pressure, which can occasionally lead to a fatal stroke.

Whether or not the oversulfated chondroitin sulfate is to blame for the allergic reactions, it can also cause serious harm by weakening heparin's blood-thinning effect. People who take heparin because they need its anticoagulant properties may face serious problems with a dose that is only 50% effective. Because of these concerns, the heparin in question has been taken off the market. Serious consumer concerns remain, however, about the system that allowed the contamination to happen in the first place. Because of the potential for fatal side effects, lots of heparin have been recalled in six countries at last count, though the total amount is unclear.

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. The accompanying PDF document gives a more detailed analysis.


April 6th, 2008

What do we miss by focusing on “THE ROOT CAUSE”?

Many organizations try to boil their problems, even extremely complex ones, down to a single "root cause." One problem with this approach is the overgeneralization that results. Overgeneralization may let organizations feel "off the hook" if, for example, the "root cause" turns out to be "human error." Because human error is unavoidable, they may take no steps to prevent or mitigate further occurrences. Overgeneralization can also warp the view of the problem, driven by the desire to find the one true "root cause" of the event. Ask people what caused the Exxon Valdez oil spill, and many will say the captain was drunk. That answer reduces a complex issue to human error and points it at the man in charge. However, the captain was not on the bridge when the ship ran aground and caused the spill. He was also found not guilty of operating a vessel under the influence of alcohol.

Another problem with searching for the one "root cause" is that other contributing causes get missed. This matters most when the solution for the "root cause" is not 100% effective. TWA Flight 800 went down for many reasons. Yet according to the National Transportation Safety Board, the airlines' sole focus in preventing fuel tank explosions has been keeping ignition energy out of the tank. That solution is not foolproof: ignition sources can be minimized but never entirely removed. This is why some are turning to solutions for the other causes, namely the flammable vapors in the fuel tanks and the oxygen that allows an explosion to occur.

So if finding the "root cause" isn't the answer, what is? To combat a problem effectively, we have to find the best solution. To find the best solution, we have to find all the solutions. And to find all the solutions, we have to find all the causes. We do this by making a Cause Map, a visual root cause analysis. Building a Cause Map means asking "why" until all possible and contributing causes have been identified. The next step is to identify potential solutions for each cause. Once all potential solutions have been found, the organization decides which solution or solutions are best. That choice depends on the severity of the issue, the effectiveness of each solution, and the resources available to implement it.
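The process described above (ask "why" until no new causes appear, then gather solutions for every cause) can be sketched as a small graph search. In this minimal Python sketch, a Cause Map is assumed to be a dictionary that maps each effect to the causes answering "why?", with candidate solutions attached to individual causes. The entries are drawn loosely from the Monte Carlo hotel fire example. The data structure, the function names, and the wording of the solutions are illustrative assumptions, not the method behind the downloadable Cause Maps.

```python
from collections import deque

# Hypothetical cause map: each effect maps to the causes that answer "why?".
CAUSES: dict[str, list[str]] = {
    "hotel fire": ["spark from cutting torch", "flammable EIFS facade", "EIFS unprotected"],
    "spark from cutting torch": ["cutting torch used on roof"],
    "EIFS unprotected": ["no protective mats", "permit not obtained"],
}

# Candidate solutions attached to individual causes (illustrative only).
SOLUTIONS: dict[str, list[str]] = {
    "flammable EIFS facade": ["restrict use of EIFS"],
    "no protective mats": ["place mats over EIFS during torch work"],
    "permit not obtained": ["require permit before torch work"],
}


def all_causes(effect: str) -> list[str]:
    """Keep asking 'why?' breadth-first until no cause has further causes."""
    found: list[str] = []
    seen = {effect}
    queue = deque([effect])
    while queue:
        current = queue.popleft()
        for cause in CAUSES.get(current, []):
            if cause not in seen:  # a cause may be reached by more than one path
                seen.add(cause)
                found.append(cause)
                queue.append(cause)
    return found


def all_solutions(effect: str) -> dict[str, list[str]]:
    """Collect every candidate solution attached to any cause of the effect."""
    return {c: SOLUTIONS[c] for c in all_causes(effect) if c in SOLUTIONS}


for cause, fixes in all_solutions("hotel fire").items():
    print(cause, "->", fixes)
```

The sketch stops once it has gathered the candidates. Ranking them by severity, effectiveness, and available resources is still a judgment call for the organization.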

April 3rd, 2008
