Hubble Focusing Issues [August 4th, 2008]

The Hubble Space Telescope was launched on April 24, 1990. Once it was in orbit, it was quickly discovered that Hubble's images were blurred. An investigation revealed that Hubble's primary mirror was not built to specification and couldn't properly focus light. Specifically, the mirror was too flat toward its edge, so light reflected from the edge of the mirror focused at a slightly different location than light reflected from the center. The primary mirror was off specification by only 2.3 micrometers, but the result for the $1.5 billion project was disastrous.

Solving Hubble's focus issues was no small feat. How do you repair a mirror that can't be replaced on orbit when it is cost prohibitive to bring the telescope back to Earth for repair? The answer was to leave the mirror alone and add corrective optics that compensated for its error. COSTAR (Corrective Optics Space Telescope Axial Replacement) was added to Hubble during the first servicing mission in December 1993. COSTAR is essentially eyeglasses for Hubble: additional optics built with the same error as the mirror, but in the opposite direction, so that the effects of the off-specification mirror shape cancel out. With the addition of COSTAR, Hubble met its original design goals.

The primary mirror was built with a flaw because the tool used to create the template for shaping the mirror, called a null corrector, was itself flawed. Null correctors use precisely located mirrors and lenses to determine the shape of a mirror. When a null corrector is assembled, reflected light is used to measure the distance between the mirror and the lens inside the tool. A measurement error was made during the assembly of the null corrector used to shape Hubble's primary mirror. A small amount of reflective coating had fallen off an internal piece of the instrument. The laser used for the measurement reflected off the wrong location, leaving a lens 1.3 mm too far from the mirror. Null correctors are extremely precise and do not change once assembled, so the Hubble team relied on a single instrument to guide the mirror shape. A single flawed tool, combined with inadequate quality controls, resulted in a flawed mirror.

A visual representation of the root cause analysis has been created as a Cause Map that can be downloaded.

Prudhoe Bay Pipeline Corrosion

In 2006, a British Petroleum (BP) worker in Prudhoe Bay, Alaska discovered a leak in one of BP's transit, or feeder, pipelines. These lines deliver the crude oil drilled by BP to the main Trans-Alaska Pipeline. That pipeline carries the oil to Valdez, in southern Alaska, and from there it is shipped to the lower 48 states. Approximately 5,000 barrels (more than 200,000 gallons) of oil were spilled, adversely affecting almost 2 acres of permafrost (continually frozen soil). Inspections performed as a result of the spill found severe corrosion (and another, smaller spill) in 16 miles of pipeline. BP decided to replace all 16 miles of affected pipeline, at a cost of $260 million. Additionally, BP paid $20 million in fines, restitution to the State of Alaska, and a payment for environmental research. Some people believe this is the largest fine ever paid in the state for what was legally considered an "environmental misdemeanor."

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page (see attached pdf). First we will look at the impact to the goals. For the BP pipeline, the environmental goal was impacted because 5,000 barrels of crude oil spilled, affecting 1.9 acres of permafrost. Both environmental impacts were caused by the leak of crude oil. The customer service goal was impacted by an increase in oil prices to consumers. This occurred because of the barrels lost during the production shutdown, which is an impact to the production goal. The oil was lost because of the shutdown, which was needed to replace the affected lines; replacing the lines is also an impact to the material goal. The lines had to be replaced because a loss of pipe integrity was discovered. That discovery also led to the fine and restitution, another impact to the material goal.

The loss of pipe integrity was due to severe corrosion product buildup.  The corrosion product buildup also resulted in a hole in the pipeline, which caused the leak.

The permafrost was affected by the oil because of the leak, but also because the leak was not contained promptly. The next question is "Why was the leak not contained promptly?" (And also, "Why did the leak occur?" The answer is a hole, but we'll get to that later.) The leak was not contained promptly because the leaked oil was not visible, the location of the leak was inaccessible, and the leak detection program was ineffective.

The severe corrosion product buildup resulted from three things. First, there was corrosion in the pipe. Second, the corrosion went undetected (we'll go into both of these in more detail). And third, the pipes were used beyond their design life (a design life of 25 years, vs. the 29 years they had been in service).

There was corrosion in the pipes because microbes in the pipe were protected by a layer of sediment, and the microbes produce corrosive substances (this is known as internal microbiological corrosion). The layer of sediment was due to an ineffective maintenance program. The sediment settled to the bottom of the pipe for two reasons: there were low spots in the pipe, and the pipe diameter was too large. The oversized pipe kept the oil velocity too low to sweep the sediment out.

The corrosion went undetected because of an ineffective inspection program. The inspection program was ineffective because there was no regular internal inspection schedule. The ultrasonic testing used was not effective because it did not cover 100% of the line and the damage was very localized, so the testing missed the spots with the worst corrosion. Additionally, a "smart pig", a device sent through the inside of a line to measure its wall thickness, was never run through the line. BP believed its ultrasonic testing made one unnecessary.
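Because a Cause Map is a set of cause-and-effect links, it can also be written down as a simple graph in which each effect points to the causes behind it. The Python sketch below is a hypothetical encoding of part of the Prudhoe Bay map described above. The node wording is paraphrased, and the `CAUSES` table and `root_causes` function are illustrative names, not part of any Cause Mapping tool. Walking the graph down to its leaves lists the deepest causes recorded so far, which are natural candidates for solutions.

```python
# Hypothetical encoding of part of the Prudhoe Bay Cause Map:
# each effect maps to the causes that together produced it.
CAUSES: dict[str, list[str]] = {
    "oil spilled onto permafrost": ["leak", "leak not contained promptly"],
    "leak": ["hole in pipeline"],
    "hole in pipeline": ["severe corrosion product buildup"],
    "severe corrosion product buildup": [
        "corrosion in pipe",
        "corrosion undetected",
        "pipe used beyond design life",
    ],
    "corrosion in pipe": ["microbes in pipe", "sediment layer protecting microbes"],
    "sediment layer protecting microbes": [
        "ineffective maintenance program",
        "low spots in pipe",
        "oil velocity too low",
    ],
    "oil velocity too low": ["pipe diameter too large"],
    "corrosion undetected": ["ineffective inspection program"],
    "ineffective inspection program": [
        "no regular internal inspection schedule",
        "ultrasonic testing missed localized damage",
        "smart pig never run",
    ],
    "leak not contained promptly": [
        "leaked oil not visible",
        "leak location inaccessible",
        "ineffective leak detection program",
    ],
}


def root_causes(effect: str, causes: dict[str, list[str]]) -> list[str]:
    """Return the deepest recorded causes (leaf nodes) beneath `effect`, depth-first."""
    found: list[str] = []
    seen: set[str] = set()

    def walk(node: str) -> None:
        if node in seen:  # a single cause may feed more than one effect
            return
        seen.add(node)
        children = causes.get(node, [])
        if not children:
            found.append(node)
        for child in children:
            walk(child)

    walk(effect)
    return found


for cause in root_causes("oil spilled onto permafrost", CAUSES):
    print(cause)
```

The leaves change as the analysis deepens. For example, if a later investigation asked why the maintenance program was ineffective, that node would gain its own causes and drop out of the list.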

Once the Cause Map is built to a sufficient level of detail, with supporting evidence, the solutions step can be started. The Cause Map is used to identify all the possible solutions for a given issue so that the best solutions can be selected. On the Cause Map you can see some solutions derived from the causes (in the green boxes). News reports and BP press releases about the Prudhoe Bay pipeline show that almost all of the actions listed have been or are being taken to prevent this problem from happening again.

May 19th, 2008

Pet Food Contamination - March 2007

On March 15, 2007, the Food and Drug Administration (FDA) was notified that ten animals had died from eating pet food. This began an investigation into a problem that would result in the recall of 150 brands of pet food and would kill many animals (some veterinarians suggest up to 1,000). We can illustrate what happened in a Cause Map. A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

First, we examine the impacts to the goals. For a food manufacturer, one of the overall goals is to have zero injuries, and some veterinarians suggest that up to 1,000 dogs and cats were killed in the U.S. The customer service goal was also impacted. In the case of the contaminated pet food, 150 brands were recalled, including 60 million containers of pet food made by Menu Foods, the most affected manufacturer. This was the largest recall in FDA history, and was estimated to cost Menu Foods $54 million.

The loss of pets was caused by renal failure.  The renal failure in dogs and cats occurred because the dogs and cats ate contaminated pet food.  The dogs and cats ate contaminated pet food because it was in the food supply.  This also led to the recall.

Why was the contaminated pet food in the food supply? The food was contaminated with up to 6% melamine and cyanuric acid (CA), and the contaminants were not detected. The melamine and cyanuric acid were in the food because they were added to the raw ingredients to increase the apparent protein content of the wheat gluten. This reduced the cost for the manufacturer, because melamine and cyanuric acid are cheaper than wheat gluten. It increased the apparent protein content because melamine and cyanuric acid mimic the protein response in protein testing.
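The phrase "mimic the protein response" can be made concrete with a little arithmetic. Many routine protein tests, such as the Kjeldahl and Dumas methods, measure total nitrogen and multiply it by a conversion factor. The reports summarized here do not say which test was used. The sketch below therefore assumes a nitrogen-based test and the common general-purpose factor of 6.25. Melamine (C3H6N6) is about two-thirds nitrogen by mass, so a little of it reads as a lot of "protein".

```python
# Standard atomic masses in g/mol (rounded).
C, H, N = 12.011, 1.008, 14.007

# Melamine is C3H6N6.
melamine_molar_mass = 3 * C + 6 * H + 6 * N
melamine_n_fraction = 6 * N / melamine_molar_mass  # roughly two-thirds

# Assumption: the test reports protein as total nitrogen times 6.25,
# a common general-purpose conversion factor.
NITROGEN_TO_PROTEIN = 6.25


def apparent_protein_from_melamine(melamine_fraction: float) -> float:
    """Apparent protein (as a mass fraction) contributed by a mass fraction of melamine."""
    return melamine_fraction * melamine_n_fraction * NITROGEN_TO_PROTEIN


# If the entire 6% contamination level cited above were melamine, it alone would
# read as roughly a quarter of the sample's mass in "protein" under this assumed test.
print(f"{apparent_protein_from_melamine(0.06):.1%}")
```

Cyanuric acid carries less nitrogen than melamine, so a 6% blend of the two would read somewhat lower. The point of the sketch is the order of magnitude, not an exact figure.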

The contaminants were not detected for three reasons: standard tests did not detect them, inspections were inadequate, and the paperwork was inaccurate. Standard tests did not detect the contaminants because melamine and cyanuric acid mimic the protein response in protein testing, and because they were not tested for. Inspections were inadequate because the material did not receive export inspections in China. The exports were improperly labeled as non-food, and only food items are subject to mandatory inspection. The inspections were also inadequate because FDA officials do not have ready access to Chinese plants, as there is no binding agreement between China and the FDA. The paperwork was inaccurate because the broker certified that the material specification, which forbade foreign material, was met.

Even more detail can be added to this Cause Map as the analysis continues. As with any investigation, the level of detail in the analysis is based on the impact of the incident on the organization's overall goals. See the attached pdf for a visual representation of the cause-and-effect relationships.

May 17th, 2008

Mission to Hubble Telescope Delay

Early in March 2008, NASA announced that the shuttle mission to the Hubble telescope would take place in the fall rather than in August as originally scheduled. A trip to Hubble is necessary to replace gyroscopes and batteries that are failing. The mission will also be used to install instruments that will increase the range of the telescope. The schedule change itself is not a cause for alarm, but the reasons behind the slip are interesting. They show that NASA is still struggling, in many ways, to recover from the tragic loss of Columbia.

The shuttle mission is delayed because fuel tanks of the new design will not be manufactured in time to support the original schedule. In 2003, Columbia and her crew were lost when foam insulation fell off the external fuel tank during ascent and struck the wing of the orbiter, creating a plate-sized hole. Initially, NASA managed the foam issue by modifying existing fuel tanks. The last of these pre-existing fuel tanks will fly with Discovery when the shuttle launches for a space station assembly mission on May 31. The fuel tanks for future launches are being built with design modifications to prevent foam loss. This manufacturing process is taking four to five weeks longer than originally planned. No information is available in media reports explaining why the manufacturing schedule is longer than expected.

The mission to the Hubble telescope is also the only planned shuttle mission that will not go to the International Space Station. This is relevant because it means that two shuttles have to be prepared for launch, not just one: a second shuttle will be readied in case a rescue mission is needed. Two shuttles means double the work needed to get the new fuel tanks ready for launch. Trips to the space station are less risky because the astronauts could shelter in the station if the orbiter were damaged, providing a much longer window for a potential rescue.

The attached PDF file contains an intermediate-level root cause analysis of the delay of the Hubble shuttle mission. It was built using the facts that were available in media reports. As more details become known, the Cause Map can be expanded.

May 15th, 2008

Sinkhole - Daisetta, TX

On May 7, 2008, a sinkhole formed in Daisetta, Texas, near the Deloach Vacuum Company. The sinkhole quickly grew to approximately 900′ x 600′ x 260′. Fortunately, no one was injured, but the sinkhole had a severe impact on both the Deloach company and the town.

We can analyze this incident in a Cause Map.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. 

See the attached PDF document for the detailed-level Cause Map. First, on the left, we begin with the impact to the goals. Many goals were impacted in this case:
- Safety: the possibility of injury.
- Environment: a crude oil pipeline leaked, and tankers and storage tanks fell into the sinkhole. This is also an impact to the material goal, because goods were lost.
- Customer service: in this case, residents were affected by the severed main power line, the closure of the main street, and the potential of an evacuation.
- Production: the Deloach Vacuum Company shut down.

It is believed that the sinkhole occurred because of the collapse of a salt dome underneath the town. The salt dome collapsed because a portion of it dissolved due to exposure to water. It is believed that the water came from one or more of three possible sources. The first is the natural path of groundwater, due to geological features. The second is water leaking through holes drilled from the surface, either wells or drill holes for salt water disposal or oil & gas production. The third possibility is that salt water injected underground dissolved part of the salt dome. Salt water is injected underground because salt water is disposed of in the dome (see above). Salt water waste exists because it is separated from crude oil.

Even more detail can be added to this Cause Map as the analysis continues. As with any investigation, the level of detail in the analysis is based on the impact of the incident on the organization's overall goals.

The investigation is continuing, as Texas officials try to figure out how to prevent further damage from the sinkhole. For now, the expansion appears to be slowing down, and hopefully life can soon get back to normal in this Texas town.

May 13th, 2008

UPDATE: Heparin Contaminant Identified

Earlier this year, contamination of the U.S. supply of heparin was brought to light. A significant portion of the U.S. supply of heparin was recalled. The death toll potentially associated with the contamination has now climbed to 81, with hundreds of adverse events also reported. Before the recall, there was also concern about deaths and injuries from the drug failing to do its job, preventing blood clots during surgeries and kidney dialysis, because the contaminant has no blood-thinning properties. So far, the contaminated drug has been found in 10 countries, increasing concern about the drug supply chain.

Researchers have verified that the contaminant in the recalled heparin is oversulfated chondroitin sulfate (OSCS). They have also discovered a mechanism by which the contaminant can cause the adverse effects (falling blood pressure and severe allergic reactions). Additionally, the researchers have provided a test that regulators can use to screen heparin for this contaminant.

Investigators have determined that the OSCS was present at the active ingredient supplier's plant in China. OSCS does not occur in nature and closely mimics the chemical structure of heparin. It is therefore believed that the (mostly unregulated) crude heparin suppliers in China added OSCS to increase their profit, since OSCS is many times less expensive than heparin. The OSCS was not detected by standard impurity tests because of its similarity to heparin. In Congressional hearings since the event, the Food and Drug Administration (FDA) has said that the inspections of the Chinese plant (as well as those of most foreign plants) were inadequate due to lack of funding for the FDA mission.

The attached pdf Cause Map shows that the contaminant got into the drug supply after being added to the raw ingredients. It was not discovered by regulators because no effective test was in common use. A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. As more information is released about the failings of the supply chain in this instance, we can add more details to the Cause Map.

May 7th, 2008

Gas Pump Glitch

An Associated Press article, published on April 25, highlighted a common, often ignored problem: customers getting a different amount of gas than what they paid for. Gas pumps contain a check valve that allows gas to start flowing at the same time the price meter starts. As the check valves age, they can begin to hesitate, waiting a period of time before gas flow begins. This results in the consumer being overcharged, because the price meter is turning before gas is flowing. Worn check valves usually cost consumers only pennies per fill-up, but there have been instances of overcharges of 30 to 40 cents a gallon. The issue doesn't cost consumers large amounts of money, but it adds frustration for a public already aggravated by record-high gas prices.

To be fair, worn check valves sometimes help the consumer as well. When a check valve hesitates at the end of a fill-up, the price meter stops while a small amount of gas continues to flow. Also, to clarify, this isn't a case of gas stations purposely gouging consumers. It's a situation where a common piece of machinery is wearing out and not functioning properly.

To help prevent these types of errors, gas pumps are regularly inspected to ensure that consumers are charged for the correct amount of gas. Regulations allow gas pumps to pass inspection if they overcharge by no more than 6 cents for every five gallons delivered. Most states require gas pumps to be inspected every year to ensure accurate measurement of the gas delivered. Many counties try to inspect more frequently, but have difficulty doing so because of staffing shortages and financial pressure.
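The tolerance quoted above can be turned into a rough estimate of how long a worn valve may hesitate before a pump should fail inspection. The Python sketch below treats the hesitation as a fixed start-up delay during which the meter counts fuel that is not yet flowing. The flow rate, gas price, and delays are hypothetical values. The 6-cents-per-5-gallons limit is used exactly as the article states it.

```python
# Tolerance as stated in the article: no more than 6 cents of overcharge per 5 gallons.
TOLERANCE_DOLLARS_PER_5_GAL = 0.06


def hesitation_overcharge(delay_s: float, flow_gpm: float, price_per_gal: float) -> float:
    """Dollars billed for 'phantom' fuel the meter counts during a start-up delay with no flow."""
    phantom_gallons = flow_gpm * delay_s / 60.0
    return phantom_gallons * price_per_gal


def max_allowed_delay(flow_gpm: float, price_per_gal: float) -> float:
    """Longest start-up delay (seconds) that stays within tolerance on a 5-gallon test draw.

    Assumes the whole overcharge lands in a single 5-gallon test, since the
    hesitation happens once per dispensing cycle.
    """
    max_phantom_gallons = TOLERANCE_DOLLARS_PER_5_GAL / price_per_gal
    return max_phantom_gallons / flow_gpm * 60.0


# Hypothetical pump: 8 gallons per minute, gas at $4.00 per gallon.
print(f"{hesitation_overcharge(0.5, 8.0, 4.00):.3f}")  # dollars overcharged for a half-second delay
print(f"{max_allowed_delay(8.0, 4.00):.4f}")           # seconds of delay allowed
```

With these assumed numbers, a delay of only about a tenth of a second already reaches the stated tolerance. That is one reason regular inspection matters even though each fill-up loses only pennies.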

The attached PDF file contains an intermediate-level root cause analysis of the worn check valves in gas pumps. It was built using the facts that were available in media reports. As more details become known, the Cause Map can be expanded.

May 5th, 2008

Hindenburg Crash - Competing Theories

On May 6th, 1937, the Hindenburg burst into flames over the Lakehurst, NJ Naval Air Station after completing a successful trip across the Atlantic. Thirty-five of the 97 people aboard (and one member of the ground crew) were killed. The Hindenburg itself was a total loss, and the popularity of airships never recovered after the accident.

The loss of 36 lives and the loss of the Hindenburg were both caused by the fire aboard. The loss of popularity of airships was caused by both the loss of the Hindenburg, and by the loss of lives.  The next question to ask is “Why did the fire occur?”

For the Hindenburg, this is where things start to get interesting.  There are three separate theories about why the fire started.  There are people who believe very strongly in each.   Luckily for us, the beauty of the Cause Map form of a root cause analysis is that we can use it even if we haven’t determined which theory is correct.

The first theory is that the fire started from sabotage. The Hindenburg was frequently used as a Nazi propaganda tool, so some thought it was almost too easy a target for sabotage by anti-Nazi activists. Their number included Dr. Hugo Eckener, the longtime head of the Zeppelin company that built the Hindenburg. There was even a "suspicious" character who survived the crash, a German acrobat living in America. However, the FBI eventually dismissed the idea of sabotage as a "red herring."

Another theory is that the fire began when static electricity ignited the flammable cover of the airship. The major proponent of this theory, Dr. Addison Bain, has run tests on pieces of the Hindenburg cover preserved from the wreck site, though not until 1994. He has also found supporting evidence in historic records of the Zeppelin company.

The third theory is that static electricity ignited a flammable hydrogen-oxygen mixture. The U.S. Department of Commerce's root cause analysis investigation after the crash originally attributed the disaster to this cause. Others argue that Dr. Bain's theory is physically impossible. They do not specifically champion a cause, but treat this one as the most likely.

Note that we're not espousing a theory; we are just recording all of the possibilities. With all the theories mapped out, we can use the Cause Map as a resource in two ways. We can find solutions for any of the potential causes and decide which are most helpful. Or we can continue the root cause analysis investigation to determine which causes are most likely.
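Competing theories can sit side by side in a Cause Map as alternative branches beneath a single effect. The Python sketch below is a hypothetical way to record that, with each of the three theories above stored as its own short causal chain (wording paraphrased). The `Theory` class, its `ruled_out` flag, and the `shared_causes` helper are illustrative names only. Looking for causes that appear in more than one open branch shows where a single solution could cover several theories at once.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Theory:
    name: str
    chain: list[str]  # from the effect back toward the earliest recorded cause
    ruled_out: bool = False  # set as evidence accumulates; the branch stays on the map


# Alternative branches beneath the single effect "fire aboard".
THEORIES = [
    Theory("sabotage", ["fire aboard", "deliberate ignition"]),
    Theory("static ignited cover", ["fire aboard", "flammable cover ignited", "static electricity"]),
    Theory("static ignited hydrogen", ["fire aboard", "hydrogen-oxygen mixture ignited", "static electricity"]),
]


def shared_causes(theories: list[Theory]) -> list[str]:
    """Causes below the top effect that appear in more than one theory not yet ruled out."""
    counts = Counter(
        cause
        for theory in theories
        if not theory.ruled_out
        for cause in set(theory.chain[1:])
    )
    return sorted(cause for cause, n in counts.items() if n > 1)


print(shared_causes(THEORIES))
```

With all three theories open, static electricity is the only cause shared by two branches. A solution aimed at static discharge would therefore address both static-ignition theories without first settling which one is correct.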

The attached pdf document gives an intermediate-level Cause Map of the incident.

April 21st, 2008

Loss of Mars Climate Orbiter

The Mars Climate Orbiter (MCO) was launched atop a Delta II launch vehicle on December 11, 1998. Nine and a half months after launch, the MCO was scheduled to begin the process of establishing an orbit around Mars. The plan was to use a technique called aerobraking to reduce the MCO's velocity and gradually move it from a 14-hour orbit to a 2-hour orbit. On September 23, 1999, the $125 million MCO was lost during the attempt to establish orbit around Mars. Investigation into the accident revealed that the orbiter had entered the Martian atmosphere traveling too quickly on too low a trajectory. At that lower trajectory the atmosphere is thicker. The heat produced by friction as the orbiter hit it at high velocity destroyed the spacecraft. The loss of the MCO cost NASA more than the $125 million spent building it. NASA also lost a substantial amount of time, all of the data the orbiter would have gathered, and some of the public support for the NASA program.

NASA's investigation revealed many causes of the loss of the orbiter. One of the most obvious is a unit error in the software used to help predict the velocity of the MCO. That velocity is in turn used to predict the trajectory at which the MCO would enter the Martian atmosphere. A little background is needed to understand how an error in the software caused errors in the predicted velocity. Software called "Small Forces" was used to predict how the MCO's velocity changed after an angular momentum desaturation maneuver. This maneuver is performed when one of the momentum wheels that help the orbiter maintain its orientation in space starts spinning too quickly. During the maneuver, the wheel is deliberately slowed down, which would normally turn the spacecraft. At the same time, a jet is fired to counteract this force and keep the orientation relatively constant. This whole process affects the speed at which the spacecraft travels, and therefore the trajectory of its entry into the Mars atmosphere. The error in Small Forces was a simple one: its results were in English units (pound-force seconds), but the program that predicted velocity expected them in metric units (newton-seconds).
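A short calculation shows the size of this mistake. NASA's investigation board described the affected quantities as impulses, so the sketch below works in pound-force seconds (lbf·s) and newton-seconds (N·s). One pound-force is about 4.448 newtons, so reading an lbf·s value as N·s understates the effect of each thruster firing by that factor. The `Impulse` class is a hypothetical illustration of a common safeguard: carrying the unit with the value so that every conversion is explicit. It is not a description of the actual Small Forces or navigation software.

```python
from dataclasses import dataclass

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2 (both factors exact by definition)
NEWTONS_PER_POUND_FORCE = 0.45359237 * 9.80665


@dataclass(frozen=True)
class Impulse:
    """Impulse stored internally in newton-seconds, the unit the consumer expects."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * NEWTONS_PER_POUND_FORCE)


reported = 10.0  # hypothetical thruster impulse, computed in lbf·s

wrong = Impulse.from_newton_seconds(reported)  # the mistake: unit silently assumed
right = Impulse.from_pound_force_seconds(reported)

# How many times larger the true impulse is than the misread one.
print(f"{right.newton_seconds / wrong.newton_seconds:.3f}")
```

Because every value must pass through a named constructor, the unit assumption becomes visible in the code that produces the number rather than being implied by a file format.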

The attached PDF file contains an intermediate-level root cause analysis of the loss of the MCO. It was built using facts from media reports and the NASA investigation reports. The map can be expanded using all the known data to create a detailed Cause Map.

April 18th, 2008

UPDATE: FDA releases revised death count from heparin contamination

The Food & Drug Administration (FDA) recently reviewed adverse events related to heparin. The drug has been under heavy scrutiny since 19 deaths were reported due to allergic reactions from contaminated vials. Since January 2007, the FDA has received reports of 103 deaths of people taking heparin, 62 of which involved allergic reactions or hypotension (dangerously low blood pressure). These deaths include people who were taking all brands of heparin, not just the brand affected by the contamination and recall. The manufacturer of the contaminated and recalled brand says that it knows of only 4 deaths associated with its contaminated product. The FDA has stated that these reports do not necessarily mean that the allergic reactions and low blood pressure caused the deaths. Allergic reactions and low blood pressure were the cause of death for those who died from the contaminated vials. Even so, it's not clear that all 62 deaths are associated with contaminated heparin, and heparin itself carries a warning about the risk of low blood pressure.

However, in 2006 only 55 deaths were reported from heparin, and only 3 were due to allergic reactions, so something is clearly increasing the number of allergic reactions to heparin. Hopefully the increase in deaths is due to the contaminated heparin that has already been recalled from the market. It's possible, though, that there are other issues, or other brands that are also contaminated. The FDA continues to investigate and hopefully can provide answers soon, especially to the people who depend on heparin for their well-being.

The previous blog entry shows an intermediate-level Cause Map (root cause analysis) as a downloadable pdf.

April 16th, 2008

UPDATE: Grounded Flights for American Airlines

American Airlines resumed a normal flight schedule Saturday afternoon, ending a period of widespread flight cancellations. Between April 8 and 12, 3,300 flights were canceled when all MD-80 jetliners in the American Airlines fleet were grounded, affecting more than a quarter of a million passengers. As discussed in a previous blog post, the airline took these drastic measures when many of the inspected MD-80s failed FAA regulations on wiring from the airframe to a pump in the wheel well. The wiring can be a fire hazard and affect power distribution. An intermediate-level Cause Map showing the causes of the cancellations can be seen in the previous blog post, from April 10.

The cancellations may be over, but the effects will continue to linger. The cost to American Airlines is estimated to be in the tens of millions of dollars. In addition to losing revenue, American Airlines gave many inconvenienced passengers $500 travel vouchers and paid to put stranded travelers in hotels. It is difficult to put a financial cost on the huge amount of negative publicity that the airline has received as a result of these cancellations, but it is certain to affect its business. On top of the cost of these cancellations, the entire airline industry faces rising fuel costs, which will put even more pressure on American Airlines. On Friday, ironically a day when nearly 600 flights were canceled, American Airlines announced that it will raise fares by as much as $30 per round-trip ticket to help cover high fuel costs. These dual blows to the bottom line will affect the health of American Airlines for the foreseeable future.

It is also likely that many other airlines will be similarly affected. A root cause analysis makes clear that one of the causes of these cancellations is a new focus by the FAA on "zero tolerance" for any deviations from its detailed regulations. As airlines struggle to understand the new inspection criteria, others are likely to face cancellations as well. The airline industry as a whole faces some high hurdles in the upcoming months. Four discount carriers have already declared bankruptcy in the last month, and others will likely follow suit. Even the established, traditional carriers are seeking changes to stay competitive; for example, rumors are circulating about a possible Northwest and Delta merger. This is going to be a turbulent time for airlines and passengers.

April 14th, 2008
