Hubble Focusing Issues [ August 4th, 2008 ] Posted in » Root Cause Analysis - Incident Investigation

The Hubble Space Telescope was launched on April 24, 1990.  Once in orbit, it was quickly discovered that the images from Hubble were blurred.  An investigation into the issue revealed that Hubble’s primary mirror was not built to specification and couldn’t properly focus the light.  Specifically, the mirror was flattened too much away from the center, which caused the light reflected from the edge of the mirror to focus at a slightly different location than the light reflected from the center.  The primary mirror in Hubble was only off specification by 2.3 micrometers, but the result for the $1.5 billion project was disastrous.

Solving Hubble’s focus issues was no small feat.  How do you repair a mirror that can’t be replaced on orbit when it is cost prohibitive to bring it back to Earth for repair?  The answer was to add corrective optics designed to work with the off-specification mirror.  COSTAR (Corrective Optics Space Telescope Axial Replacement) was added to Hubble during the first servicing mission in December 1993.  COSTAR is essentially eyeglasses for Hubble: additional optics built with the same error as the mirror, but in the opposite direction, so that the effects of the off-specification mirror shape are canceled out.  With the addition of COSTAR, Hubble met its original design goals.

The primary mirror was constructed with a flaw because the tool used to create the template that guided the shaping of the mirror, called a null corrector, was itself flawed.  Null correctors use precisely located mirrors and lenses to determine the shape of a mirror.  To assemble a null corrector, reflected light is used to measure the distance between the mirror and the lens inside the tool.  When the null corrector used to shape Hubble’s primary mirror was assembled, a measurement error was made.  A small amount of reflective coating had fallen off an internal piece of the instrument and the laser used to perform the measurement reflected off the wrong location, resulting in a lens being 1.3 mm too far from the mirror.  Null correctors are extremely precise and do not change once assembled, so the Hubble team used a single instrument to guide the mirror shape.  A single flawed tool and inadequate quality controls resulted in a flawed mirror.

A visual representation of the root cause analysis of the Hubble focus issue has been created as a Cause Map that can be downloaded.

Florida Power Outage

Incident date: February 26, 2008.  Florida officials can’t figure out what caused the power outages that occurred Tuesday.  Many things contributed to the outages, but none of them should have been sufficient to cause outages of the extent that occurred.  A root cause analysis can come in handy, even if all the causes aren’t known.  The whole purpose of these analyses is to show the factors that caused a given problem.  Frequently we do that simply by arranging root causes that we already know, to ensure that we’ve covered all the bases.  We can do the same thing in the middle of an investigation, even if we’re not sure what all the causes were.  In fact, the exercise can help us find the problem.  So, we’ll map what we know, leaving question marks in areas of uncertainty.  A root cause analysis, based on what is known at the time, is shown below.

First, what is the impact to the goals?  Well, a power company strives to provide electricity, so when 3 million people are left without power, it’s an impact to the customer service goals.  Additionally, having a reduction in the amount of electricity available is an impact to the production goal.  People lost power for two reasons: 1) the decrease in power distribution capability and 2) the reduction of power available.  There was less power available because Florida was unable to borrow from other states, due to the fact that it has fewer connections because of its geographic distance from other states.  There was also less power because three power plants (2 nuclear, 1 natural gas) were shut down.  These plants automatically shut down when they register a disturbance in the electrical grid in order to protect the equipment from voltage fluctuations.  The disturbance in the electrical grid (and the reason for the decrease in power distribution capability) was due to the disabling of two power distribution lines.  After this is where it gets fuzzy.  We’re not sure why the two lines were disabled.  We know that there was a fire at the substation, and a failed switch, which caused a short circuit.  But we’re not sure how those happened either.  It’s possible that the short circuit was the root cause of the fire, but for now we’ll just leave it like this.  Looking below, we see that we have a high-level root cause analysis nearly completed, and that the focus of our analysis should be on what caused the disabling of the power distribution lines, and what caused the fire and switch failure at the substation. 

Even if our thoughts on a problem aren’t complete, it can help immensely to organize them by performing a root cause analysis, even if there are some holes (shown with question marks).  It’s a great place to begin!
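The idea of mapping what is known and leaving question marks for the holes can also be sketched as a small data structure.  What follows is a minimal illustration in Python, not part of the Cause Mapping method itself: the dictionary layout and the `open_questions` helper are assumptions made for this example, and the cause names are paraphrased from the Florida outage description above.

```python
# Hypothetical sketch: each effect maps to the causes known so far.
# "?" marks a hole in the analysis that still needs investigation.
UNKNOWN = "?"

cause_map = {
    "customers lost power": [
        "decrease in power distribution capability",
        "reduction of power available",
    ],
    "reduction of power available": [
        "unable to borrow power from other states",
        "three power plants shut down",
    ],
    "unable to borrow power from other states": [
        "few connections to other states",
    ],
    "few connections to other states": [
        "geographic distance from other states",
    ],
    "three power plants shut down": ["disturbance in the electrical grid"],
    "disturbance in the electrical grid": [
        "two power distribution lines disabled",
    ],
    "decrease in power distribution capability": [
        "two power distribution lines disabled",
    ],
    # The causes below are not yet understood.
    "two power distribution lines disabled": [UNKNOWN],
    "substation fire": [UNKNOWN],
    "failed switch": [UNKNOWN],
}


def open_questions(cmap):
    """Return the effects that still have an unknown cause, sorted by name."""
    return sorted(effect for effect, causes in cmap.items() if UNKNOWN in causes)


for effect in open_questions(cause_map):
    print("Still to investigate:", effect)
```

Listing the open questions this way points to the same places the post does: the disabled distribution lines, the substation fire and the failed switch.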

Root Cause Analysis Florida Power Outage

February 28th, 2008

UPDATE: US Beef Recall

I wanted to add a few more interesting facts on the recent beef recall as the ramifications continue to surface.  As a quick recap, on February 17, 143 million pounds of beef were recalled.  For perspective, that’s enough beef to make every person in the US about two hamburgers.  The scope of the recall is rapidly expanding and it may become the largest food recall in US history.  The full magnitude of the recall is just now becoming apparent because it takes weeks to track down all the products containing the recalled beef.  Take a second to think of all the products in a grocery store that contain beef and you can imagine how large this recall is likely to become.  The amount of food that is going to be destroyed is mind-boggling and the cost is likely to be in the hundreds of millions of dollars.  Keep in mind that no cases of illness have been reported, a large amount of the beef has already been consumed, and the U.S. Department of Agriculture classifies the risk to consumers as remote.  Does it make sense to destroy all this food?  As you consider the scope of the recall, I ask you also to consider a root cause analysis of the problem.  The previous post asked the question: what is the best approach to prevent this type of problem from happening again?  I still don’t know the answer, but I do know that a recall alone does not solve the initial problems that caused the issue.  What causes really led to sick cows being mistreated and then slaughtered for human consumption?  A recall deals with the problem after the fact; a good solution would change something in the process before the meat enters the food chain.  The USDA has stated that it will not be increasing inspections at food processing plants and I haven’t found any evidence that other changes are being made in the work process at the slaughterhouses.  I’ll be continuing to cook my meat well done.

February 26th, 2008

Largest Beef Recall in US History

Incident Date: February 17, 2008  One of the most interesting things about root cause analysis is its widespread application.  As an engineer, I tend to think about root cause analysis applying to mechanical failures, safety incidents or manufacturing issues, but it can be applied to any system.  Take for instance the recent beef recall.  The largest beef recall in US history was initiated on February 17 when Westland/Hallmark Meat Company recalled 143 million pounds of beef.  What started the whole thing was an undercover video distributed by the Humane Society of the United States which showed workers kicking, shocking and even forklifting sick cows to force them onto their feet so they could be slaughtered.  Beyond the animal cruelty issues (two workers involved have since been charged), the issue is that meat from sick cows was processed and sold.  Government regulations ban cows that cannot walk from entering the food supply because consumption of their meat may lead to illness, including mad cow disease.  So how did sick cows end up being slaughtered and sold to millions of people?  What is the best approach to prevent this type of problem from happening again?  Is the answer that we need more government regulations, more frequent inspections or stricter penalties for companies that violate the current regulations?  Whose fault is it?  Is it the farmers for selling the cows, the health inspectors for missing sick cows or the slaughterhouses for processing sick cows?  Performing a root cause analysis would show you that there isn’t a single right answer.  All you have to do is look at the recent increase in beef recalls to realize that a simple, single-cause solution won’t work.  There were five recalls in 2005, eight in 2006 and 21 in 2007.  These recalls were not limited to one plant or even one company.  Clearly, fining one company or firing a few workers isn’t going to fix the beef supply issues.  You need to attack the root of the problem to keep it from growing back, and to do that you need to find the root causes (plural).  The information needed to do a detailed analysis isn’t available to the public yet, but a very basic root cause analysis follows.

Root Cause Analysis Beef Recall

February 22nd, 2008

Goals Define the Problems in your Organization

For a particular failure, loss or incident, people will naturally disagree about what the problem is.  Some people will say the problem is this and others will say the problem is that and still others will let everyone know what the real problem is.  People see problems differently.  This is a given for any root cause analysis facilitator.

Is it possible for everyone to agree on the problem?  Yes.  It may seem unrealistic until we look specifically at what a problem is.  A problem is anything that negatively affects the ideal state.  People may see many different issues as a problem, but within an organization the ideal state is already defined.  The ideal state within an organization is also known as the overall goals.  Defining a problem as any negative deviation from the organization’s overall goals is an accurate, complete and consistent approach.  For example, let’s consider your local power plant.

What is the ideal state of that power plant?  Let’s say the power plant is supposed to produce 1000 megawatts.  Any negative deviation from 1000 megawatts is a problem.  If the plant produced 900 megawatts then the deviation is 100 megawatts (a production loss).  We could even put an economic value on this production loss.  But producing power is not the only goal of the power plant.  Organizations don’t have a goal.  They always have goals (plural).
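The production example above is simple arithmetic.  As a minimal sketch in Python (the function name `negative_deviation` is made up for this illustration), a problem can be measured as how far the actual result falls short of the goal, with anything at or above the goal counting as no deviation.

```python
def negative_deviation(goal, actual):
    """Return how far `actual` falls short of `goal` (0 if the goal was met)."""
    return max(goal - actual, 0)


# The plant was supposed to produce 1000 megawatts but produced 900.
production_loss = negative_deviation(goal=1000, actual=900)
print(production_loss)  # 100
```

The same calculation works for any goal that can be expressed as a number, which is what makes it possible to compare the size of one problem with another.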

The safety goal for the power plant is zero injuries.  Any injury is a deviation from the ideal state, but some safety incidents are more critical than others.  A paper cut is an injury, but it’s not as serious as someone receiving 15 stitches.  The magnitude of the impact on the goals dictates the importance of a problem as well as how thorough the investigation will be.  Minor incidents have relatively basic investigations while major issues require much more comprehensive analyses.

The ideal state of the power plant also includes no environmental issues as well as no customer service interruptions, no property or material losses, and no excess reactive or rework labor costs.  The overall goals of the power plant are safety, environmental, compliance, customer, production, and materials and labor (which are usually captured within maintenance).  Any negative deviation from any one of these overall goals is truly what the power plant should focus on for its problem solving and root cause analysis efforts, every day.

The overall goals change for each type of organization.  A hospital has different overall goals than a food processor, an oil company or a bank.  Regardless of the organization or industry, the impact to the overall goals dictates where the root cause analysis efforts should be.

The Cause Mapping method of root cause analysis has a specific way of defining every problem by the organization’s overall goals.  People naturally disagree about what the problem is.  In the Cause Mapping method it’s much simpler for the facilitator to accommodate disagreements about the problem because they are expected.  The differences provide great insight into people’s view of the problem.  To get agreement, ask the participants, as a group, how each of the overall goals was impacted (if at all).  Amazingly, people will not disagree about the impact to the goals.  They will disagree about the responses to the question “What’s the problem?”  However, they will give the same answers to each of the goal questions.  Managers and front line people will give the same answers.  It’s powerful because it’s so basic.  Goals dictate what the problems are.

During an injury investigation in the power plant where someone sprained their ankle, when the facilitator asks “Was anyone hurt?” everyone will answer with “yes, John sprained his ankle.”  It’s obvious.  If you ask what the problem is, people’s responses will be all over the place: he just tripped, the barrier is bad, maybe the floor was slick, inattention to detail, procedure not followed, etc.  In your problem solving and root cause analysis investigations, experiment with this idea of defining every incident by the impact it has on the overall goals.
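One way to experiment with this is to record every incident as a list of answers to the goal questions.  The sketch below is only an illustration in Python: the goal names come from the power plant example above, while the `define_problem` function, the "no impact" default and the assumption that the sprained ankle affected no other goal are invented for this example.

```python
# Goal names taken from the power plant example; the structure is an assumption.
OVERALL_GOALS = [
    "safety",
    "environmental",
    "compliance",
    "customer",
    "production",
    "materials and labor",
]


def define_problem(impacts):
    """Describe an incident by its impact on every overall goal."""
    not_a_goal = set(impacts) - set(OVERALL_GOALS)
    if not_a_goal:
        raise ValueError(f"not an overall goal: {sorted(not_a_goal)}")
    # Goals without a recorded impact are listed explicitly as unaffected.
    return {goal: impacts.get(goal, "no impact") for goal in OVERALL_GOALS}


# For illustration only, the other goals are assumed to be unaffected.
ankle_incident = define_problem({"safety": "John sprained his ankle"})
for goal, impact in ankle_incident.items():
    print(f"{goal}: {impact}")
```

Opinions such as "the floor was slick" do not fit into this record, which is the point: they are possible causes to explore later, not the definition of the problem.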

To learn more about quickly, clearly and accurately defining problems in your business attend one of our Public Cause Mapping Workshops listed on our web site or bring our workshop to your facility.  The Cause Mapping method is an extremely effective systems-based approach to root cause analysis.  Visit us at www.ThinkReliablity.com to learn more about improving the way your organization analyzes, documents, communicates and solves problems.

February 19th, 2008

Sugar Refinery Explosion at Imperial Sugar Factory – Port Wentworth, Georgia

Incident Date: February 7, 2008

Who knew that tiny particles of sugar dust could be so dangerous?  Federal investigators’ analysis has shown that the explosion at the Imperial Sugar factory was accidental, and that the root cause was ignition of clouds of sugar dust.  When I think of sugar dust, making cotton candy is what comes to mind.  However, when the stuff builds up, it can ignite and explode.  To avoid this problem, Imperial has extraction equipment that moves all the dust particles up to dust collectors on the roof.  Apparently there was an earlier explosion, three weeks before the recent deadly one, that resulted from ignition of accumulated sugar dust in one of these dust collectors.  The sugar was apparently ignited by a small piece of metal that got into the equipment and created a spark.  The good news, for this earlier, more minor explosion, was that the ventilation panels in the dust collector opened as designed to minimize damage.  It’s unclear how this explosion may or may not have played into the more recent, fatal explosion, but one thing is clear: they weren’t so lucky this time.  In the February 7th explosion, which has killed eleven workers so far (fourteen are still in critical condition), the ignition occurred in a basement area beneath the plant’s storage silos.  Although this area also has extraction equipment, investigators have determined that there was still enough dust there to cause the explosion and feed the subsequent fire, which raged for a week.  What they don’t know yet is whether the extraction equipment was working, and what caused the sugar dust to ignite.  A very basic root cause analysis follows, based on the information known so far.

Cause Map - Sugar Refinery

February 18th, 2008

ThinkReliability investigates problems, including historical incidents.  Examples include the sinking of the Titanic, the Tacoma Narrows Bridge collapse, the Exxon Valdez oil spill and the BP Refinery Explosion in Texas City.  The Cause Mapping method of root cause analysis was used to create a visual picture of the cause and effect relationships of the incidents.

February 9th, 2008
