All posts by Kim Smiley

Mechanical engineer, consultant and blogger for ThinkReliability, obsessive reader and big believer in lifelong learning

Grounding of the Empress of the North

by Kim Smiley

On May 14, 2007, the 300-foot cruise ship Empress of the North ran aground on rocks while rounding Rocky Island during a trip through Alaska’s Inside Passage. The hull was significantly damaged and the two starboard propellers had to be replaced. Repairs cost more than $4.8 million. Luckily no one was injured, but more than two hundred passengers had to be evacuated from the ship.

This is a common route for cruise ships and the rocks were a well-known hazard clearly marked on navigation charts.  So what happened?

A root cause analysis shows that many causes contributed to the accident. One of the causes is that no lookouts were posted at the time of the accident. The crew members who would have acted as lookouts were performing security rounds. This violated regulations requiring lookouts at all times, and it appears to have been common practice for the crew.

When determining causes it’s important to ask, what is different?  In this case, this was the first watch as Deck Officer for the officer in charge.  He had recently graduated, was newly licensed and inexperienced.  He was not familiar with the deck procedures and the equipment. There was a lot of confusion about watch team roles and he didn’t attempt to take charge of the ship’s navigation until seconds before the grounding occurred.  The National Transportation Safety Board (NTSB) found that the actions, or inaction as the case may be, of the Deck Officer were one of the major factors contributing to the accident.

It’s tempting to stop at this point, but a thorough investigation needs to go further than identifying the actions of the Deck Officer as a cause. Why was he standing watch if he wasn’t fully qualified? Why wasn’t he adequately prepared before being given the responsibility?

The crew member originally assigned the watch was ill.  There are a limited number of possible replacements on a ship this size.  The Master of the ship believed the watch would be a good training watch because it was an easy watch with minimal course corrections needed.  It was also not the practice of the crew to have specific night orders for the overnight watches so the newly arrived junior third officer found himself standing the midnight to 4 am watch with minimal guidance.

Many investigations lead back to human error, but it’s important to ask questions beyond that point. Changing how people are trained, improving the environment, and providing specific written instructions can help prevent human errors in many cases.

(The photo above is an official Coast Guard photo.)

Hubble Focusing Issues

By Kim Smiley

The Hubble Space Telescope was launched on April 24, 1990. Once it was in orbit, it was quickly discovered that the images from Hubble were blurred. An investigation revealed that Hubble’s primary mirror was not built to specification and couldn’t properly focus the light. Specifically, the mirror was flattened too much away from the center, so light reflected from the edge of the mirror focused at a slightly different location than light reflected from the center. The primary mirror was only off specification by 2.3 micrometers, but the result for the $1.5 billion project was disastrous.

Solving Hubble’s focus issues was no small feat. How do you repair a mirror that can’t be replaced on orbit when it is cost prohibitive to bring it back to Earth for repair? The answer was to modify the lens (which met specifications) to work with the off-specification mirror. COSTAR (Corrective Optics Space Telescope Axial Replacement) was added to Hubble during the first servicing mission in December 1993. COSTAR is essentially eyeglasses for Hubble: an additional lens built with the same error as the mirror, but in the opposite direction, so that the effects of the off-specification mirror shape are canceled out. With the addition of COSTAR, Hubble met its original design goals.

The primary mirror was constructed with a flaw because the tool used to create the template that guided the shaping of the mirror, called a null corrector, was itself flawed. Null correctors use precisely located mirrors and lenses to determine the shape of a mirror. To assemble a null corrector, reflected light is used to measure the distance between the mirror and the lens inside the tool. When the null corrector used to shape Hubble’s primary mirror was assembled, a measurement error was made. A small amount of reflective coating had fallen off an internal piece of the instrument and the laser used to perform the measurement reflected off the wrong location, resulting in a lens being 1.3 mm too far from the mirror. Null correctors are extremely precise and do not change once assembled, so the Hubble team used a single instrument to guide the mirror shape. A single flawed tool and inadequate quality controls resulted in a flawed mirror.

A visual representation of the root cause analysis has been created as a Cause Map that can be downloaded.

View a video about the Hubble Telescope.

Brooklyn Bridge Turns 125

By Kim Smiley

The Brooklyn Bridge marks its 125th birthday on May 24, 2008. When performing a root cause analysis it is easy to spend a large amount of time focused on failures, but today engineers should take a moment to appreciate this truly amazing feat. The bridge has been refurbished many times, but the towers, main cables, and main beams are original and are now 125 years old.

At the time the Brooklyn Bridge was constructed, the 6,000 ft long bridge was roughly six times as long as the longest bridge of its type that had previously been built. The Brooklyn Bridge is one of the nation’s oldest and most treasured suspension bridges. It has shaped the development of New York City. At the time it was constructed, Brooklyn was largely rural, and the bridge helped spark a growth spurt that dramatically changed the face of Brooklyn. Brooklyn’s population grew by 42 percent between 1880 and 1890. At last count in 2006, the bridge carried 126,000 cars per day.

Recent inspections have revealed some deterioration of the bridge, primarily in the newer approach ramps. In a recent survey, state inspectors rated its condition as “poor”. New York City plans to spend $250 million to $300 million to fix and repaint the bridge. Hopefully these updates will return the bridge to good condition and it will continue to safely serve the citizens of New York City for many decades to come.

Train Derailment – Lafayette, Louisiana

by Kim Smiley

About 1:40 am on May 17, six rail cars derailed and overturned near Lafayette, Louisiana. One of the cars was damaged and leaked about 11,000 gallons of hydrochloric acid. Five people, including two rail workers, were sent to a hospital and treated for eye and skin irritation.

Authorities evacuated people within 1 mile of the accident. Approximately 3,000 people were affected, including a few small businesses and a nursing home. All affected people are being reimbursed for food and hotel costs by the railway company that operated the train.

There was potential for further release of chemicals because one of the rail cars involved in the accident carried ethylene oxide, a flammable and dangerous chemical, and two of the remaining cars also carried hydrochloric acid.

The Louisiana State Police’s hazardous materials unit is overseeing clean-up of the accident site.  The spill is being neutralized with lime and the contaminated material will be removed and disposed of.  The rail car containing ethylene oxide was removed from the site quickly to reduce the potential for additional problems.

The cause of the derailment is not known at this time.  The Federal Railroad Administration will conduct an investigation of the accident.

The attached PDF file contains an intermediate level root cause analysis of the train derailment built using Cause Mapping, a visual form of root cause analysis.  It was built using the facts that were available in media reports on the accident.  As more details are known, the Cause Map can be expanded.

Mission to Hubble Telescope Delay

By Kim Smiley

Early in March 2008, NASA announced that the shuttle mission to the Hubble telescope would take place in the fall rather than in August as originally scheduled. A trip to Hubble is necessary to replace gyroscopes and batteries that are failing. The mission will also be used to install instruments that will increase the range of the telescope. The changing schedule itself is not a cause for alarm, but the reasons behind the slip are interesting. They show that NASA is still struggling in many ways to recover from the tragic loss of the Columbia.

The shuttle mission is delayed because fuel tanks built to the new design will not be manufactured in time to support the original schedule. In 2003, Columbia and her crew were lost when external foam fell off the fuel tank during ascent and struck the wing of the orbiter, creating a plate-sized hole. Initially, NASA managed the foam issue by modifying existing fuel tanks. The last of these pre-existing fuel tanks will fly with Discovery when the shuttle launches for a space station assembly mission on May 31. The fuel tanks for future launches are being built with design modifications to prevent foam loss. This manufacturing process is taking four to five weeks longer than originally planned. No information is available in media reports explaining why the manufacturing schedule is longer than expected.

The mission to the Hubble telescope is also the only planned shuttle mission that will not go to the International Space Station. This fact is relevant because it means that two shuttles have to be prepared for launch, not just one. Two shuttles means double the work needed to get the new fuel tanks ready for launch. The second shuttle will be prepared in case a rescue mission is needed. Trips to the space station are less risky because the astronauts could seek shelter in the space station if the orbiter were damaged, providing a much longer window for a potential rescue.

The attached PDF file contains an intermediate level root cause analysis of the delay of the Hubble shuttle mission.  It was built using the facts that were available in media reports.  As more details are known, the Cause Map can be expanded.

Gas Pump Glitch

By Kim Smiley

An Associated Press article, published on April 25, highlighted a common, often ignored problem: customers getting a different amount of gas than what they paid for. Gas pumps contain a check valve that allows gas to start flowing at the same time the price meter starts. As check valves age, they can begin to hesitate and wait a period of time before gas flow begins. This results in the consumer being overcharged because the price meter is turning before gas is flowing. Worn check valves usually cost consumers only pennies per fill-up, but there have been instances of overcharges of 30 to 40 cents a gallon. This issue doesn’t cost the consumer large amounts of money, but it adds frustration to a public already aggravated by record high gas prices.

To be fair, it should be mentioned that worn check valves sometimes help the consumer as well. When a check valve hesitates at the end of a fill-up, the price meter is stopped and a small amount of gas will continue to flow. Also, to clarify, this isn’t a case of gas stations purposely gouging consumers. It’s a situation where a common piece of machinery is wearing out and not functioning properly.

To help prevent these types of errors, gas pumps are regularly inspected to ensure that consumers are charged for the correct amount of gas. Regulations allow gas pumps to pass inspection if they overcharge by no more than 6 cents for every five gallons delivered. Most states require gas pumps to be inspected every year to ensure accurate measurement of gas delivered. Many counties try to inspect more frequently, but have difficulty because of staffing shortages and financial pressure.
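The inspection tolerance described above can be sketched as a simple check. This is only an illustration: the function name and the assumption that the 6-cent allowance scales proportionally with the volume delivered are not taken from any regulation text.

```python
def passes_inspection(overcharge_cents, gallons_delivered,
                      allowed_cents=6.0, per_gallons=5.0):
    """Return True if the overcharge is within the allowed rate.

    Assumption (hypothetical): the allowance of 6 cents per five gallons
    scales linearly with the number of gallons delivered.
    """
    allowed = allowed_cents * gallons_delivered / per_gallons
    return overcharge_cents <= allowed


# A pump that overcharges 5 cents on a five-gallon test draw is within tolerance.
small_error_ok = passes_inspection(overcharge_cents=5, gallons_delivered=5)

# A pump that overcharges 30 cents per gallon on a ten-gallon fill is far outside it.
large_error_ok = passes_inspection(overcharge_cents=30 * 10, gallons_delivered=10)

print(small_error_ok, large_error_ok)
```

Under these assumptions, the pennies-per-fill-up case stays within tolerance while the 30-cents-a-gallon case does not.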

The attached PDF file contains an intermediate level root cause analysis of the worn check valves in gas pumps.  It was built using the facts that were available in media reports.  As more details are known, the Cause Map can be expanded.

Loss of Mars Climate Orbiter

By Kim Smiley

The Mars Climate Orbiter (MCO) was launched atop a Delta II launch vehicle on December 11, 1998. Nine and a half months after launch, the MCO was scheduled to begin the process of establishing an orbit around Mars. The plan was to use a technique called aerobraking to reduce the MCO’s velocity and slowly move it from a 14-hour orbit to a 2-hour orbit. On September 23, the $125 million MCO was lost during the attempt to establish orbit around Mars. Investigation into the accident revealed that the orbiter had entered the Martian atmosphere traveling too quickly on too low a trajectory. The heat produced by friction from hitting the thicker atmosphere present at the lower trajectory at high velocity destroyed the orbiter. The loss of the MCO cost NASA more than the $125 million spent building it. In addition, NASA lost a substantial amount of time, lost all of the data the orbiter would have gathered, and lost some public support for the NASA program.

NASA’s investigation revealed many causes of the loss of the orbiter. One of the most obvious is a unit error in the software used to help predict the velocity of the MCO, which in turn was used to predict the trajectory on which the MCO would enter the Martian atmosphere. A little background is needed to understand how an error in the software caused errors in the predicted velocity. Software called “Small Forces” was used to predict how the MCO’s velocity changed after an angular momentum desaturation maneuver. An angular momentum desaturation maneuver is performed when one of the momentum wheels used to help the orbiter maintain its orientation in space starts spinning too quickly. During the maneuver, a wheel is deliberately slowed down (which would normally turn the spacecraft) while at the same time a jet is fired to counteract this force and keep the orientation relatively constant. This whole process affects the speed of the spacecraft and therefore the trajectory of its entry into the Martian atmosphere. The error in Small Forces was a simple one. Its results were in pounds-force, and the program that predicted velocity expected them to be in newtons.
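The effect of a unit mismatch like this one can be shown with a minimal sketch. The function names and the sample value below are hypothetical and are not taken from the NASA software. Only the conversion factor (1 pound-force = 4.4482216152605 newtons) is a fixed fact.

```python
# Hypothetical illustration of a pound-force / newton mix-up.
LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force


def small_forces_output_lbf():
    """Stand-in for a program that reports a result in pounds-force (made-up value)."""
    return 10.0


def interpret_as_newtons(value):
    """Stand-in for a consumer that assumes its input is already in newtons."""
    return value


reported = small_forces_output_lbf()
assumed_newtons = interpret_as_newtons(reported)  # wrong: no conversion applied
actual_newtons = reported * LBF_TO_NEWTON         # right: convert before use

underestimate_factor = actual_newtons / assumed_newtons
print(f"assumed {assumed_newtons:.2f} N, actual {actual_newtons:.2f} N")
print(f"effect underestimated by a factor of {underestimate_factor:.2f}")
```

Whatever the real values were, a consumer that reads pounds-force as newtons underestimates the effect by a factor of roughly 4.45, which is why a small, repeated maneuver could add up to a significant trajectory error.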

The attached PDF file contains an intermediate level root cause analysis of the loss of the MCO.  It was built using  facts from media reports and the NASA investigation reports. The map can be expanded using all the known data to create a detailed Cause Map.

Learn more about the Mars Climate Orbiter.

UPDATE: Grounded Flights for American Airlines

By Kim Smiley

American Airlines resumed a normal flight schedule Saturday afternoon, ending a period of widespread flight cancellations.  Between April 8 and 12, 3,300 flights were canceled when all MD-80 jetliners in the American Airlines fleet were grounded.    More than a quarter of a million passengers were affected by the widespread flight cancellations.  As discussed in a previous blog, these drastic measures were taken when a large percentage of inspected MD-80s failed to meet FAA regulations on wiring from the airframe to a pump in the wheel well.  The wiring can be a fire hazard and affect power distribution. An intermediate level Cause Map showing the causes of the cancellations can be seen in the previous blog posted on April 10.

The cancellations may be over, but the effects will continue to linger. The cost to American Airlines is estimated to be in the tens of millions of dollars. In addition to lost revenue, American Airlines gave many inconvenienced passengers $500 travel vouchers and paid to put stranded travelers in hotels. It is difficult to put a financial cost on the huge amount of negative publicity that the airline has received as a result of these cancellations, but it is certain to affect its business. In addition to the financial burden of these cancellations, the entire airline industry is faced with rising fuel costs, and this is going to put even more pressure on American Airlines. Already, American Airlines announced on Friday (ironically on a day when nearly 600 flights were canceled) that it will be raising prices by as much as $30 per round-trip ticket to help compensate for high fuel costs. These dual blows to the bottom line are going to affect the health of American Airlines for the foreseeable future.

It is also likely that many other airlines will be similarly affected. A root cause analysis makes it clear that one of the causes of these cancellations is a new focus by the FAA on “zero tolerance” for any deviations from its detailed regulations. As airlines struggle to understand the new inspection criteria, it is likely that other airlines will face cancellations. The airline industry as a whole is facing some high hurdles in the upcoming months. Four discount carriers have already declared bankruptcy in the last month and it is likely others will follow suit. Even the established, traditional carriers are seeking changes to stay competitive. For example, rumors are circulating about a possible Northwest and Delta merger. This is going to be a turbulent time for airlines and passengers.

Grounded Flights for American Airlines

By Kim Smiley

Starting April 8, 2008, American Airlines grounded nearly half of its fleet when it pulled all 300 McDonnell Douglas jets (MD-80s) from service. At least 2,400 flights were canceled. It is estimated that 100 passengers would have been on each of the canceled flights, bringing the total of affected passengers to nearly a quarter of a million people. The MD-80s were grounded because 15 of 19 inspected aircraft failed FAA inspection this week. The issue is with the installation of wiring connecting the airframe to a hydraulic pump in the wheel well. The regulations are written to prevent rubbing and chafing of the wiring, which can lead to exposed wiring. Exposed wiring is a concern because it can lead to power issues and shorts, and it is a potential fire hazard.
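The figures quoted above can be checked with a few lines of arithmetic. The variable names are illustrative, and the flight count is the "at least 2,400" lower bound from the text.

```python
# Estimate of affected passengers from the figures quoted in the post.
canceled_flights = 2400        # "at least 2,400 flights", so a lower bound
passengers_per_flight = 100    # estimated average per canceled flight
affected_passengers = canceled_flights * passengers_per_flight

# Share of inspected MD-80s that failed the FAA wiring inspection.
inspected_aircraft = 19
failed_aircraft = 15
failure_rate = failed_aircraft / inspected_aircraft

print(f"affected passengers: {affected_passengers:,}")
print(f"inspection failure rate: {failure_rate:.0%}")
```

The estimate of 240,000 matches the "nearly a quarter of a million" in the text, and a failure rate of about 79 percent among inspected aircraft helps explain why the whole MD-80 fleet was pulled from service.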

The most alarming part of the story is that American Airlines grounded these same planes for the exact same issue on March 26 and 27. Over 350 flights were canceled while the planes were inspected and, if necessary, repaired to comply with the FAA wiring regulations. All planes were back in service on March 28 after American Airlines asserted they satisfied the regulation. Little information is available on what went wrong two weeks ago. There are a number of questions that would need to be answered to perform a thorough investigation. Are the FAA regulations confusing? Do the AA mechanics need additional training? Did the airline fail to internally check the wiring prior to putting the planes back into service? If an inspection did occur, did the inspectors understand what they were looking for? It may not be clear exactly what went wrong, but it is clear that something failed in the system to cause this second round of cancellations.

The attached PDF file contains an intermediate level root cause analysis of the cancellation of American Airlines flights on April 8-9. It was built using the facts that were available in media reports. Many details are still missing and can be added as they become known.

Root Cause Analysis: Monte Carlo Hotel Fire – Las Vegas, NV

By Kim Smiley

Just before 11 am on January 25, 2008, a fire started on the roof of the 32-story Monte Carlo Hotel in Las Vegas. The fire spread quickly along the outside of the building, fueled by the highly flammable foam-like material, Exterior Insulation and Finish System (EIFS), used to construct the hotel façade. A spark from a hand-held cutting torch being used on the roof of the hotel hit the EIFS and started the fire. About 6,000 guests and workers were evacuated from the hotel. The hotel remained closed until February 15. Considering both the damage to the hotel and lost business, the total cost of the fire is approximately $100 million. Luckily, no major injuries resulted from the fire.

A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page. The Cause Map shows that the fire started because a spark from a hand-held torch hit a flammable material. The Cause Map can also be used to identify possible solutions that would prevent another fire. In this case, two areas that would merit further investigation are the use of highly flammable material on buildings and the lack of measures taken to protect the EIFS from sparks. For example, there were no mats in place to shield the EIFS from sparks. From the information available, it isn’t clear why no protective measures were taken, but it is known that the contractor failed to obtain the correct permit (which involves getting information on appropriate safety procedures). An Associated Press article on the fire reports that Las Vegas city officials are currently evaluating whether restrictions should be placed on the use of EIFS.

The attached PDF file contains an intermediate level root cause analysis of the hotel fire.  It was built using the facts that were available in media reports on the fire.  As more details are known, the Cause Map can be expanded.