All posts by Kim Smiley

Mechanical engineer, consultant and blogger for ThinkReliability, obsessive reader and big believer in lifelong learning

Hundreds Saved by Arduous Helicopter Rescue From Ferry Fire

By Kim Smiley

In a grueling rescue effort, 427 people were saved from a passenger ferry, Norman Atlantic, which caught fire December 28, 2014 off the coast of Greece.  About 150 people managed to escape the fire in lifeboats, but the remaining passengers were lifted to safety via helicopter.  Gale-force winds, heavy rain and darkness all combined to make a difficult rescue operation even more daunting. Ten people died as a result of the accident, with few details known about what caused the fatalities.

A Cause Map, a visual root cause analysis, can be built to analyze this incident.  The investigation is just beginning and there are still many unknowns, but an initial Cause Map can be started now and easily expanded to incorporate new information as it becomes available.  Even the exact number of people onboard has been difficult to determine because several stowaways who weren’t listed on the ship’s manifest were discovered during the rescue operations.

What is known is that the fire began early in the morning of December 28th and 427 people were rescued off the ferry. Early reports are that the fire started on the parking deck where there were tanker trucks filled with oil.  Witness accounts indicate that the fire spread fairly quickly, leading to speculation that the fire doors failed.  As the fire progressed, the ship lost power.  Once power was gone, the lifeboats were useless because they require electricity to be lowered.  The heat from the fire drove passengers to the top deck and bridge where they were bombarded by cold, rain and thick smoke for many miserable and likely terrifying hours.  Helicopters pulled passengers to safety one by one, working through the windy night with night vision goggles.

In stark contrast to the South Korea ferry that capsized off Byungpoong in April, the captain was the last person to leave the Norman Atlantic. The rescue effort was truly impressive.  As Greek Prime Minister Antonis Samaras said, the “massive and unprecedented operation saved the lives of hundreds of passengers following the fire on the ship in the Adriatic Sea under the most difficult circumstances.”

The Italian Transport Ministry has seized the vessel pending an investigation into the fire and thorough inspection of the ship.  Whenever a disaster of this magnitude occurs, it is worth understanding exactly what happened and reviewing what could be done better in the future.  There will be many lessons to learn from this incident, both in how to prevent and fight shipboard fires and how to perform helicopter rescues at sea.

To view a high level Cause Map of this incident, click on “Download PDF” above.

10,000 Pound Buoy Falls on Workers

By Kim Smiley

On December 10, 2014, a buoy that weighs close to 10,000 pounds fell onto workers at an inactive ship maintenance facility in Pearl Harbor. Two workers were killed and two others sustained injuries. While an object this large is an extreme example of the dangers of dropped objects, worker injuries and deaths from falling objects of all sizes are a significant safety concern. A US census report of fatal occupational injuries states that 245 workers were killed after being struck by falling objects in 2013 alone.

The case of the dropped buoy can be built into a Cause Map, a visual root cause analysis, to better understand what happened. Understanding the details of an accident is necessary to ensure that a wide range of solutions is considered and that any solutions implemented will be effective at preventing future incidents.

The investigation into the falling buoy is still underway so some information is not yet available, but it can easily be incorporated into the Cause Map once it is known. Any causes that need more information or evidence can be noted with a question mark to show that there is still an open question.

Exactly what caused the buoy to drop hasn’t been released yet, but it is known that the safety lines attached to the buoy failed. Both of these issues need to be investigated to ensure that solutions can be implemented to prevent further tragedies.

Additionally, there are open questions about why people were working under the path of the lift. The workers were wearing hard hats, but this is obviously inadequate protection against a 10,000-pound buoy. The contractors were working to strengthen mooring lines at the time of the accident, but no one should be where they could be crushed if such a large object were dropped, as it was in this case. As stated by Jeff Romeo, the Occupational Safety and Health Administration (OSHA) Honolulu area director, “We’re still looking at the facts to try to determine the exact locations of where these employees were located. If in fact, they were working directly underneath the load, then that would be an alarming situation.”

The OSHA investigation is currently underway and is expected to take four to six months. Additionally, the Navy is launching a Safety Investigation Board to review the accident with findings expected to be released by February. Once the investigation is complete, work processes will need to be reviewed to see what changes need to be made to prevent any future injuries from falling objects.

To view an initial Cause Map of this incident, click on “Download PDF” above.

When Air Bags Take, Instead of Save, Lives

By Kim Smiley

Air bags are designed to save lives and there is no doubt that they do, but they can also be deadly if they malfunction.  At least 5 deaths and many more injuries since 2004 have been tied to metal fragments that burst out of faulty air bags.

A Cause Map, or visual root cause analysis, can be used to analyze the problem with some air bags manufactured by Takata, one of the largest air bag companies in the world.  A Cause Map visually lays out the causes that contributed to a problem to show the cause-and-effect relationship between them.  A Cause Map is built by asking “why” questions.  So why are people being injured and even killed by air bags?

The air bags in question have a metal canister inside them that contains a solid wafer of chemical propellant.  Once the propellant is ignited, a chemical reaction occurs that very quickly creates gas that is used to inflate an air bag.  The problems happen when the wafer of propellant burns too quickly and the pressure from the gas over-pressurizes the metal canister.  If the canisters burst, metal fragments are shot into the vehicle where they can hit passengers.

One of the more interesting (and alarming) things about this issue is that nobody seems to know exactly why the chemical propellant is burning too quickly.  The problem appears to be related to humidity and vehicles in highly humid regions seem to be at higher risk, but not all experts agree with this assessment.  Takata has admitted to production issues at a plant in Washington state and says that some of the chemicals used in the air bags were left out and exposed to humidity, causing them to react too quickly, but no evidence has been released that directly ties these manufacturing issues to the defective air bags.  There is concern that the design itself may be the problem and that it’s a much larger issue than a manufacturing defect impacting a relatively small number of air bags.

The handling of the issue has also been problematic.  There is evidence that both Honda and Takata knew about a death possibly tied to air bags as early as 2004, but decided it was an anomaly.  Some believe that the companies were slow to react as more deaths and injuries associated with malfunctioning air bags occurred.   7.8 million vehicles with Takata air bags have been recalled, most of them in humid regions. Takata has been resistant to expanding the recall of the air bags beyond high humidity regions and has been threatened with fines by federal regulators.

The bottom line is that it’s difficult to know how to handle a problem when you don’t know exactly what the problem is.  As Representative John Sarbanes said, “If you don’t know the root cause, how do you know that the replacement part that you’re supplying solves the problem?”  Some automakers, such as Honda, have made deals with alternative air-bag suppliers for substitute parts to use during the recall because of the unresolved issues with the Takata air bags.  The recall process will likely take some time because the spike in demand for new air bags is going to severely tax the manufacturers available to supply them.

To see a high level Cause Map of this issue, click on “Download PDF” above.  You can also click here to see what vehicles have been recalled so far.

Chocolate Makers Warn of Possible Shortage

By Kim Smiley

Chocolate is one of the most beloved foods, but it may be becoming a little too popular.  Major chocolate makers have warned of a possible chocolate shortage looming in the near future.  According to a recent article by the Washington Post, “The world’s biggest chocolate-maker says we’re running out of chocolate”, the world consumed about 70,000 metric tons more cocoa last year than it produced.  The chocolate deficit is also predicted to get worse.

The chocolate shortage is a classic example of supply and demand in action.  The demand for cocoa is rising at the same time that the supply is dropping.  The price consumers are paying for chocolate is already increasing and is likely to get significantly higher if these trends continue.

So why is demand increasing (beyond the obvious fact that chocolate is delicious)? Part of the answer is that it is trendy to include chocolate in a wider variety of foods such as savory gourmet dishes, liquor and breakfast cereal.  Even the already questionable potato chip has been covered in chocolate to the delight of many.  The increasing popularity of dark chocolate also comes into play because dark chocolate contains significantly more cocoa than typical chocolate. (An average chocolate bar is about 10% cocoa while dark chocolate bars are usually closer to 70%.)  The sheer number of people who are eating chocolate is also growing as chocolate is more widely available worldwide, particularly in Asia where chocolate consumption is increasing rapidly.

While demand continues to grow, supply is decreasing.  Drought in West Africa, where the majority of the world’s cocoa is grown, has impacted the cocoa supply.  The plants are also being attacked by diseases; the most noteworthy is a fungus called Frosty pod, which is reducing the crop further.  The nature of cocoa trees also makes it challenging to respond to difficult or changing growing conditions because the trees take years to mature.  With the difficulties facing cocoa trees, many farmers are turning to other crops that are more profitable, which reduces the production of cocoa.

The end result of higher demand for chocolate will likely be further increases in the price of chocolate.  It’s also likely that chocolate makers will continue to develop candy that includes non-chocolate ingredients such as nuts, raisins or nougats to meet the demand for treats while using less actual chocolate.  Additionally, farmers are working to develop new strains of cocoa that are resistant to disease and drought and/or produce more cocoa per plant, which would increase the supply of cocoa.

A Cause Map, a visual root cause analysis, can be used to show the causes that have contributed to the chocolate deficit. To view a high level Cause Map of this example, click on “Download PDF” above.

Investigation Into the Fatal Crash of Commercial Space Vehicle is Underway

By Kim Smiley

On October 31, 2014, Virgin Galactic’s commercial space vehicle, SpaceShipTwo, tore apart over the Mojave Desert in California during its fourth rocket-powered test flight. One pilot was killed and the other seriously injured. An investigation is underway to determine exactly what caused the crash, but initial data indicates that the tail booms used to slow down the vehicle moved into the feathered position prematurely, increasing the aerodynamic force. This disaster has the potential to impact the emerging commercial space industry as regulators and potential passengers are reminded of the inherent dangers of space travel.

This issue can be analyzed by building a Cause Map, a visual method for performing a root cause analysis. An initial Cause Map can be built using the information that is currently available and then easily expanded as more data is known. The first step is to fill in an Outline with the basic background information of the incident. Additionally, the impacts to the overall goals are listed on the Outline to determine the scope of the issue. The Cause Map is then built by asking “why” questions.

Starting with the safety goal in this example: one pilot was killed and another was injured because a space vehicle was destroyed and they were onboard. (When two causes both contribute to an effect, they are both listed on the Cause Map and joined with an “and”.) SpaceShipTwo is designed to hold passengers, but this was a test flight to assess a new fuel so the pilots were the only people onboard. The space vehicle tore apart because the stress on the vehicle was greater than the strength of the vehicle. The final report on the accident will not be available for many months, but the initial findings indicate that the space vehicle experienced greater aerodynamic forces than expected.
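The cause-and-effect chain described above can be sketched as a small data structure. This is only an illustrative sketch, not the Cause Mapping tool itself: the dictionary layout and the `ask_why` helper are assumptions made for this example, and the cause labels are paraphrased from the paragraph above. An effect with more than one cause listed represents causes joined with an “and”.

```python
# Hypothetical representation: each effect maps to the causes that produced it.
# More than one cause in a list means the causes are joined with "and".
cause_map = {
    "Pilot killed, pilot injured": ["Space vehicle destroyed", "Pilots were onboard"],
    "Space vehicle destroyed": ["Stress on vehicle exceeded its strength"],
    "Stress on vehicle exceeded its strength": ["Greater aerodynamic forces than expected"],
    "Pilots were onboard": ["Test flight to assess a new fuel"],
}


def ask_why(effect, causes_by_effect, depth=0):
    """Return indented lines showing every cause reached by repeatedly asking 'why'."""
    lines = ["  " * depth + effect]
    for cause in causes_by_effect.get(effect, []):
        lines.extend(ask_why(cause, causes_by_effect, depth + 1))
    return lines


lines = ask_why("Pilot killed, pilot injured", cause_map)
print("\n".join(lines))
```

New findings from the final report could be added as extra entries without changing the rest of the structure, which mirrors how an initial Cause Map is expanded.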

The space vehicle used tail booms that were shifted into a feathered position to increase drag and reduce speed prior to landing. Video shows the co-pilot releasing the lever that unlocked the tail booms earlier than expected while the vehicle was still accelerating. It’s unclear at this time why he released the lever. The tail booms were not designed to move when unlocked and a second lever controls movement, but investigators speculate that the aerodynamic forces on the space vehicle while it was still accelerating caused them to lift up into the feathered position once they were unlocked. The vehicle disintegrated seconds after the tail booms shifted position, likely because of the aerodynamic forces in play.

After the final report is released, the Cause Map can be expanded to include the additional information. To view a high level Cause Map of this accident, click on “Download PDF” above.

Antares Cargo Rocket Explodes Seconds After Launch

By Kim Smiley

On October 28, 2014 an Antares cargo rocket bound for the International Space Station (ISS) catastrophically exploded seconds after launch.  The $200 million rocket was planned to be one of eight supply missions to the ISS that Orbital Sciences has a $1.9 billion contract to provide.  The investigation is still underway, but initial findings indicate that there may have been a problem with the engines, which were initially built in the 1960s and early 1970s by the Soviet space program.

Whenever NASA launches a rocket, it is observed by safety personnel who can cause the rocket to self-destruct if it appears to be malfunctioning, in order to minimize potential injuries and property damage. Reports by NASA have indicated that this flight-termination system was engaged in this case because the rocket malfunctioned shortly after liftoff.

Video of the launch and the subsequent explosion shows the plume from one engine changing shape a second before the massive explosion.  The change in the plume has led to speculation that a turbopump failed shortly after liftoff and suggests that the engines were the source of the malfunction.  Investigators are currently reviewing the video of the launch and telemetry readings from the rocket, and studying the debris, to learn as many details as possible about this failure.

The engines in question are NK-33 rocket engines that were initially built (not just designed, but actually manufactured) more than 4 decades ago. So how did engines from the Apollo era end up on a rocket decades later in 2014?  The one-word answer is money.

These engines were originally designed to support the Soviet space program which was disbanded in 1974.  For years, these engines were warehoused with no real purpose.  In 1990, these engines were sold to a company called Aerojet, reportedly for the bargain price of a cool million each.  The engines were refurbished and renamed Aerojet AJ-26s.  The cost of using these older engines was significantly less than developing a brand new rocket design.  In addition to being expensive, a new rocket design requires a significant time investment.  There are also limited alternatives available, partly due to NASA’s shrinking budget.

Orbital Sciences has announced that they will source a different engine and no longer use the AJ-26s, but it’s worth noting that these rockets have been used successfully in recent years. They have launched Cygnus supply spacecraft three times without incident.

To view a high level Cause Map, a visual root cause analysis, of this incident, click on “Download PDF” above.

Software Error Causes 911 Outage

By Kim Smiley

On April 9, 2014, more than 6,000 calls to 911 went unanswered.  The problem was spread across seven states and went on for hours.  Calling 911 is one of those things that every child is taught and every person hopes they will never need to do –  and having emergency calls go unanswered has the potential to turn into a nightmare.

The Federal Communications Commission (FCC) investigated this 911 outage and has released a study detailing what went wrong on that day in April.  The short answer is that a software error led to the unanswered calls, but there is nearly always more to the story than a single “root cause”.  A Cause Map, an intuitive format for performing a root cause analysis, can be used to better understand this issue by visually laying out the causes (plural) that led to the outage.

There are three steps in the Cause Mapping process. The first is to define an issue by completing an Outline that documents the basic background information and how the problem impacts the overall goals.  Most incidents impact more than one goal and this issue is no exception, but for simplicity let’s focus on the safety goal.  The safety goal was impacted because there was the potential for deaths and injuries.  Once the Outline is completed (including the impacts to the goals), the Cause Map is built by asking “why” questions.

The second step of the Cause Mapping process is to analyze the problem by building the Cause Map.  Starting with the impacted safety goal – “why” was there the potential for deaths and injuries?  This occurred because more than 6,000 911 calls were not answered.   An automated system was designed to answer the calls and it wouldn’t accept new calls for hours.  There was a bug in the automated system’s software AND the issue wasn’t identified for a significant period of time.  The error occurred because the software used a counter with a pre-set limit to assign calls a tracking number.  The counter hit the limit and couldn’t assign a tracking number so it quit accepting new calls.
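The failure mode described above can be illustrated with a short sketch. The class name, method name and the tiny limit are hypothetical and chosen only for illustration; the FCC study does not publish the actual code, so this shows the general pattern of a counter with a pre-set limit rather than the real system.

```python
class CallRouter:
    """Hypothetical stand-in for the automated call-handling system."""

    def __init__(self, counter_limit):
        self.counter_limit = counter_limit  # pre-set maximum for the counter
        self.counter = 0                    # tracking numbers issued so far

    def accept_call(self):
        """Return a tracking number, or None once the counter has hit its limit."""
        if self.counter >= self.counter_limit:
            return None  # no tracking number can be assigned, so the call is not accepted
        self.counter += 1
        return self.counter


router = CallRouter(counter_limit=3)  # tiny limit, for illustration only
results = [router.accept_call() for _ in range(5)]
print(results)  # [1, 2, 3, None, None]
```

Every call after the limit is reached is refused, and nothing in the sketch resets the counter, which is why the outage could continue until someone intervened.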

The delay in identifying the problem is also important to capture in the investigation because the problem would have been much less severe if it had been found and corrected more quickly.  Any 911 outage is a problem, but one that lasts 30 minutes is less alarming than one that plays out over 8 hours.  In this example, the system identified the issue and issued alerts, but categorized them as “low level” so they were never flagged for human review.

The final step in the Cause Mapping process is to develop and implement solutions to reduce the risk of the problem recurring.  In order to fix the issues with the software, the pre-set limit on the counter has been increased and will periodically be checked to ensure that the max isn’t hit again.  Additionally, to help improve how quickly a problem is identified, an alert has been added to notify operators when the number of successful calls falls below a certain percentage.
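The percentage-based alert could look something like the sketch below. The function name and the 95% threshold are assumptions made for this example; the actual threshold used by the operators is not stated.

```python
def should_alert(successful_calls, total_calls, min_success_rate=0.95):
    """Return True when the share of successful calls drops below the threshold.

    min_success_rate is a hypothetical value; the real threshold is not public.
    """
    if total_calls == 0:
        return False  # no traffic yet, so there is nothing to evaluate
    return successful_calls / total_calls < min_success_rate


print(should_alert(99, 100))  # healthy period
print(should_alert(50, 100))  # half of the calls are failing
```

Unlike the original “low level” alerts, a check like this is tied directly to the outcome that matters (calls being answered), so it would be a natural candidate for immediate human review.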

New issues will likely continue to crop up as emergency systems move toward internet-powered infrastructure, but hopefully the systems will become more robust as lessons are learned and solutions are implemented.  I imagine there aren’t many experiences more frightening than frantically calling 911 for help and having no one answer.

To view a high level Cause Map of this issue, including a completed Outline, click on “Download PDF” above.

Lawsuit Questions the Safety of Guardrails

By Kim Smiley

A whistleblower lawsuit claims that tens of thousands of guardrails installed across the US may be unsafe.  The concern is that the specific design of the guardrail in question, the ET-Plus, can jam when hit and puncture cars, potentially causing injury, rather than curling away as intended.

This issue has more questions than answers at this point, but an initial Cause Map can be built to document what is currently known.  A question mark should be added to any cause that is suspected, but has not been proven with evidence.  As more information, both new causes and evidence, becomes available the Cause Map can easily be expanded to incorporate it.

In this example, the primary concern about the guardrails, both from a safety and regulation standpoint, is centered on a design change made in 2005.  The size of the energy-absorbing end terminal was changed from five inches to four.  The modification was apparently made as a cost-saving measure.   The lawsuit alleges that federal authorities were never alerted to the design change so it never received the required review and approval.  It appears that federal authorities were not alerted until a patent case brought up the issue in 2012.

The reduction in the size of the end terminals may have affected how the guardrails function during auto accidents.  The lawsuit claims that five deaths and other injuries from at least 14 auto accidents can be attributed to the new design of guardrails.  The Federal Highway Administration has stated that the guardrails meet crash-test criteria, but three states (Missouri, Nevada and Massachusetts) are taking the concerns seriously enough to ban further installation of the guardrails pending completion of the investigation.

This issue is a classic proverbial can of worms.  Up to a billion dollars could be at stake in the lawsuit and the man who filed the lawsuit could get a significant cut of the payout.  There are potential testing requirement issues that need to be considered if the guardrails are passing crash tests, but causing injuries.  There are concerns over whether the company properly informed the federal government about design changes, which is a particularly sensitive topic following the recent GM ignition switch issues.  All in all, this should be a very interesting topic to follow as it plays out.

To view a high level Cause Map of this issue, click on “Download PDF” above.

Fire at FAA Facility Sparks Flight Havoc

By Kim Smiley 

On Friday September 26, 2014, air traffic was grounded for hours in the Chicago region following a fire in a Federal Aviation Administration facility in Aurora, Illinois. The snarl of flight issues impacted thousands of travelers in the days following the fire as airports struggled to deal with the aftermath of more than 4,000 canceled flights and thousands more delayed.

A Cause Map, a format for performing a visual root cause analysis, can be used to analyze this issue.  To build a Cause Map, the first step is to define the problem by determining how the overall organizational goals are impacted.  In this example, there is a significant customer service impact because thousands of passengers had their travel plans disrupted. The flight cancelations and delays can be considered an impact to the production/schedule goal.  The amount of time and energy needed to address the flight disruptions along with the investigation into the issue would also be impacts to the labor goal.  Once the impacts to the goals are determined, the Cause Map is built by asking “why” questions and visually laying out the answers to show the cause-and-effect relationship.

Thousands of flights were canceled because air traffic control was unable to support them.  Air traffic control couldn’t perform their usual function because there was a fire in a building that provided air traffic support for a large portion of the upper Midwest and it wasn’t possible to quickly provide air traffic support from another location. Focusing on the fire itself first, the fire appears to have been intentionally set by a contractor who worked in the building.  He was able to bring in flammable materials and start a fire without anyone stopping him.  Police are still investigating his motives, but he has been charged with a felony. The building was evacuated once the fire was discovered and employees obviously couldn’t perform their usual duties during that time.  Additionally, the fire damaged equipment so air traffic control functionality could not be quickly restored once the initial crisis was addressed and it was safe to return to the building.

The second portion of the issue is that there wasn’t a way to support air traffic once the building was evacuated.  Once the fire occurred, all flights were grounded because there wasn’t air traffic control support and it was not possible to quickly get air traffic moving again.

The final step in the Cause Mapping process is to develop and implement solutions to reduce the risk of a similar problem.  Lawmakers have called for an investigation into this issue to see if there is sufficient redundancy in the air traffic control system.  In an ideal situation, a fire or other crisis at any single location would not cripple US air traffic to the extent that this issue did.  The investigation is also looking into the fire and reviewing the security at the facility to see if there should be stricter restrictions put in place, such as ensuring that no employees work alone or searching bags as workers access the site.

This situation is also a strong reminder that organizations need to have a plan in place of what to do in case a failure occurs.  There was a previous fire scare at this same location earlier in 2014 when a smoking ceiling fan resulted in an evacuation and flight delays (see previous blog) that should have prompted some serious consideration of what the contingency plan should be if this facility was ever out of commission.

I was one of those people standing in line for hours at an airport on Friday morning after my flight was canceled.  And I for one would love to see the air traffic control system become more robust and better able to deal with the inevitable hiccups that occur.  It’s impossible to prevent every potential problem and another intentional fire in an FAA facility seems pretty farfetched, but it is possible to have a better plan in place to deal with issues that may arise.  The potential consequences of any single failure can be limited with a good plan and quick implementation of that plan.

Can Airline Seats Get Even Smaller?

By Kim Smiley

Was the experience the last time you flew wonderful?  Did you enjoy all the luxurious amenities like ample elbow room, stretching out your legs, and turning around in the bathroom?  Me neither.  Comfort certainly hasn’t been the top priority as airlines have shrunk seats to cram more passengers onboard, but a new patent application by Airbus really takes things to a whole new level.

They say that a picture is worth a thousand words and I think that is particularly true in this case.  This is a diagram of a patent application for a proposed seat design –

 

I’m not sure about the rest of you, but my backside is sore just thinking about an airplane seat that bears such a strong resemblance to a bicycle seat.

I attempted to build a Cause Map, a visual root cause analysis, in order to better understand how such a design could be proposed because I frankly find it mind-boggling.  The basic idea is that airlines would like to maximize profits and that putting more people on each flight allows more tickets to be sold, resulting in more money made.  The average airline seat width has already decreased to about 17 inches from the 18 inches typical for a long-haul airplane seat in the 1970s and 1980s.  Compounding the impact on passengers is the fact that the size of the average passenger has increased during that same time frame.  In general, larger bodies are being put in smaller seats, not a recipe for comfort.

I’m still having a hard time understanding how the correct answer to increasing airline profits is making seats even smaller.  I have to believe that passengers will balk at some point.  At some level of discomfort, a cheap ticket just won’t be cheap enough for me to be willing to endure a truly awful flight.  Even with electronic distractions and snacks, there has to be a point where people would just say no.

There also have to be a number of safety concerns that arise when the size of airplane seats is dramatically decreased.  Survivability in a crash is greatly influenced by seat design because airplane seats are designed to absorb energy and provide head injury protection during an accident.

Just to be clear, there is no plan to actually use this seat design anytime in the near future.  This is just a patent application.  As Airbus spokeswoman Mary Anne Greczyn said, “Many, if not most, of these concepts will never be developed, but in case the future of commercial aviation makes one of our patents relevant, our work is protected. Right now these patent filings are simply conceptual.” But somebody somewhere still thought this was a good enough idea that it should be patented…just in case.