Tag Archives: safety

Cause-and-Effect: Alcohol Consumption

By ThinkReliability Staff

The human body is a pretty amazing thing. Many of the processes that take place in our body on a regular basis – keeping us breathing, walking and playing video games or skydiving (or both, though hopefully not at the same time) – have not yet been replicated. They’re that complex.

Which of course raises a lot of questions: why do our bodies work the way they do? It also leads to a subset of questions: when x happens, why does y happen? If your question is, when I drink, why do I feel so great, then so lousy, science has the answers for you . . . and yes, we can capture them in a Cause Map!

If your goal for your body is to feel well and behave pretty consistently, then drinking alcohol is going to impact those goals. First, drinking is going to result in a decrease in control of your behavior. The specifics of how this manifests are legion, but I'm sure you have many examples of your own. Your post-binge feelings are also going to be impacted: most likely your drinking is going to result in a hangover (generally awful feelings centered around your abdomen and head), dehydration and frequent urination. If your goal is not to eat everything in sight without any consideration about what it will do to your waistline, then your diet may also be impacted due to a desire for carbohydrates.

Beginning with one of these goals, we can ask our favorite question: Why? For example, our decrease in behavior control results from the hypothalamus, pituitary gland, and cerebellum being depressed. This lowers inhibitions and the ability to think clearly, and also releases a whole slew of hormones and dopamine. Additionally, alcohol impacts neurotransmitters which direct emotions, actions and motor skills, so the combination may make you think you can dance on a table . . . but really you can barely walk.

Now about the ill after-effects. That lovely hangover results from your digestive system's attempt to rid your body of alcohol, plus the pounding headache caused by dehydration. When your digestive system works to remove alcohol, the byproduct is acetaldehyde, and your body doesn’t like it at all. Most of the alcohol from your body is going to be flushed through your bladder. In order to speed its exit, your body redirects all the liquid it can to your bladder, leaving you dehydrated. (That’s also why you have to run to the bathroom so many times after drinking.) The whole process of removing alcohol from your body takes energy. In order to direct as much energy towards alcohol removal as possible, your brain shuts down most of your other functions (which doesn’t help with the ability to function). To get that energy back, your body craves food – carbs in particular (grease optional).

With all these bad effects, you may wonder why people drink at all. Well, when you drink, the alcohol depresses some systems as discussed above, resulting in the release of a bunch of hormones and dopamine. These make us feel good (or even fabulous!). That’s why we keep drinking. (There are also a whole bunch of social pressures, which I’m not going to go into here.)

Giving up drinking altogether is difficult, and many people don’t want to. There are, however, ways to minimize the ill effects of drinking. Food in your stomach helps absorb some of the alcohol, so eating before you drink can help. The headache portion of the hangover can be minimized by drinking a lot of water (though that won’t help with the frequent urination issue). AND OF COURSE, because drinking does a number on your fine motor control and general behavior, you should never, ever drink and drive or operate other heavy machinery.

To view the Cause Map of what happens when you drink, click on “Download PDF” above. The information used to create this blog is from:

“The Science of Getting Drunk” and

“Every Time You Get Drunk This Is What Happens To Your Body And Your Brain”

Deadly balcony collapse in Berkeley

By Kim Smiley

A 21st birthday celebration quickly turned into a nightmare when a fifth-story apartment balcony collapsed in Berkeley, California on June 16, 2015, killing 6 and injuring 7.  The apartment building was less than 10 years old and there were no obvious signs to the untrained eye that the balcony was unsafe prior to the accident.

The balcony was a cantilevered design attached to the building on only one side by support beams.  A report by Berkeley’s Building and Safety Division stated that dry rot had deteriorated the support beams significantly, causing the balcony to catastrophically fail under the weight of 13 bodies.

Dry rot is decay caused by fungus and occurs when wood is exposed to water, especially in spaces that are not well-ventilated. The building in question was built in 2007, and the extensive damage to the support beams indicates that there were likely problems with the waterproofing done during construction of the balcony.  Initial speculation is that the wood was not caulked and sealed properly when the balcony was built, which allowed the wood to be exposed to moisture and led to significant dry rot. However, the initial report by the Building and Safety Division did not identify any construction code violations, which raises obvious questions about whether the codes are adequate as written.

As a short-term solution to address potential safety concerns, the other balconies in the building were inspected to identify if they were at risk of a similar collapse so they could be repaired. As a potential longer-term solution to help reduce the risk of future balcony collapses in Berkeley as a whole, officials proposed new inspection and construction rules this week.  Among other things, the proposed changes would require balconies to include better ventilation and require building owners to perform more frequent inspections.  Only time will tell whether the proposed code changes will be approved by the Berkeley City Council, but something should be changed to help ensure public safety.

Finding a reasonable long-term solution to this problem is needed because balconies and porches are naturally exposed to the weather, which makes them susceptible to rot.  Deaths from balcony failures are not common, but there have been thousands of injuries.  Since 2003, only 29 deaths from collapsing balconies and porches have been reported in the United States (including this accident), but an estimated 6,500 people have been injured.

Click on “Download PDF” above to see a Cause Map, a visual format of root cause analysis, of this accident.  A Cause Map lays out all the causes that contributed to an issue to show the cause-and-effect relationships.

Rollercoaster Crash Under Investigation

By ThinkReliability Staff

A day at a resort/theme park ended in horror on June 2, 2015 when a carriage filled with passengers on the Smiler rollercoaster crashed into an empty car in front of it. All 16 people in the carriage were injured, 5 seriously (including limb amputations). While the incident is still under investigation by the Health and Safety Executive (HSE), information that is known can be collected in cause-and-effect relationships within a Cause Map, or visual root cause analysis.

The analysis begins with determining the impact to the goals. Clearly the most important goal affected in this case is the safety goal, impacted because of the 16 injuries. In addition to the safety impacts, customer service was impacted because of the passengers who were stranded for hours in the air at a 45-degree angle. The HSE investigation and expected lawsuits are an impact to the regulatory goal. The park was closed completely for 6 days, at an estimated cost of £3 million. (The involved rollercoaster and others with similar safety concerns remain closed.) The damage to the rollercoaster and the response, rescue and investigation are impacts to the property and labor goals, respectively.

The Cause Map is built by laying out the cause-and-effect relationships starting with one of the impacted goals. In this case, the safety goal was impacted because of the 16 injuries. Sixteen passengers were injured due to the force on the carriage in which they were riding. The force was due to the speed of the carriage (estimated at 50 mph) when it collided with an empty carriage. According to a former park employee, the collision resulted from both a procedural and a mechanical failure.

The passenger-filled carriage should not have been released while an empty car was still on the tracks, making a test run. It’s unclear what specifically went wrong to allow the release, but that information will surely be addressed in the HSE investigation and procedural improvements going forward. There is also believed to have been a mechanical failure. The former park employee stated, “Technically, it should be absolutely impossible for two cars to enter the same block, which is down to sensors run by a computer.” If this is correct, then it is clear that there was a failure with the sensors that allowed the cars to collide. This will also be a part of the investigation and potential improvements.
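The former employee's description of blocks and sensors suggests a block-section interlock: a car may only move into the next block if sensors report that block empty. The sketch below is a hypothetical illustration of that general idea, not the Smiler's actual control software; the `Track` class, its `occupancy` map and `request_dispatch` method are names invented here. It also shows why a sensor failure defeats the interlock: if the test car is never reported, the computer sees an empty block and allows the dispatch.

```python
"""Hypothetical block-section interlock sketch (not the Smiler's real control logic)."""

from dataclasses import dataclass, field


@dataclass
class Track:
    # Number of blocks on the circuit; each block may hold at most one car.
    num_blocks: int
    # Block index -> id of the car that the (hypothetical) sensors detect there.
    occupancy: dict[int, str] = field(default_factory=dict)

    def block_is_clear(self, block: int) -> bool:
        return block not in self.occupancy

    def request_dispatch(self, car_id: str, from_block: int) -> bool:
        """Let a car advance only if the next block is reported empty."""
        next_block = (from_block + 1) % self.num_blocks
        if not self.block_is_clear(next_block):
            return False  # hold the car: the next block is occupied
        # Move the car: vacate its current block and occupy the next one.
        self.occupancy.pop(from_block, None)
        self.occupancy[next_block] = car_id
        return True


# Sensors correctly report the empty test car in block 1: dispatch is refused.
ok_track = Track(num_blocks=4, occupancy={0: "loaded car", 1: "test car"})
refused = ok_track.request_dispatch("loaded car", 0)

# Sensors miss the test car (it is on the track but not in `occupancy`):
# the interlock only knows what the sensors tell it, so dispatch is allowed.
faulty_track = Track(num_blocks=4, occupancy={0: "loaded car"})
allowed = faulty_track.request_dispatch("loaded car", 0)
```

The design point is that the interlock is only as good as its inputs, which is why the procedural check (not releasing a loaded car during a test run) matters as a separate layer.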

After the cause-and-effect relationships have been developed as far as possible (in this case, there is much information still to be added as the investigation continues), it’s important to ensure that all the impacted goals are included on the Cause Map. In this case, the passengers were stranded in the air because the carriage was stuck on the track due to the force upon it (as described above) and also due to the time required for rescue. According to data that has so far been released, it was 38 minutes before paramedics arrived on-scene, and even longer for fire crews to arrive with the necessary equipment to begin a rescue made very difficult by the design of the rollercoaster (the world record holder for most loops: 14). The park staff did not contact outside emergency services until 16 minutes after the accident – an inexcusably long time given the gravity of the incident. The delayed emergency response will surely be another area addressed by the investigation and continuing improvements.

Although the investigation is ongoing, the owners of the park are already making improvements, not only to the Smiler but to all of their rollercoasters. In a statement released June 5, the owner group said “Today we are enhancing our safety standards by issuing an additional set of safety protocols and procedures that will reinforce the safe operation of our multi-car rollercoasters. These are effective immediately.” The Smiler and similar rollercoasters remain closed while these corrective actions are implemented.

Dr. Tony Cox, a former Health and Safety Executive (HSE) advisory committee chairman, hopes the improvements don’t stop there and issues a call to action for all rollercoaster operators. “If you haven’t had the accident yourself, you want all that information and you’re going to make sure you’ve dealt with it . . . They can just call HSE and say, ‘Is there anything we need to know?’ and HSE will . . . make sure the whole industry knows. That’s part of their role. It’s unthinkable that they wouldn’t do that.”

To view the information available thus far in a Cause Map, please click “Download PDF” above.

Make safeguards an automatic step in the process

By Holly Maher

On the morning of May 13, 2015, a parent was following his normal morning routine on his way to work.  He dropped off his older daughter at school and then proceeded to the North Quincy MBTA (Massachusetts Bay Transportation Authority) station where he boarded a commuter train headed to work.  When he arrived, approximately 35 minutes later, he realized that he had forgotten to drop off his one-year-old daughter at her day care and had left her in his SUV in the North Quincy station parking lot.  The frantic father called 911 as he boarded a train returning to North Quincy.  Thankfully, the police and emergency responders were able to find and remove the infant from the vehicle.  The child showed no signs of medical distress as a result of being in the parked car for over 35 minutes.

Had this incident resulted in an actual injury or fatality, I am not sure I would have had the heart to write about it.  However, because the impact was only a potential injury or fatality, I think there is great value in understanding the details of what happened and specifically how we can learn from this incident.  Unfortunately, this is not an isolated incident.  According to kidsandcars.org, an average of 38 children die in hot cars annually.  About half of those children were accidentally left in the vehicle by a parent, grandparent or caretaker.  While some people want to talk about these incidents using the terms “negligence” or “irresponsibility”, in the cases identified as accidental it is clear the parents were not trying to forget their children.  They often describe going into “autopilot” mode and just forgetting.  How many of us can identify with that statement?

On the morning this incident happened, the parent was following his typical routine.  After dropping off his older child at school, he went into “autopilot” and went directly to the North Quincy MBTA station, parked and left the vehicle to board the train.  His one-year-old daughter was not visible to him at that point because she was in the back seat of the vehicle in a rear-facing car seat, as required by law.  Airbags were originally introduced in the 1970s but became more commercially available in the early 1990s.  In 1998, all vehicles were required to have airbags in both the driver and passenger positions.  This safety improvement, which has surely reduced deaths related to vehicle accidents, had the unintended consequence of moving children in car seats to the back seat, where they are less visible to their parents.  The number of hot car deaths has significantly increased since the early 1990s.

On the morning of the incident the ambient conditions were relatively mild, about 59 degrees Fahrenheit.  However, the temperature in a vehicle can quickly exceed the ambient conditions due to what is called the greenhouse effect.  Even with the windows down, the temperature in a vehicle can rise quickly.  80% of that temperature rise occurs within the first 10 minutes.

When the parent arrived at his destination, approximately 35 minutes later, he realized he had forgotten the infant and reboarded a train to return to the North Quincy station.  Thankfully, the parent also called 911 which expedited the rescue of the infant.  The time in the vehicle would obviously have been longer had he not called 911.

One other interesting detail about this incident is that the parent reported that he normally had a “safeguard” procedure that he followed to make sure this didn’t happen, but he didn’t follow it on this particular day.  It is unknown what the safeguard was or why it wasn’t followed.  This certainly makes an interesting point: we don’t follow safeguards when we know something is going to happen, we follow safeguards in case something happens.  As I told my daughter (who didn’t want to wear her seat belt on the way from school to home because it “wasn’t that far”), you wear your seat belt not because you know you are going to get into an accident, you wear it in case you get into an accident.

The solutions that have been identified for this incident have been taken directly from kidsandcars.org.  They promote and encourage a consistent process to manage this risk not when you know you are going to forget, but in case you forget.  Consider placing something you need (phone, shoe, briefcase, purse) on the rear floorboard so that you are required to open the rear door of the vehicle.  Always open the rear door when leaving your vehicle; this is called the “Look before you Lock” campaign.  Consider keeping a stuffed animal in the car seat; when the car seat is occupied, place the stuffed animal in the front seat as a visual cue/reminder that the child is in the car.  Consider implementing a process where the day care or caretaker calls if your child does not show up when expected.  This will minimize the amount of time the child might be left in the car.

For more information about this topic, visit kidsandcars.org.

Live anthrax mistakenly shipped to as many as 24 labs

By Kim Smiley

The Pentagon recently announced that live anthrax samples were mistakenly shipped to as many as 24 laboratories in 11 different states and two foreign countries.  The anthrax samples were intended to be inert, but testing found that at least some of the samples still contained live anthrax.  There have been no reports of illness, but more than two dozen people have been treated for potential exposure.  Work has been disrupted at many labs during the investigation as testing and cleaning are performed to ensure that no unaccounted-for live anthrax remains.

The investigation is still ongoing, but the issues with anthrax samples appear to have been occurring for at least a year without being identified.  The fact that some of the samples containing live anthrax were transported via FedEx and other commercial shipping companies has heightened concern over possible implications for public safety.

Investigations are underway by both the Centers for Disease Control and the Defense Department to figure out exactly what went wrong and to determine the full scope of the problem. Initial statements by officials indicated that there may be problems with the procedure used to inactivate the anthrax.   Investigators so far have indicated that the work procedure was followed, but it may not have effectively killed 100 percent of the anthrax as intended.  Technicians believed that the samples were inert prior to shipping them out.

It may be tempting to call the issues with the work process used to inactivate the anthrax the “root cause” of this problem, but in reality more than one cause contributed to this issue, and more than one solution should be used to reduce the risk of future problems to acceptable levels.  Clearly, there is a problem if the procedure used to create inactive anthrax samples doesn’t kill all the bacteria present, and that will need to be addressed, but there is also a problem if there aren’t appropriate checks and testing in place to identify that live anthrax remains in samples.  When dealing with potentially deadly consequences, a work process should be designed, where possible, so that a single failure cannot create a dangerous situation.  An effective test for live anthrax prior to shipping the sample would have contained the problem to a single facility designed to handle live anthrax and drastically reduced the impact of the issue.  Additionally, another layer of protection could be added by requiring that a facility receiving anthrax samples test them upon receipt and handle them with additional precautions until they are determined to be fully inert.
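The value of layered checks can be shown with simple arithmetic. The sketch below assumes, hypothetically, that the inactivation step and each test fail independently; the rates are invented for illustration and are not figures from the anthrax investigation. Under that assumption, the chance that a live sample gets through is the process failure rate multiplied by the miss rate of every check it must pass.

```python
"""Illustrative layered-protection arithmetic, assuming independent failures.
All rates here are hypothetical, not data from the anthrax investigation."""

from math import prod


def escape_probability(process_failure_rate: float, check_miss_rates: list[float]) -> float:
    """Chance a live sample survives inactivation AND slips past every check.

    Assumes each check misses independently of the process and of each other.
    prod([]) is 1, so with no checks the result is just the process failure rate.
    """
    return process_failure_rate * prod(check_miss_rates)


# Hypothetical: inactivation leaves live spores 1% of the time,
# and each test misses live spores 5% of the time.
p_process = 0.01
no_checks = escape_probability(p_process, [])
ship_test = escape_probability(p_process, [0.05])
ship_and_receipt = escape_probability(p_process, [0.05, 0.05])
```

With these made-up numbers, a test before shipping cuts the chance of releasing a live sample from 1 in 100 to 1 in 2,000, and adding a test on receipt cuts it to 1 in 40,000. The independence assumption is the weak point: a test that shares the inactivation process's blind spot would help much less.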

Building in additional testing does add time and cost to a work process, but sometimes it is worth it to identify small problems before they become much larger problems.  If issues with the process used to create inert anthrax samples had been identified the first time it failed to kill all the anthrax, they could have been dealt with long before they were headline news and people were unknowingly exposed to live anthrax. Testing both before shipping and after receipt of samples may be overkill in this case, but something more than just working to fix the process for creating inert samples needs to be done, because inadvertently shipping live anthrax for more than a year indicates that issues are not being identified in a timely manner.

6/4/2015 Update: It was announced that anthrax samples suspected of inadvertently containing live anthrax were sent to 51 facilities in 17 states, DC and 3 foreign countries (Australia, Canada and South Korea). Ten samples in 9 states have tested positive for live anthrax, and the number is expected to grow as more testing is completed. Thirty-one people have received preventive treatment for exposure to anthrax, but there are still no reports of illness. Click here to read more.

Concrete slab smashes truck killing 3

By Kim Smiley

On April 13, 2015, a large section of a concrete barrier fell from an overpass onto a truck in Bonney Lake, Washington. A couple and their baby were in the vehicle and were all killed instantly. Investigators are working to determine what caused this accident and to determine why the road under the overpass remained open to traffic while construction was being done on the overpass.

A Cause Map, a visual method of root cause analysis, can be built to help understand this accident. More information is still needed to understand the details of the accident, but an initial Cause Map can be created now to capture what is known, and it can be easily expanded to include additional information as it becomes available. A Cause Map is created by asking “why” questions and visually laying out the answers to show the cause-and-effect relationships. (Click here to learn more about the basics of Cause Mapping.)

In this accident, three people were killed because the vehicle they were riding in was smashed by a large slab of concrete. The vehicle was hit by the concrete slab because the slab was accidentally dropped, and the truck was under the overpass at the time it fell because the road was open to traffic. (When two causes are both needed to produce an effect, the causes are listed vertically on the Cause Map and separated by an “and”.) The road would typically have been closed to traffic while heavy work was performed on the overpass, but the work plan for the construction project did not indicate that any heavy work would be performed on the day of the accident.   At some point the actual work schedule must have deviated from the planned schedule, but no change was made to the plan for managing traffic, resulting in traffic traveling under the overpass while potentially dangerous construction was performed.
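To make the “and” convention concrete, the chain of “why” answers above can be held in a small data structure. This is a hypothetical sketch: the `cause_map` dictionary and the `why` function are invented here for illustration and are not part of any Cause Mapping tool. Each effect lists the causes that were all required to produce it, and walking the map prints an “AND” between causes that sit side by side.

```python
"""Hypothetical sketch of a Cause Map as a dictionary of effect -> required causes.
The wording follows the Bonney Lake post; the structure itself is an assumption."""

# Each effect maps to the causes that were ALL needed to produce it (an implicit "AND").
cause_map = {
    "3 people killed": ["Vehicle smashed by concrete slab"],
    "Vehicle smashed by concrete slab": ["Concrete slab dropped", "Truck under overpass"],
    "Truck under overpass": ["Road open to traffic"],
    "Road open to traffic": ["Work plan did not show heavy work that day"],
}


def why(effect: str, depth: int = 0) -> list[str]:
    """Walk the map by repeatedly asking 'why', returning indented lines."""
    lines = ["  " * depth + effect]
    for i, cause in enumerate(cause_map.get(effect, [])):
        if i > 0:
            # Causes listed together were jointly required, so join them with "AND".
            lines.append("  " * (depth + 1) + "AND")
        lines.extend(why(cause, depth + 1))
    return lines


print("\n".join(why("3 people killed")))
```

Because the map is just data, new causes (for example, why the slab dropped) can be added as investigators release more information without restructuring anything else, which mirrors how an initial Cause Map is expanded.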

Investigators are still working to understand exactly why the concrete slab fell, but early indications are that temporary metal bracing that was supporting the concrete may have failed due to buckling. The concrete barriers on the overpass were being cut into pieces at the time of the accident so that they could be removed as part of a $1.7 million construction project to improve pedestrian access, which included adding sidewalks and lights.

Once the details of what caused this tragic accident are better understood, solutions can be developed and implemented that will help reduce the risk of something like this happening again. To view a high level Cause Map of this accident, click on “Download PDF” above.

You can also read a previous blog “Girder Fell on Car, Killing 3” to learn more about a similar accident that occurred in 2004.

Crash of Germanwings flight 9525 Leads to Questions

By ThinkReliability Staff

On March 24, 2015, Germanwings flight 9525 crashed into the French Alps, killing all 150 onboard. Evidence available thus far suggests the copilot deliberately locked the pilot out of the cockpit and intentionally crashed the plane. While evidence collection is ongoing, because of the magnitude of this catastrophe, solutions to prevent similar recurrences are already being discussed and, in some cases, implemented.

What is known about the crash can be captured in a Cause Map, or visual form of root cause analysis. Visually diagramming all the cause-and-effect relationships makes it possible to address all related causes, leading to a larger number of potential solutions. The analysis begins by capturing the impacted goals in the problem outline. In this case, the loss of 150 lives (everybody aboard the plane) is an impact to the safety goal and of primary concern in the investigation. Also impacted are the property goal, due to the loss of the plane, and the labor goal, due to the recovery and investigation efforts (which are particularly difficult in this case because of the difficult-to-access location of the crash).

Asking “Why” questions from the impacted goals develops cause-and-effect relationships. In this case, the deaths resulted from the crash of the plane into the mountains of the French Alps. So far, available information appears to support the theory that the copilot deliberately crashed the plane. Audio recordings of the pilot requesting re-entry into the cockpit, the normal breathing of the co-pilot, and the manual increase of speed of the descent while crash warnings sounded all suggest that the crash was deliberate. Questions have been raised about the co-pilot’s fitness for duty. Some have suggested increased psychological testing for pilots, but the industry trade group Airlines for America says that the current system (at least in the US) is working: “All airlines can and do conduct fitness-for-duty testing on pilots if warranted. As evidenced by our safety record, the U.S. airline industry remains the largest and safest aviation system in the world as a result of the ongoing and strong collaboration among airlines, airline employees, manufacturers and government.”

Some think that technology is the answer. The cockpit voice recorder captured alarms indicating an impending crash, but these were simply ignored by the co-pilot. If flight guidance software were able to take over for an incapacitated pilot (or one who deliberately ignores these warnings), disasters like this one could be avoided. Former Department of Transportation Inspector General Mary Schiavo says, “This technology, I believe, would have saved the flight. Not only would it have saved this flight and the Germanwings passengers, it would also save lives in situations where it is not a suicidal, homicidal pilot. It has implications literally for safer flight across the industry.”

Others say cockpit procedures should be able to prevent an issue like this. According to aviation lawyers Brian Alexander & Justin Green, in a blog for CNN, “If Germanwings had implemented a procedure to require a second person in the cockpit at all times – a rule that many other airlines followed – he would not have been able to lock the pilot out.”

After 9/11, cockpit doors were reinforced to prevent any forced entry (according to the Federal Aviation Administration, they should be strong enough to withstand a grenade blast). The doors have 3 settings – unlock, normal, and lock. Under normal settings, the cockpit can be unlocked by crewmembers with a code after a delay. But under the lock setting (to be used, for example, to prevent hijackers who have obtained the crew code from entering the cockpit), no codes will allow access. (The lock setting has to be reset every 5 minutes.) Because of the possibility a rogue crewmember could lock out all other crewmembers, US airlines instituted the rule that there must always be two people in the cockpit. (Of course, if only a three-person crew is present, this can cause other issues, such as when a pilot became locked in the bathroom while the only other two flight crew members onboard were locked in the cockpit, nearly resulting in a terror alert. See our previous blog on this issue.)
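The three door settings behave like a small state machine, and modeling it shows why a single crewmember inside can keep everyone else out. The sketch below is a hypothetical model built only from the description above; it is not any manufacturer's door logic. The 5-minute lock period comes from the text, interpreted here as the lock lapsing unless it is reset; the 30-second entry delay is a placeholder assumption, since the text only says there is “a delay”.

```python
"""Hypothetical model of the three-position cockpit door switch described above.
The 5-minute lock period is from the text; the entry delay is a placeholder."""

from enum import Enum
from typing import Optional

LOCK_PERIOD_S = 5 * 60  # LOCK lapses unless reset (interpretation of "every 5 minutes")
ENTRY_DELAY_S = 30      # placeholder delay before a correct code opens the door


class Switch(Enum):
    UNLOCK = "unlock"
    NORMAL = "normal"
    LOCK = "lock"


class CockpitDoor:
    def __init__(self) -> None:
        self.switch = Switch.NORMAL
        self.locked_until = 0.0  # time (seconds) at which the LOCK setting lapses

    def set_switch(self, setting: Switch, now: float) -> None:
        """Set from inside the cockpit; selecting LOCK (re)starts the lock period."""
        self.switch = setting
        if setting is Switch.LOCK:
            self.locked_until = now + LOCK_PERIOD_S

    def effective_setting(self, now: float) -> Switch:
        # A LOCK that has not been reset reverts to NORMAL when its period ends.
        if self.switch is Switch.LOCK and now >= self.locked_until:
            return Switch.NORMAL
        return self.switch

    def opens_at(self, code_ok: bool, entered_at: float) -> Optional[float]:
        """Return when the door opens for a code entered at `entered_at`, or None."""
        setting = self.effective_setting(entered_at)
        if setting is Switch.UNLOCK:
            return entered_at
        if setting is Switch.LOCK or not code_ok:
            return None  # under LOCK, no code allows access
        return entered_at + ENTRY_DELAY_S


door = CockpitDoor()
door.set_switch(Switch.LOCK, now=0)
during_lock = door.opens_at(code_ok=True, entered_at=60)    # refused
after_lapse = door.opens_at(code_ok=True, entered_at=300)   # opens after the delay
door.set_switch(Switch.LOCK, now=290)                       # lock reset just in time
after_reset = door.opens_at(code_ok=True, entered_at=300)   # refused again
```

In this model, the person inside only has to re-select LOCK before each period ends to keep the door closed indefinitely, which is the gap the two-person rule is meant to close.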

James Hall, the former chairman of the National Transportation Safety Board, agrees. He says, “The flight deck is capable of accommodating three pilots and there shouldn’t ever be a situation where there is only one person in the cockpit.” In response, many airlines in Europe and Canada, including Germanwings’ parent company Lufthansa, have since instituted a rule requiring at least two people in the cockpit at all times.   Other changes to increase airline safety may be implemented after more details regarding the crash are discovered.

THE WOEFUL TALE OF JACK & JILL

By Jon Bernardi

There has been a disturbing rise of injuries once thought to have been eradicated. Several federal and state agencies are considering legislation to address the very dangerous injuries from the gathering of liquid di-hydrogen oxide from certain unprotected hills and wells. The last straw came once upon a time, when siblings Jack and Jill fetched the ill-fated pail. Not only were crowns injured, but various homeopathic remedies were implemented with little consequence except to other participants, notably Jill.

What caused this unfortunate turn of events?

That question can be answered by building a Cause Map, a visual root cause analysis.  In the Cause Mapping process, the first step is to fill in an Outline with the background information for an issue as well as how the problem impacts the goals.  In this example, the aforementioned fetching impacts quite a number of goals: safety, as crowns were broken; environmental, the spilled di-hydrogen oxide; regulatory, child corporal punishment and child labor laws; customer service, no di-hydrogen oxide available for multiple purposes; production, the delay of supper; and labor, the time needed for medical attention.

Fortunately no property was lost as the well-made bucket survived intact.  Once we have filled out the Outline, the next step is to ask “why” questions to find the different causes that contributed to the problem being analyzed.

So why were they going up a hill? This presents us with a number of potential paths of exploration as to why the well was at the top of a hill. Even without knowing a detailed answer, we know that a potential solution would be to get them hooked up to an established di-hydrogen oxide system as soon as possible!

Why was there no protection? Broken crowns are a serious affair. This combined with the potential for other injuries from the fractious “tumbling down” incident leaves us to wonder how the well could be constructed in such a manner.

These are areas for further exploration. Even with the unanswered questions we are still able to propose several solutions to ensure that child labor laws are not ignored, hills are properly protected, and home remedies are carefully considered.

To view an Outline and a high level Cause Map for this issue, click on “Download PDF” above.

Prison Bus Collides With Freight Train

By Kim Smiley

On the morning of January 14, 2015, a prison bus went off an overpass and collided with a moving freight train.  Ten were killed and five more injured.  Investigators believe the accident was weather-related.

This tragic accident can be analyzed by building a Cause Map, a visual root cause analysis.  A Cause Map visually lays out the cause-and-effect relationships to show all the causes (not just a single root cause) that contributed to an accident.  The first step in the Cause Mapping method is to determine how the incident impacted the overall organizational goals.  Typically, more than one goal needs to be considered.  Clearly the safety goal was impacted because of the deaths and injuries.  The property goal is impacted because of the damage to both the bus and train (two train cars carrying UPS packages were damaged).  The schedule goal is impacted because of the delays in the train schedule and the impact on vehicle traffic.

The Cause Map itself is built by starting at one of the impacted goals and asking “why” questions. So why were there fatalities and injuries?  This occurred because there were 15 people on a bus and the bus collided with a train.  The bus was traveling between two prison facilities and drove over an overpass.  While on the overpass, the bus hit a patch of ice and slid off the road, falling onto a moving freight train that was passing under the roadway.  No one onboard the train was injured and the train did not derail, but it was significantly damaged.  The bus was severely damaged.

The prisoners onboard the bus were not wearing seat belts, as is typical on many buses.  They were also handcuffed together, although it’s difficult to say how much this contributed to the injuries and fatalities.

Finding useful solutions to prevent these types of accidents can be tricky.  The prison system may want to review how they evaluate road conditions prior to transporting prisoners.  This accident occurred early in the morning, and waiting until later in the day when temperatures had increased may have reduced the risk of a bus accident.  Transportation officials may also want to look at how roads, especially overpasses, are treated in freezing weather to see if additional efforts are warranted.

To view a high level Cause Map of this accident, click on “Download PDF” above.

You can also read our previous blogs to learn more about other train collisions:

Freight Trains Collide Head-on in Arkansas

Freight Train Carrying Crude Oil Explodes After Colliding with Another

“Ghost Train” Causes Head-on Collision in Chicago

Deadly Train Collision in Poland

Dreamliner fire: firefighter injured when battery explodes

By ThinkReliability Staff

On January 7, 2013, smoke was discovered on a recently deplaned Boeing 787 Dreamliner. The recently released National Transportation Safety Board (NTSB) investigation found that an internal short circuit within a cell of the auxiliary power unit (APU) battery spread to adjacent cells and led to a thermal runaway, which released fire and smoke aboard the aircraft. A firefighter responding to the fire was injured when the battery exploded. Only 9 days later, an incident involving the main battery, which is the same model as the one used for the APU, resulted in an emergency landing of another Boeing 787. As a result of these two incidents, the entire Dreamliner fleet was grounded for 3 months for the ensuing investigation and incorporation of modifications. (See our previous blog about the grounding.) Before the fleet was allowed to resume operations, certain protective modifications had to be implemented.

The cause of the internal short circuit, which provided the initial heat source for the fire within the battery cell, could not be definitively determined because of severe damage in the area, but it was potentially related to defects discovered during the manufacturing process. (Defects that could result in this type of short circuit were found on similar components.) The investigation found issues within the manufacturing process and with the oversight of subcontractors by contractors, as opposed to the manufacturers themselves.

The high temperatures resulting from the fire allowed it to spread to adjacent cells. Localized temperatures were found to exceed allowable limits at times of maximum current discharge, such as APU startup, which had recently occurred. These high temperatures were not detected by the monitoring system because temperatures were monitored only on two cell bus bars, not at individual cells. Had the issue come to light sooner, the impact could have been minimized.

The systems were not prepared to deal with a spreading fire because the design of the aircraft assumed that a short circuit internal to a cell would not propagate. The NTSB determined that the guidance provided to determine key assumptions was ineffective and that the validation of these assumptions had failed. Likely related to this assumption, the safety assessment and testing of the battery system were ineffective. The manufacturer calculated the rate of occurrence of cell venting (the spreading of fire from cell to cell) to be 1 in 10 million flight hours. Yet the two occurrences that led to the grounding both involved cell venting, and they happened while the 787 fleet had accumulated fewer than 52,000 flight hours.
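The gap between the predicted and observed rates can be checked with a little arithmetic. The sketch below assumes cell-venting events follow a Poisson process (independent events at a constant rate); that model is an assumption made for illustration, not part of the NTSB findings. It also uses 52,000 hours as the fleet exposure even though the true figure was lower, which gives the manufacturer's prediction the benefit of the doubt. The function name poisson_at_least is hypothetical.

```python
import math

# Figures from the paragraph above (NTSB findings as summarized in this post).
PREDICTED_RATE = 1 / 10_000_000  # cell-venting events per flight hour (manufacturer estimate)
FLEET_HOURS = 52_000             # upper bound on 787 fleet flight hours at the time
OBSERVED_EVENTS = 2              # cell-venting events that led to the grounding


def poisson_at_least(k: int, mean: float) -> float:
    """Probability of k or more events for a Poisson distribution with the given mean."""
    below_k = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


expected = PREDICTED_RATE * FLEET_HOURS        # events expected at the predicted rate
observed_rate = OBSERVED_EVENTS / FLEET_HOURS  # lower bound on the observed rate
ratio = observed_rate / PREDICTED_RATE         # how many times higher than predicted
p_two_or_more = poisson_at_least(OBSERVED_EVENTS, expected)

if __name__ == "__main__":
    print(f"Expected events at predicted rate: {expected:.4f}")
    print(f"Observed rate is at least {ratio:.0f}x the predicted rate")
    print(f"P(2 or more events | predicted rate): {p_two_or_more:.2e}")
```

Under these assumptions, the manufacturer's rate predicts about 0.005 cell-venting events in 52,000 flight hours, and the chance of seeing two or more is roughly 1 in 74,000. Put another way, the observed rate was at least about 385 times the prediction, which is why the two events cast serious doubt on the original calculation.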

Immediate actions required by the NTSB before a return to flight were to enclose the battery case, vent from the interior of the enclosure containing the battery to the exterior of the plane (keeping smoke out of the occupied spaces), and modify the battery to minimize the most severe effects of an internal short circuit. The NTSB also made multiple safety recommendations to the manufacturer, the subcontractor and the Federal Aviation Administration (FAA).

One of these recommendations was to ensure that assumptions are validated. According to the NTSB report, "Validation of assumptions related to failure conditions that can impact safety is a critical step in the development and certification of an aircraft. The validation process must employ a level of rigor that is consistent with the potential hazard to the aircraft in case an assumption is incorrect." This statement applies to any manufactured product. Just replace the word "aircraft" with whatever is being manufactured, such as "car" or "pacemaker". (See another disaster that resulted from unvalidated assumptions: the collapse of the I-35 Bridge.)

Click on “Download PDF” above to view a high level Cause Map of this issue.