All posts by Angela Griffith

I lead comprehensive investigations by collecting and organizing all related information into a coherent record of the issue. Let me solve a problem for you!

Dreamliner fire: firefighter injured when battery explodes

By ThinkReliability Staff

On January 7, 2013, smoke was discovered on a recently deplaned Boeing 787 Dreamliner. The recently released National Transportation Safety Board (NTSB) investigation found that an internal short circuit within a cell of the auxiliary power unit (APU) battery spread to adjacent cells and led to a thermal runaway, which released fire and smoke aboard the aircraft. A firefighter responding to the fire was injured when the battery exploded. Only 9 days later, an incident involving the main battery, which is the same model as that used for the APU, resulted in an emergency landing of another Boeing 787. As a result of these two incidents, the entire Dreamliner fleet was grounded for 3 months for the ensuing investigation and incorporation of modifications. (See our previous blog about the grounding.) Before the fleet was allowed to resume operations, certain protective modifications were required to be implemented.

The investigation found that the cause of the internal short circuit, which provided the initial heat source for the fire within the battery cell, could not be definitively determined due to severe damage in the area, but was potentially related to defects discovered during the manufacturing process. (Defects that could result in this type of short circuit were found on similar components.) The investigation found issues within the manufacturing process and with the oversight of subcontractors by contractors, as opposed to the manufacturers themselves.

The high temperatures resulting from the battery fire allowed it to spread to adjacent cells. Localized temperatures were found to exceed allowable limits at times of maximum current discharge, such as the APU startup, which had recently occurred. The high temperatures were not detected by the monitoring system because temperatures were not monitored at individual cells, but only on two cell bus bars. (The impact could have been minimized had the issue come to light sooner.)

The systems were not prepared to deal with a spreading fire as the design of the aircraft assumed that a short circuit internal to the cell would not propagate. The NTSB determined that the guidance provided to determine key assumptions was ineffective and that the validation of these assumptions had failed. Likely related to this assumption, the safety assessment and testing on the battery system was ineffective. The rate of occurrence of cell venting (the spreading of fire from cell to cell) was calculated by the manufacturer to be 1 in 10 million flight hours. The two occurrences that resulted in the grounding both involved cell venting and occurred while the 787 fleet had less than 52,000 flight hours.
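The gap between the predicted and observed rates can be made concrete with a quick calculation. The sketch below uses only the figures quoted above; the variable names are illustrative, and because the fleet had fewer than 52,000 flight hours, the observed rate it computes is a lower bound.

```python
# Figures taken from the paragraph above.
predicted_rate = 1 / 10_000_000  # manufacturer's estimate: venting events per flight hour
observed_events = 2              # the two incidents that led to the grounding
fleet_hours = 52_000             # upper bound on fleet flight hours at the time

# Fleet hours were *less than* 52,000, so this is a lower bound on the observed rate.
observed_rate = observed_events / fleet_hours

hours_per_event = fleet_hours / observed_events
ratio = observed_rate / predicted_rate

print(f"Observed: about 1 event per {hours_per_event:,.0f} flight hours")
print(f"Observed rate is at least {ratio:,.0f} times the predicted rate")
```

Two events are a very small sample, so this is not a precise estimate of the true rate, but a difference of this size suggests the original assumption was far off.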

Immediate actions that were required by the NTSB prior to a return to flight were to enclose the battery case, vent from the interior of the enclosure containing the battery to the exterior of the plane (keeping smoke out of the occupied spaces), and modify the battery to minimize the most severe effects from an internal short circuit. The NTSB also made multiple safety recommendations to the manufacturer, subcontractor and the Federal Aviation Administration (FAA).

One of these recommendations was to ensure that assumptions are validated. According to the NTSB report, “Validation of assumptions related to failure conditions that can impact safety is a critical step in the development and certification of an aircraft. The validation process must employ a level of rigor that is consistent with the potential hazard to the aircraft in case an assumption is incorrect.” This statement is true for any object that’s manufactured. Just replace the word “aircraft” with whatever is being manufactured, such as “car” or “pacemaker”. (See another disaster that resulted from unvalidated assumptions: the collapse of the I-35 Bridge.)

Click on “Download PDF” above to view a high level Cause Map of this issue.

Chemical Release Kills Four Workers at Texas Pesticide Plant

By ThinkReliability Staff

In the early morning hours of November 15, 2014, a release of methyl mercaptan resulted in the deaths of four employees at a plant in Texas that manufactures pesticides. The investigation into the source of the leak is still ongoing, though persistent maintenance problems had been reported in the plant, which was shut down five days prior to the incident.

Even though the investigation has not been completed, there are some lessons learned that can be applied to this facility, and other facilities that handle chemicals, immediately.

Even “safer” chemicals are dangerous when not treated properly. The chemical released – methyl mercaptan – is stored as a safer alternative to methyl isocyanate (which was the chemical released in the Bhopal disaster). Although it’s “safer” than its alternatives, it is still lethal at concentrations above 150 parts per million. The company has stated that 23,000 pounds were released – in a room where complaints were made about insufficient ventilation. The workers were unable to escape – likely because they were quickly incapacitated by the levels of methyl mercaptan and did not have the necessary equipment to get out. (Only two air masks and oxygen tanks were found in the area where the employees were.)

A fast response is necessary for employee safety. Records show that 911 was not called for an hour after the employees were trapped. (One of the victims called his wife an hour prior to indicate there was an issue and he was attempting rescue.) The emergency industrial response group, which is trained to provide response in these sorts of situations, was never called by the plant. Medical personnel could not access the employees because they were not trained in protective gear. Firefighters who responded did not have enough air to travel through the entire facility and did not have enough information on the layout to know where to go. It’s unclear whether a quicker response could have saved lives.

Providing timely, accurate information is necessary for public safety. The best way to determine the impact on the public is to measure the concentration of released chemicals at the fenceline (known as fenceline monitoring). Air monitoring was not performed for more than four hours after the release. Companies are not required to provide fenceline monitoring, although an Environmental Protection Agency (EPA) rule requiring monitoring systems for refineries is under review. (This rule would not have impacted this plant as it produced pesticides.) Until that monitoring, the only information available to the public was information provided by the company (which did not disclose the amount of chemical released until days later). In Texas, companies are required to disclose the presence of chemicals, but not the amount. A reverse 911 system was used to inform residents that an odor would be present, but did not discuss the risks.

What can you do? Ensure that all chemicals at your facility are known and stored carefully. Develop a response plan that ensures that your employees can get out safely, that responders can get in safely (and are apprised of risks they may face), and that the public has the necessary information to keep them safe. Make sure these plans are trained on and posted readily. Depending on the risk of public impact from your business, involving emergency responders and the public in your drills may be desired.

To see a high level Cause Map of this incident, click on “Download PDF” above.

Safety Concerns Raised by 5 Railroad Accidents in 11 Months

By ThinkReliability Staff

The National Transportation Safety Board investigates major railroad accidents in the United States. It was not only the severity (6 deaths and 126 injuries) but the frequency (5 accidents over 11 months) of recent accidents on a railroad that led to an “in-depth special investigation”. Part of the purpose of the special investigation was to “examine the common elements that were found in each”.

When an organization sees a recurring issue – in this case, multiple accidents requiring investigation from the same railroad – there may be value in not only investigating the incidents separately but also in a common analysis. A root cause analysis that addresses more than one incident is known as a Cumulative Cause Map, and it visually captures much of the same information as a Failure Modes and Effects Analysis, or FMEA.

The information from the individual investigations of each of these accidents can be combined into one analysis, including an outline addressing the problems and impacts to the goals from the incidents as a whole. In this case, the problems addressed include issues on the Metro-North railroad in New York and Connecticut from May 2013 to March 2014. The five incidents during that time period resulted in 4 customer deaths and 126 injuries, 2 employee deaths, and over $23.8 million in property damage.

The analysis of the individual accidents can be combined in a Cumulative Cause Map to intuitively show the cause-and-effect relationships. The customer deaths and injuries, and the property damage, resulted from train derailments and a collision. The train collision resulted from a derailment. In two of the cases, the derailment was due to track damage that had either been missed on inspection or had maintenance deferred. In the third derailment (discussed in a previous blog), the train took a curve at an excessive rate of speed due to fatigue of the engineer. Inadequate track inspections and maintenance, and deferred maintenance were highlighted as recurring safety issues to the railroad.

Both of the employee fatalities resulted from workers being struck by a train while performing track maintenance. In one case, the worker was outside the designated protected area due to an inadequate job safety briefing. In the other, a student removed the block while working unsupervised, allowing a train to travel into the protected area. The NTSB also identified inadequate safety oversight and roadway worker protection procedures as areas needing improvement. While the NTSB already released recommendations with each of the individual investigations, it plans to issue more based on the cumulative investigation addressing all five incidents. View an overview of all 5 incidents by clicking “Download PDF” above.

Years of Uncontrolled Leakage Lead to Fatal Mall Collapse

By ThinkReliability Staff

The problems that led to the collapse of a shopping mall’s parking structure were present throughout its thirty-plus year history, says the Report of the Elliot Lake Commission of Inquiry. Multiple opportunities to fix the problem were missed, culminating in the deaths of two on June 23, 2012. Says the report, “Although it was rust that defeated the structure of the Algo Mall, the real story behind the collapse is one of human, not material failure.”

Yes, corrosion of a connection supporting the parking garage decreased its strength to 13% of its original capacity, meaning that on that fateful day, one car driving over it resulted in its fatal collapse. But the more important story is that of how the corrosion was allowed to increase unchecked, due to leakage that had been noted since the opening of the mall.

Multiple causes were discovered resulting in the fatal collapse. The report that addresses them and suggests improvement is more than 1,000 pages long. Though the detail in the report is outstanding, an overview of the information from the report can be diagrammed in a Cause Map, or visual root cause analysis, allowing a one-page overview that clearly shows the cause-and-effect relationships.

It’s important to begin with the impact to the goals. Doing so gives a starting point – and focus – to the cause-and-effect questioning. In this case, the safety goal was impacted due to the 2 fatalities and 19 injuries caused by the collapse. The mall experienced severe damage, and the rescue and response efforts were comprehensive and time-consuming. Additionally, an engineer was criminally charged due to negligence from issues with the mall’s structural integrity.

The fatalities, property damage, and rescue efforts all resulted from the catastrophic collapse of the mall’s rooftop parking structure. The collapse was caused by the sudden failure of a connector. Material failure results from stress on an object overcoming the strength of the object. In this case the stress on the object was a single vehicle driving over the connection in question (evidenced by a video of the collapse). The strength of the connection had been significantly reduced due to corrosion, caused by the continuous ingress of water and chlorides on the unprotected beam.

The leakage was found to stem from a faulty initial design of the waterproofing system from construction of the mall in 1979. Specifically, the architect’s suggestions regarding waterproofing were ignored due to cost and land availability concerns, and the waterproofing system was installed during suboptimal weather because of construction delays. After construction, the architect signed off on the design without inspecting the site, beginning the first in a long list of failings that would eventually cost two women their lives.

Over the years, there were multiple warnings (not the least of which was the need to use buckets to collect leaking water on a fairly constant basis) that were never resolved. According to the report, the problem was never fully addressed with maintenance and repairs but rather pushed off with cheap, ineffective repairs or by selling the structure (as happened twice in its history). For the most part, the local government did not investigate complaints or enforce building standards, apparently unwilling to interfere with the operation of a large source of local revenue and employment.

When the local government finally did get involved and issued an Order to Remedy in 2009, the building owner appeared to provide deliberately false information that suggested that repairs were underway, leading to a rescinding of the order later that year. After an anonymous complaint in late 2011, an engineer with a suspended license performed a visual-only inspection which had to be signed off by a licensed engineer. After it was signed, the engineer testified that he had changed the contents of the report at the request of the owner, leading to the criminal charges against him for negligence.

Although plenty of failings were discussed in the report, it states very clearly, “This Commission’s role is not to castigate or chastise; its only purpose in finding fault, if it must, is to seek to prevent recurrence. Criticism of prevailing practices serves only to suggest their improvement or, if necessary, elimination.” In the report, the Commission discusses multiple suggestions for improvement – specifically clarifying, enforcing, and providing public information regarding building standards. Hopefully, the lessons learned from this tragic accident will allow for implementation of these solutions to ensure that thirty years of negligence isn’t allowed to cause a fatal building collapse again.

Two Firefighters Killed by Rogue Welding

By ThinkReliability Staff

On March 26, 2014, two firefighters were killed when trapped in a basement by a quickly spreading, very dangerous fire in Boston, Massachusetts. These firefighters appear to have been the first to succumb to injuries directly caused by fire while on the job in 2014. The company that was found responsible for starting the fire has been fined by OSHA for failure to follow safety procedures. Says Brenda Gordon, Occupational Safety and Health Administration (OSHA)’s director for Boston and southeastern Massachusetts, “This company’s failure to implement these required, common-sense safeguards put its own employees at risk and resulted in a needless, tragic fire.”

Every incident that results in a fatality should be carefully investigated. Investigations are used not only for liability and regulatory reasons, but also to develop solutions to reduce the risk of similar fatalities happening in the future. Investigating an incident such as this in a Cause Map, or visual root cause analysis, allows for better solutions by determining all the cause-and-effect relationships that led to the issue.

First it’s important to define how goals were impacted in order to define the scope of the problem. In this case, two firefighters were killed, which impacts the safety goal. In addition, the spread of the fire, damage of nearby buildings and associated civil lawsuits are also impacts to the goals. The OSHA fine of $58,000 for 10 violations of workplace safety regulations is an impact to the regulatory goal. The response to the fire, as well as the multiple investigations, are impacts to the labor/time goal.

Beginning with an impacted goal and asking “Why” questions develops cause-and-effect relationships that explain how the incident occurred. In this case, the firefighters perished when they were trapped by fire. The firefighters were in the basement of a residential building to rescue occupants from a fire, and the fire was so hot and dangerous that the firefighters could not exit, and other firefighters were unable to come to their rescue. Extremely windy conditions spread the fire caused by a welding spark that struck a nearby wood shed. OSHA investigators note that the company performing the welding did not follow safety precautions (including having a fire watcher and moving welding away from flammable objects) that would have reduced the risk for fire. They cited the lack of an effective fire prevention/protection program and a lack of training in workplace and fire safety. View the Cause Map by clicking “Download PDF” above.

Ideally the fine levied by OSHA will encourage the company involved to increase its methods of fire protection, not only to protect its own workers, but also to protect the public. In addition, the Boston Fire Department is conducting an internal review to improve firefighter safety. Says Steve MacDonald, spokesman, “What they’re doing is looking at policies and procedures. They’re reviewing everything, reviewing weather, radio communications, anything and everything having to do with the fire.”

On July 5th, another firefighter died after being trapped in a building while looking for occupants during a fire in Brooklyn, New York. On July 9th, a firefighter in Houston, Texas died of smoke inhalation inside a burning building. A firefighter died in a building collapse due to fire in New Carlisle, Indiana on August 5, 2014, making a total of 5 firefighters who have died as a direct result of smoke/fire injuries in the line of duty so far in 2014. In 2013, a total of 30 firefighters were killed on the job, most as the result of the Yarnell Hill fire in Arizona.

Children Served Bleach from Reused Milk Jug

By ThinkReliability Staff

On September 11, 2014, her first day on the job, a substitute teacher’s aide was getting ready to pour water for morning snack. Unfortunately, what she poured from a reused plastic milk container was actually a bleach solution used for cleaning. The mistake was realized quickly, but not before 28 children and 2 adults ingested some of the bleach. Luckily the concentration was low enough that there were no injuries, although all who ingested the solution were seen at a local hospital.

The substitute teacher’s aide was fired and the school reopened the next day, though the New Jersey Department of Children and Families will be investigating. Clearly serving cleaning solution to children under your care is undesirable. However, firing the person most directly involved without fixing any of the issues that contributed to the mistake may leave an unacceptable risk for the issue to happen again. Although this appeared to be the first time anything like this happened on such a scale in a day care facility, the misuse of cleaning fluid due to confusing containers has happened before. Just this July a woman was given an epidural of cleaning fluid after containers were accidentally switched. (See our blog to learn more.)

Identifying the impacted goals and all the causes that led to those impacted goals allows for more solutions than just firing the person found to be most immediately responsible. The use of a Cause Map, a visual form of root cause analysis, diagrams all the cause-and-effect relationships in order to develop as many solutions as possible so the most effective among them can be implemented.

First the impacts to the goals are identified. The safety goal is impacted because of the potential for injury to the 28 children and 2 adults who drank the bleach solution. The bleach solution was stored in a food container, which can be considered an impact to the environmental goal. The customer service goal is impacted because the children and adults were served bleach solution. The day care worker being fired, and the ongoing investigation by the licensing agency, can both be considered impacts to the regulatory goal. Additionally, the treatment of all 30 who ingested the solution impacts the labor goal.

Beginning with one impacted goal, we ask “why” questions to determine cause-and-effect relationships. In this case, the safety goal impact of potential injury is due to the children and teachers drinking the bleach solution they were served. The bleach solution was served by the fired employee who was apparently unaware that the milk jug actually stored bleach solution. The executive director indicated that the jug was labeled, so this is apparently not an uncommon practice at the site. The question this raises is, why was an old milk jug used to store cleaning solution?

The American Association of Poison Control Centers (AAPCC) says: “DO NOT use food containers such as cups or bottles to store household and chemical products” and “Store food and household chemical products in separate areas. Mistaking one for the other could cause a serious poisoning.” Although the reused container was apparently labeled (though not clearly enough to avoid the mistake), it should never have been reused in the first place. As indicated by the AAPCC, reusing containers between food and cleaning supplies is just too big of a risk. It’s also worth noting that reusing a bottle that contained household chemicals for a different household chemical is another no-no: “Never mix household chemical products together. Mixing chemicals could cause a poisonous gas.” Don’t run the risk at your workplace or home. Don’t reuse food containers for cleaning products or mix cleaning products. Fortunately the children at this day care center got off without lasting damage in this case.

Will Factory Explosion Lead to Increased Safety?

By ThinkReliability Staff

On August 2, 2014, 75 workers were killed and about 186 were injured by an explosion at an auto parts factory in Kunshan, China. This devastating event has raised questions about worker protection and oversight in China, as well as the responsibility for manufacturers using subcontractors in China to provide a safe workplace.

The explosion can be examined in a Cause Map, or visual root cause analysis, to look at the effects, causes, and potential solutions of the issue. A Cause Map visually diagrams the cause-and-effect relationships associated with an issue. The first step in the Cause Mapping process is to determine the impact on an organization’s goals. In this case, the goals will be looked at from the broader perspective of the country of China. The safety goal was impacted due to the large number of fatalities and injuries. The regulatory goal is impacted due to the five executives that were detained (though it’s unclear for what purpose they are being held). In the wake of the disaster, 268 factories in the surrounding area have been shut down (impacting the production goal) as part of a three-month round of inspections (an impact to the labor goal). In addition, the property goal was impacted due to the damage to the factory, the full extent of which is still unknown.

The cause-and-effect relationships resulting in these impacts to the goals are developed by asking ‘Why’ questions. The fatalities, injuries and damage to the factory resulted from an explosion. Preliminary investigation shows that it was a metal dust explosion. Dust explosions require five components to occur (as described in the dust explosion pentagon). These components are: heat, fuel, oxygen, confinement and dispersion. Oxygen and confinement are present under normal conditions. The preliminary investigation has identified a spark as the heat source (a common potential heat source in industrial settings).

In the case of a dust explosion, the fuel source is a dust, which is distributed into the air, providing a high level of surface area allowing the fuel to become explosive (dispersion). The plant, which manufactures wheels for a car manufacturer, was electroplating and polishing hubcaps. At the time, the workers were polishing hubcaps, a process that is known to create metal dust that can lead to dust explosions if safety regulations aren’t carefully followed. Specifically, safety regulations protecting against dust explosion involve cleaning and ventilation. The preliminary investigation found a shortage of equipment that is used to remove dust.
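The dust explosion pentagon can be read as an "all five must be present" condition, which is why removing any one component prevents the explosion. The sketch below is a deliberately simplified yes/no model of that idea. The class and method names are hypothetical, and real hazard assessment depends on factors such as dust concentration and particle size that are not modeled here.

```python
from dataclasses import dataclass, fields


@dataclass
class DustExplosionPentagon:
    """Simplified model: an explosion is possible only if all five are present."""
    heat: bool
    fuel: bool
    oxygen: bool
    confinement: bool
    dispersion: bool

    def missing(self):
        # Names of the components that are not present.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def explosion_possible(self):
        return not self.missing()


# Conditions as described in the preliminary investigation: a spark (heat),
# airborne metal dust (fuel and dispersion), plus oxygen and confinement.
as_found = DustExplosionPentagon(True, True, True, True, True)

# Hypothetical scenario: cleaning and ventilation remove the accumulated dust.
with_dust_removal = DustExplosionPentagon(
    heat=True, fuel=False, oxygen=True, confinement=True, dispersion=False
)

print(as_found.explosion_possible())           # True
print(with_dust_removal.explosion_possible())  # False
print(with_dust_removal.missing())             # ['fuel', 'dispersion']
```

Oxygen and confinement are present under normal conditions, so practical solutions tend to target the fuel, dispersion, and heat sides of the pentagon.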

Unfortunately, that’s not too surprising. Industrial accidents kill tens of thousands of people a year in China, which has generally demonstrated a lack of regard for safety. Regulations involving dust are insufficient (and insufficiently enforced) by both the government and the manufacturing companies that subcontract work to Chinese firms (and generally outsource oversight to their contractors). Subcontractors who make small, low-value parts find themselves under heavy pressure to cut costs in a competitive market. According to Geoffrey Crothall of the China Labor Bulletin, “The explosion at the factory in Kunshan illustrates once again that although there are many laws and regulations outlining health and safety standards in the workplace those standards are not properly enforced by local authorities.”

In response to the explosion, China has detained executives from the company, and has closed 268 factories that have the potential for similar issues until they are given government permission to reopen. The government is conducting what is expected to be a three-month round of investigations of these factories and is expected to develop regulations that will better protect workers from explosive dust conditions.

The incident is drawing comparisons to the Triangle Shirtwaist Company fire in New York City which killed 146 workers in 1911. After the deadly fire, many protections were put into place that have increased workplace safety in the United States. It is hoped this tragedy will lead to a similar outcry that will force the government to act on increasing worker safety and produce lasting change.

To view the Outline and Cause Map based on the preliminary investigation, click on “Download PDF” above. Or, click here to learn more about dust explosions.

Loss of Flight 17 over Ukraine

By ThinkReliability Staff

On July 17, 2014, Malaysia Airlines Flight 17 was shot down 33,000′ above Ukraine by a surface-to-air missile. The issue can be looked at in a Cause Map, or visual root cause analysis. Clearly the primary impact to the goals in this case was the death of all 298 passengers and crew members on the plane. Next the Cause Map is built by developing the cause-and-effect relationships by asking “Why” questions.

While there are multiple issues that can be discussed related to why the missile was fired at the plane, the solutions that would result in missiles not being fired are outside the sphere of influence of most (if not all) of us. Focusing on the solutions that are within the sphere of influence of airlines, regulatory bodies, and even individual passengers allows the most effective use of time.

For this reason, we will focus on why the plane was in the area. The route that planes take is generally determined by wind, weather and congestion. There are also areas where airspace is restricted. At the time Flight 17 flew over Ukraine, the restricted airspace over the area ended at 32,000′. Just a week prior a military transport plane was shot down at 21,000′. However, the primary concern at the time was shoulder-fired missiles which generally have a range much less than 32,000′.

Beyond the political questions of what to do about an unprovoked attack on a commercial airliner, airlines, their regulatory bodies, and even passengers are trying to determine how they can stay safe while flying near or through one of the 41 currently designated “kinetic conflicts” (essentially areas where people are shooting at each other, causing a potential risk to planes, though generally not those flying at typical levels of commercial airliners).

Regulatory bodies, including the International Civil Aviation Organization (ICAO, the air-safety arm of the United Nations), are now looking at “the respective roles of states, airlines and international organizations for assessing the risk of airspace affected by armed conflict.” Currently each government determines the risk and whether airspace should be restricted. Air-safety experts say Ukraine’s restrictions weren’t unusual. Says air-safety consultant John Cox, “There has never been an airliner shot down from a surface-to-air missile at this kind of altitude. The threat has always been a shoulder-fired missile from insurgents.”

Individual airlines are also considering what they can do to reduce their risk. Some airlines are even considering antimissile devices, which use laser beams to draw heat-seeking missiles away from the plane itself. However, these are only effective against shoulder-fired heat-seeking missiles, not the type of missile that brought down flight 17. While many countries use these types of protection for their military planes, only Israel has required their use on commercial airliners.

For individual passengers who are concerned about the route their plane may be taking, flight-tracking services will allow them to see the flight paths of the most recent flights. However, because of gaps in coverage, flight paths over certain areas (such as over North Korea) may not be accurate. Airlines are being pressured to release their typical flight paths.

Even with the attack on Flight 17 and the loss of two other planes (TransAsia Airways flight 222 and Air Algerie flight 5017 crashed on July 23 and July 24, respectively, both in remote areas in poor weather), industry experts assure passengers that flying is still safe and that crashes are declining worldwide. The aviation accident rate is 2.8 per one million departures, the lowest since ICAO started tracking numbers. So far in 2014 there have been 70 commercial-plane crashes compared to 81 for the comparable period last year. (There were a total of 90 commercial flight crashes in 2013, compared to 99 in 2012 and 118 in 2011.) According to the director of safety at aviation consultancy Ascend, “Having three accidents together doesn’t tell you anything about safety. It’s about the long-term trend. Airline safety is improving, and it is generally improving faster than the industry is expanding.”
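The long-term trend in the crash counts quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers in the paragraph; the variable names are illustrative, and raw counts do not account for growth in the number of flights, which is why the rate per million departures is the better measure.

```python
# Commercial crash counts quoted in the paragraph above.
crashes_by_year = {2011: 118, 2012: 99, 2013: 90}

years = sorted(crashes_by_year)
changes = {}
for prev, curr in zip(years, years[1:]):
    # Percent change from one full year to the next.
    change = (crashes_by_year[curr] - crashes_by_year[prev]) / crashes_by_year[prev] * 100
    changes[curr] = change
    print(f"{prev} -> {curr}: {change:+.1f}%")

# Year-to-date comparison: 70 crashes so far in 2014 versus 81 in the
# comparable period of 2013.
ytd_2014, ytd_2013 = 70, 81
ytd_change = (ytd_2014 - ytd_2013) / ytd_2013 * 100
print(f"2013 -> 2014 (same period): {ytd_change:+.1f}%")
```

Each comparison shows a decline, which is consistent with the point that a cluster of three accidents does not reverse the overall trend.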

To view the outline, Cause Map, and solutions, please click on “Download PDF” above. Read about more aviation safety incidents:

Malaysian Airlines Flight 370

Air Traffic Control system confusion affects hundreds of flights

Smoke at FAA facility results in flight disruptions

Asiana flight 214

Delay of Recall Repairs Fires Up NHTSA

By ThinkReliability Staff

On June 18, 2013, the manufacturer of Jeep Grand Cherokee and Liberty sport-utility vehicles (SUVs) recalled 1.56 million vehicles due to a risk of fuel tank fires during rear-end collisions. At the time of the recall, the National Highway Traffic Safety Administration (NHTSA) linked 51 deaths to the fuel tank fires. Although a fix was accepted in January, parts won’t be available to owners until August.

The NHTSA is concerned about this delay. Says O. Kevin Vincent, NHTSA Chief Counsel, “For many owners, a recall remedy deferred by parts availability easily becomes a defect remedy denied. Moreover, additional delays in implementing this recall will inure to Chrysler’s benefit at the expense of vehicle owner safety.”

Even without full information, a Cause Map can begin to develop the cause-and-effect relationships that led to an issue. As more information is provided, more detail can be added to the Cause Map.

The analysis begins by determining the impacts to the organization’s goals. In this case, the safety goal is impacted by the 51 deaths that were determined to have resulted from gasoline fires as a result of the recall issue as well as 4 additional deaths that have occurred since the recall, according to the executive director of watchdog group Center for Auto Safety. The delay in the repairs for the recall issue can also be considered an impact to the customer service and production goals.

Beginning with one of the impacts to the goals, asking “why” questions builds the Cause Map, a visual root cause analysis. Starting from the deaths that have occurred since the recall took place, asking “why” questions shows that the deaths resulted from both the issue at the heart of the recall (the increased risk of gasoline fires) and the delay in the recall repairs. (Had the repairs been implemented more quickly, the number of deaths as a result of the issue might have been reduced.)

The increased risk of gasoline fires stems from an increased risk of fuel tank rupture in the event of a rear-end collision, because the fuel tank, in an unusual design, is located behind the rear-most axle, which provides inadequate protection. The fix for the recall issue is to add a trailer hitch, which provides additional distance between another vehicle and the fuel tank in a rear-end collision. It should be noted, however, that the hitch will protect only against “lower to medium-speed rear-end crashes”.

Although the addition of trailer hitches was recommended by the manufacturer at the time of the recall, a supplier was not selected until December. The manufacturer has stated that it was finding new suppliers to deal with the higher-than-normal demand for these parts. It’s also possible that the manufacturer was waiting for the NHTSA to approve the fix, which occurred in January. The NHTSA was doing additional testing to ensure that the fix would be effective. After the supplier was selected, it took nearly two months for a purchase order to be issued and five months for production to begin. The reasons for this part of the delay are unknown, and are expected to be provided to the NHTSA in the near term.

The delay in starting production is one thing; another concern is the amount of time it will take before enough parts are available. The supplier originally selected could manufacture 1,323 Liberty trailer hitches and 882 Grand Cherokee trailer hitches a day, meaning that if all 1.56 million vehicle owners participated in the recall, it would take 4.7 years to produce enough trailer hitches. Currently, the law requires only that manufacturers make repairs within a “reasonable time”, although most manufacturers begin repairs within about 60 days of notifying the NHTSA. This case may force the NHTSA to define what a “reasonable time” actually is.
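
A production-time estimate of this kind is vehicles divided by daily output, converted to years. The Python sketch below is only an illustration: the function name and the `production_days_per_year` parameter are hypothetical, and the per-model vehicle counts and production schedule behind the 4.7-year figure are not given in this post, so the sketch should not be expected to reproduce that number.

```python
import math


def years_to_supply(vehicles: int, parts_per_day: int,
                    production_days_per_year: int = 365) -> float:
    """Estimate how many years it takes to build one part per vehicle.

    Assumption (not from the post): the supplier produces at a constant
    daily rate on `production_days_per_year` days each year.
    """
    if parts_per_day <= 0 or production_days_per_year <= 0:
        raise ValueError("rates must be positive")
    # Round up: a partial day of production still takes a calendar day.
    production_days = math.ceil(vehicles / parts_per_day)
    return production_days / production_days_per_year


# Daily rates quoted in the post; combining them is an assumption.
combined_rate = 1323 + 882
estimate = years_to_supply(1_560_000, combined_rate)
print(f"{estimate:.1f} years at {combined_rate} hitches per day")
```

Passing a smaller `production_days_per_year` (for example, weekdays only) or running the function separately for each model with its own vehicle count shows how sensitive the estimate is to those unstated inputs.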

The latest update from Chrysler is that the trailer hitch supplier has increased production capacity and will be able to meet the demand by March of 2015. Chrysler also said that the NHTSA over-estimated the number of hitches required for the recall because the calculations didn’t account for vehicles that are no longer in use or those already equipped with hitches.

To view a timeline, Outline and Cause Map of this issue, please click “Download PDF” above. Or, click here to learn more.

 

Extensive Fire on USS George Washington Placed Crew at Risk

By ThinkReliability Staff

When fire broke out in 2008 on aircraft carrier USS George Washington in an unmanned space that was being used to improperly store flammable materials, it took more than 8 hours to find the source of the fire and extinguish it. In the Navy’s investigation report, Admiral Robert F. Willard, commander of the US Pacific Fleet, stated, “It is apparent from this extensive study that there were numerous processes and procedures related to fire prevention and readiness and training that were not properly functioning. The extent of damage could have been reduced had numerous longstanding firefighting and firefighting management deficiencies been corrected.”

The processes and procedures that were implicated in the investigation of the fire can be examined in a Cause Map, or a visual root cause analysis. This process begins by identifying the goals impacted. In this case, the primary goal impacted was the safety goal. Thirty-seven sailors were injured; one was seriously burned. There were no fatalities. In addition, the damage to the ship was estimated at $70 million and left the ship unusable for 3 months.

Beginning with the impacted safety goal, asking ‘Why’ questions allows us to develop the cause-and-effect relationships that led to those impacted goals. In this case, the injuries to sailors resulted from the extensive fire aboard ship. In addition, some of the affected sailors (including the sailor who was seriously burned) did not have adequate protective clothing. Specifically, liners worn underneath firefighting gear were not available in one repair locker because they were being laundered. Both the fire and the inadequate protective gear were causally related to the injuries so they are both included on the Cause Map and joined with ‘and’.

Asking additional ‘why’ questions adds more detail to the Cause Map. When investigating a fire, it’s important to include the factors that resulted in the initiation of the fire (heat, fuel and oxygen) as well as those that allowed the fire to spread. In this case, the ignition (or heat) source was believed to be a cigarette butt. On-scene evidence showed that smoking was occurring in the area, against regulation. The ship was found to have inadequate training regarding the smoking policy and inadequate control over the locations where smoking was occurring, because regular zone inspections were not being held.

The initial fuel source was determined to be refrigerant oil and other flammable materials improperly stored in the unmanned space where the fire began. The oil was not turned in as required by procedure because of a concern about the difficulty of retrieving it. Because the oil was never entered into the inventory control system, the storage discrepancy was not noted. The unmanned space in which it was stored was not inspected. Unmanned spaces were not included in zone inspections, and the area had not been designated as a tank or void to be identified in the void and tank inspection.

Once a fire breaks out, the speed with which the source is found and extinguished has the most impact on the safety of personnel. In this case, the source of the fire was not found for eight hours. Not only did the fire begin in an unmanned area, but the drawings showing the layout of the ship were also inaccurate, because the ship was in the midst of alterations.

Developing the causes that resulted in the impacted goals allows for identification of all the processes and procedures that need to be re-examined to reduce the risk of recurrence. In this case, the report identified multiple processes and procedures that were re-evaluated in the wake of the disaster, including those for hazardous material storage, training, inspection and firefighting.

To learn more, click here to read the Navy investigation report. To view a one-page overview of the Outline and Cause Map, please click on “Download PDF” above.