All posts by ThinkReliability Staff

ThinkReliability specializes in applying root cause analysis to solve all types of problems. We investigate errors, defects, failures, losses, outages and incidents in a wide variety of industries. Our Cause Mapping method of root cause analysis captures the complete investigation, along with the best solutions, in an easy-to-understand format. ThinkReliability provides investigation services and root cause analysis training to clients around the world and is considered the trusted authority on the subject.

Don’t Just Google It . . . Maps Error Leads to Wrong House Being Demolished

By ThinkReliability Staff

Imagine coming “home” and finding an empty lot. That’s what happened in Rowlett, Texas on March 22, 2016. A tornado had previously damaged many of the homes in the area; some were slated for repairs, and some for demolition. The demolition company had plans to level the duplex at 7601 Cousteau Drive, but instead demolished the duplex at 7601 Calypso Drive.

An error on Google Maps has been blamed for the mistake but, as is typical with these types of incidents, there’s more to it than that. To ensure that all the causes leading to an incident are identified and addressed, it’s important to methodically analyze the issue. Creating a Cause Map, a form of root cause analysis that creates a map of cause-and-effect relationships, is one way a problem can be analyzed.

The first step in the Cause Mapping process is to capture the what, when and where of an incident. Along with the geographic (where the incident occurred) and process location (what was being done at the time), it can be helpful to capture any differences about the situation surrounding the incident. In this case, “differences” would be anything out of the ordinary during the demolition of the house at 7601 Cousteau/Calypso. The error on Google Maps (which pointed to the house that was mistakenly demolished) is one difference. Another difference is that the name of the street was not checked during the location confirmation. Other potential differences between this demolition job and others were that the same house number was present on both streets, in close proximity, and both houses experienced tornado damage. These differences may or may not be causally related – at this point, potential differences are just captured.

The next step is to capture the impacts to the organization’s goals as a result of the incident. These impacts to the goals become the first effects in the cause-and-effect relationships. In this case, there’s a potential for injuries (an impact to the safety goal) as a result of an unexpected demolition. The demolition of a house planned to be repaired is an impact to the environmental, customer service, and property goals. The demolition of the wrong house is an impact to the production/schedule and labor/time goals.

The analysis begins with one of the impacted goals. Asking “why” questions develops cause-and-effect relationships. For example, the demolition of the wrong house was caused by the duplex at 7601 Calypso Drive being demolished while the duplex at 7601 Cousteau was planned for demolition. Because both of these facts (which can be verified with evidence) resulted in the wrong house being demolished, they are both connected to the cause of “demolition of wrong house” and joined with an “AND”.
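The AND relationship described above can be sketched in a few lines of Python. This is an illustration only, not ThinkReliability’s tooling: the `Cause` class, its `occurs` method, and the exact wording of each box are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on a cause map; its own causes are joined with AND or OR."""
    text: str
    join: str = "AND"
    causes: list = field(default_factory=list)

    def occurs(self, facts):
        # A box with no causes beneath it is checked against the evidence.
        if not self.causes:
            return facts.get(self.text, False)
        results = [cause.occurs(facts) for cause in self.causes]
        return all(results) if self.join == "AND" else any(results)


wrong_house = Cause(
    "Demolition of wrong house",
    join="AND",
    causes=[
        Cause("Duplex at 7601 Calypso Drive demolished"),
        Cause("Duplex at 7601 Cousteau Drive planned for demolition"),
    ],
)

evidence = {
    "Duplex at 7601 Calypso Drive demolished": True,
    "Duplex at 7601 Cousteau Drive planned for demolition": True,
}
incident_occurs = wrong_house.occurs(evidence)

# Remove either cause and the effect no longer follows.
evidence_without_demolition = dict(evidence)
evidence_without_demolition["Duplex at 7601 Calypso Drive demolished"] = False
incident_without_demolition = wrong_house.occurs(evidence_without_demolition)

print(incident_occurs)              # True
print(incident_without_demolition)  # False
```

Because the two causes are joined with AND, removing either one is enough to prevent the effect, which is why each box on the map is a possible place for a solution.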

Each cause on the map is also an effect. More detail can be added to the Cause Map by continuing to ask “why” questions. However, one cause may not be sufficient to result in an effect, so questions such as “what else was required?” are also necessary to ensure all causes are present on the map. In this case, the crew went to the wrong house because of an error on Google Maps, which was used to find the house. Per a Google spokeswoman, 7601 Cousteau was shown at the location of 7601 Calypso. This error has been identified as “the cause” of the incident. However, there were other opportunities to catch the error. Opportunities that were missed are also causes in the cause-and-effect relationship. While there was a site confirmation prior to demolition, only the street number (7601), lot location (corner lot), and tornado damage were confirmed. All three of these data points used to confirm the location were the same for 7601 Cousteau and 7601 Calypso.
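To see why the site confirmation could not catch the error, it helps to compare the two addresses field by field. The sketch below is a minimal illustration; the `Site` record, its field names, and the `confirm` helper are assumptions built from the details reported in this post, not the demolition company’s actual checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    street_number: str
    street_name: str
    corner_lot: bool
    tornado_damage: bool


# Values taken from the description in the post.
planned = Site("7601", "Cousteau Drive", corner_lot=True, tornado_damage=True)
demolished = Site("7601", "Calypso Drive", corner_lot=True, tornado_damage=True)


def confirm(site, work_order, fields):
    """Return True if every listed field on the site matches the work order."""
    return all(getattr(site, name) == getattr(work_order, name) for name in fields)


# The three data points that were confirmed, then the same check plus the street name.
checks_used = ["street_number", "corner_lot", "tornado_damage"]
checks_with_street = checks_used + ["street_name"]

passes_original_check = confirm(demolished, planned, checks_used)
passes_with_street_name = confirm(demolished, planned, checks_with_street)

print(passes_original_check)    # True
print(passes_with_street_name)  # False
```

A confirmation step only adds protection if at least one of the checked fields differs between the right answer and the likely wrong ones.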

What hasn’t been mentioned in the news but is apparent from looking at a (corrected) Google Map is that the house-numbering scheme of the neighborhood was set up for failure. 7601 Calypso is on the corner of Calypso Drive and Cousteau Drive, meaning a person could easily believe it was 7601 Cousteau. 7601 Cousteau is just a block away, on the corner of Cousteau Drive and an apparently unnamed alley. I can’t imagine it is the first time that someone has confused the two.

While it’s too late for 7601 Calypso Drive, Google Maps has fixed the error. Likely in the future this demolition company will use another identifier (or will mark the house while talking to the homeowners prior to the demolition) to ensure that the wrong house is not destroyed.

To view the Cause Map, as well as the updated Google Map, click on “download PDF” above.

Crane Collapse In High Winds Kills One in NYC

By ThinkReliability Staff

A crane collapsed in New York City on February 5, 2016, killing one person, injuring three, and damaging two city blocks. While an investigation is underway and the causes of the crane collapse have not yet been determined, the city has already implemented new rules to make crane operations safer. We can examine the potential cause-and-effect relationships that led to the issue in a Cause Map, or visual root cause analysis.

We begin by capturing the what, when and where of the incident within a problem outline. The crane collapse occurred February 5 at about 8:30 a.m. Anything that is different or unusual at the time of an incident should also be noted on the outline. An important difference on February 5 was the accelerating winds. The crane that collapsed was a crawler crane, and at the time of the collapse, workers were in the process of securing the crane because of the high winds. This was as expected. Says New York City Mayor Bill de Blasio, “The workers on Friday morning did not begin work on the site, but immediately seeing the winds, made the move to secure the crane, so their timing was appropriate. Upon arrival, they immediately determined the need to secure the crane.”

The impacts to the goals as a result of the incident are also captured in the problem outline. In this case, the safety goal was impacted due to the death, as well as injuries. The environmental goal was impacted by water leaks resulting from damage. Customer service (looking at the citizens of New York City as customers) is impacted due to closures. Production is impacted because 418 additional cranes were secured as a result of the incident. Property impacts include damage to the crane, as well as two city blocks. The labor goal was impacted because of the time required for the response and removal of the damaged crane. It’s also important to capture the frequency of similar events. OSHA reports it has investigated 13 fatal crane accidents in the last 5 years. (There was a crane collapse in New York City in 2008 that resulted in 4 deaths. Click here to see our previous blog on this topic.)

Once the impacts to the goals have been captured, the analysis begins with one of these goals, which is an effect. Asking “why” questions allows the development of cause-and-effect relationships. In this case, the fatality and injuries resulted from the collapse of a crane. It also resulted from people being in the area of the crane collapse. Both of these causes are required (the fatality and injuries would not have occurred if the crane had not collapsed, or if people had not been in the area) so they are listed vertically and joined with “AND”.

People were in the area where the crane collapsed because the area was inadequately secured. This is likely because construction workers were responsible for securing the area, as well as securing the crane. The reasons for the crane collapse are unknown. However, the investigation will look at human error, structural and equipment problems, and impacts from high winds. While the cause has not been determined, it is considered likely that the wind played a role. The crane was not yet secured, as the workers were in the process of attempting to secure it. It was not required to be secured because city regulations limit operation of cranes when wind is above 30 miles per hour (mph), or if there are gusts greater than 40 mph. The crane operators were working under a limit of 25 mph, as sometimes manufacturers use stricter limits. The forecast did not indicate that winds would be greater than 25 mph that day.
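The wind limits described above amount to a simple threshold check. The sketch below assumes the limits work exactly as summarized in this post (operation is restricted above 30 mph sustained or above 40 mph in gusts, with a stricter 25 mph sustained limit on this site); the function name and the sample wind readings are hypothetical.

```python
CITY_SUSTAINED_LIMIT_MPH = 30
CITY_GUST_LIMIT_MPH = 40
SITE_SUSTAINED_LIMIT_MPH = 25  # stricter limit the operators were working under


def operation_allowed(sustained_mph, gust_mph,
                      sustained_limit_mph=CITY_SUSTAINED_LIMIT_MPH,
                      gust_limit_mph=CITY_GUST_LIMIT_MPH):
    """Operation is allowed only if neither wind limit is exceeded."""
    return sustained_mph <= sustained_limit_mph and gust_mph <= gust_limit_mph


# Hypothetical reading: 28 mph sustained with 35 mph gusts.
allowed_by_city = operation_allowed(28, 35)
allowed_on_site = operation_allowed(
    28, 35, sustained_limit_mph=SITE_SUSTAINED_LIMIT_MPH
)

print(allowed_by_city)  # True
print(allowed_on_site)  # False
```

The same reading can pass the city limit and fail the stricter site limit, so it matters which limit a crew is actually working to.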

As a result of the incident, Mayor de Blasio put into place immediate and temporary rules regarding crane operation. These rules will be in place until a task force provides updated recommendations within 90 days. Uniformed personnel will assist with enforcing closures associated with crane use. Crane operations are limited to wind speeds less than 30 mph (or gusts up to 40 mph). A city sweep and increased fines were also put into place to ensure the updated regulations are followed.

To view a one-page overview of the Outline, Cause Map and interim solutions, click on “Download PDF”.

Avoiding Procedure Horrors in Your Little Shop

By ThinkReliability Staff

Are you singing “Suddenly Seymour”, yet?  In this blog, we take a look at the ever-so-interesting example of a Venus Flytrap.  These fascinating creatures have captured imaginations and inspired many science fiction books, movies and even a musical (Little Shop of Horrors).  When thinking about a Venus Flytrap, the “problem” really depends on the point of view.  From the point of view of the fly, the problem is getting eaten for lunch.  From the point of view of the Venus Flytrap, the problem is how to catch its lunch.  Since it’s really only a problem for one of the parties, we will focus on the question of how, and examine the Process Map as a best practice for documenting the how in your shop.

Process Maps are very useful tools.  Converting a written job procedure or word of mouth instructions into a picture or map can illuminate a complicated process and make it seem quite simple.  Asking how something happens, or how something gets done can provide valuable detail that can be useful for anyone attempting that task now and in the future.  The benefit can include preventing or minimizing incidents that often recur from lack of clarity in a procedure.

To start with, a very simple map can be created that shows the process of a Venus Flytrap eating a fly in 4 steps:  The fly lands in the trap, the trap closes, the plant eats the fly, and the trap opens again.  However, this ‘simple’ process is actually extremely complex.  In his recent article titled “Venus Flytraps Are Even Creepier Than We Thought” (The Atlantic, January 21, 2016), Ed Yong outlines the process and intricacies of how the carnivorous plant works.  When the fly lands on the Flytrap’s bright red and enticing leaves, a complicated process of chemicals, electrical impulses and physics is kicked off… all with very delicate timing.  The Flytrap’s leaves are covered with sensitive hairs.  If the fly touches those hairs more than once in 20 seconds, it begins a process ensuring its own demise.  A well-timed increase in calcium ions and electrical impulses results in water flowing to the Flytrap’s leaves, causing them to change shape, trapping the fly inside.  At this point, the more the fly struggles, the more problems it creates for itself.  Further stimulating these hairs results in more calcium ions and more electrical impulses, this time resulting in the flow of hormones and digestive enzymes.  Over time, the leaves will create a hermetic seal and fill up with liquid, causing the fly to asphyxiate and die.  Next, the pH level of the fluid inside the trap drops to 2, and the digestive process begins in earnest.  Recent research suggests that chemical sensors on the Flytrap’s leaves can detect the level of digestion of the fly, stimulating the release of more digestive enzymes if needed, or causing the trap leaves to open back up.  The Flytrap is then ready to begin the process again.  As Charles Darwin said, “THIS plant, commonly called Venus’ fly-trap, from the rapidity and force of its movements, is one of the most wonderful in the world.”  (1875. Insectivorous Plants)
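The jump from the 4-step map to the detailed one can be illustrated in code. The sketch below lists the simple process and then expands just one step, the trigger for closing, using the 20-second rule described above. The names are hypothetical, and treating a second touch at exactly 20 seconds as a trigger is an assumption.

```python
# The simple, 4-step process map.
SIMPLE_PROCESS = [
    "Fly lands in the trap",
    "Trap closes",
    "Plant eats the fly",
    "Trap opens again",
]

TRIGGER_WINDOW_SECONDS = 20


def trap_closes(touch_times):
    """Return True if any two consecutive hair touches fall within the window.

    touch_times holds the times (in seconds) at which a hair was touched.
    """
    times = sorted(touch_times)
    return any(
        later - earlier <= TRIGGER_WINDOW_SECONDS
        for earlier, later in zip(times, times[1:])
    )


single_touch = trap_closes([5])
slow_touches = trap_closes([0, 35])
quick_second_touch = trap_closes([0, 35, 48])

print(single_touch)        # False
print(slow_touches)        # False
print(quick_second_touch)  # True
```

Even a single step such as “Trap closes” hides a condition with its own timing rule, which is the kind of detail a written procedure often leaves out.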

This Process Map, while detailed, could surely be broken down into further detail by a botanist who deeply understands the intricate workings of a Venus Flytrap.  Fortunately for a baby Venus Flytrap, this process map is coded directly into its DNA, so it doesn’t have to rely on anything to know what to do.  Unfortunately for us, work-related tasks are rarely so instinctual.  We rely on job procedures, process maps and word of mouth to learn the best, safest way to get the job done. Ensuring consistency with that transfer of information is key to making sure that incidents and problems are avoided.  Problems that result from poorly defined procedures or work processes can go by many names: procedure not followed, human error, etc.  At the end of the day, the roots (pun intended) of many of these problems are poorly articulated or poorly communicated work processes.  The simple tool of a process map can help minimize these problems by making the steps of the process clear and easy to understand.

Is Having a Lockout/Tagout (LOTO) Procedure Enough?

By Staff

The number of possible types of injuries occurring when performing work on energized equipment is impossible to count.  They can range from burns, to electrical shock, to crush injuries, to cuts/lacerations, and beyond.  In an effort to help eliminate some of these injuries, the OSHA standard for Control of Hazardous Energy (29 CFR 1910.147), more commonly known as lockout/tagout (LOTO), went into effect in 1989.  The purpose of the standard is to help companies establish the practices and procedures needed to prevent injury to workers when they are performing maintenance activities on equipment requiring an energy source.  Any company in violation of the standard is subject to a fine.  It is estimated that in 2013, there were approximately $14 million in federal and state fines, and lockout/tagout was the 5th most frequently violated standard in 2015.

However, the REAL goal of the standard is to keep people safe.  So how is the standard violated?  It can happen in many ways, but this blog takes a look at one specific incident to better understand how it can happen.  This analysis is based on a case study presented in the article “Lockout/Tagout Accident Investigation” from the August 2014 issue of Occupational Health & Safety.

In this incident, several contractors were working on a project involving a particular switchgear.  Many of these contractors had performed lockout/tagout for the switchgear box related to the projects that they were working on.  After the work began, a worker from a different contractor was asked to clean out part of the switchgear.  Unfortunately, an arc flash occurred when he reached in the switchgear, resulting in burns to his hand and a blow-out injury to his knee.  Fortunately, the employee survived, recovered, and was able to return to his normal life.

A Cause Map can be built to analyze this issue.  The first step in Cause Mapping is to determine how the incident impacted the overall goals.  For this incident, the safety goal was the most obviously impacted goal due to the injuries that the worker sustained.  The goal is always for employees to leave the workplace in the same health in which they arrived.  Additionally, the regulatory goal was impacted since the injuries were severe enough that they were classified as recordable.

The Cause Map is a visual representation of the cause-and-effect relationships that contributed to the incident.  Starting with the impacted safety goal, ‘why’ questions can be asked to identify the key factors that caused the problem.  In this case, the injuries were caused by the fact that an arc flash occurred when the worker reached into the switchgear and he was not wearing personal protective equipment.  The worker was probably not wearing PPE because he thought that the switchgear was de-energized, and this was an effect of the fact that there were locks and tags already on the switchgear.  The arc flash was a result of the fact that the circuit breaker was energized when the worker reached in to clean it.  The circuit breaker was energized because of three factors: a different contractor had put it back in service the night before, the circuit was not tested by the worker, and the worker didn’t do his own lockout procedure.  Each of these problems can be further analyzed to reveal problems with communication, adding the task at the last minute and not including every task in a job safety analysis.

For this situation, and many like it, eliminating a cause anywhere on the map could have minimized the risk of the incident occurring.  For example, had the worker taken the time to put on protective equipment or test the circuit breaker, he might not have been injured.  Similarly, had the other contractors taken the time to update their locks/tags and ensure that they had communicated that the circuit had been reenergized to all interested parties, the worker might not have been injured.  This example demonstrates that having a lockout/tagout procedure is the first step in avoiding injuries.  Ensuring that the procedure is followed in combination with other safety standards is also important to minimize the risk of injury.

Flammable Siding Fuels High Rise Hotel Fire

By ThinkReliability Staff

A fire on New Year’s in Dubai has raised concern with similar building materials across the world. Around 9:30 pm on December 31, 2015, a fire started at a 63-story hotel. The fire quickly spread along the outside of the building. There were no reported fatalities but at least 14 were injured.

Performing a thorough root cause analysis for one specific incident can develop solutions for similar incidents around the world. These types of fires are becoming increasingly common – there have been 8 in the last two decades in Dubai alone. Similar fires have occurred in China, Azerbaijan, and Australia over recent years. We can investigate the causes that led to the New Year’s fire in Dubai by using the Cause Mapping method, a visual form of root cause analysis.

Our analysis begins by capturing the what, when and where of an incident as well as the impacts to the organization’s goals. In this case, the safety goal is impacted due to the injuries. The environmental goal is impacted due to the significant amount of smoke released, and the customer service goal is impacted because of the evacuation. Additional goals impacted include the property damage to the hotel and the labor/time associated with response and repairs.

Beginning with the impact to the safety goal, we can ask “why” questions to capture the cause-and-effect relationships that led to the injuries. In this case, the injuries resulted from an extensive fire that spread up the side of the hotel. An extensive fire requires both initiation and spread. Both the initiation and spread result from heat, fuel and oxygen. The oxygen in both cases was provided by the atmosphere. The heat source for the initiation is believed to be either from exposed wiring (per the local police chief and shown in photographs from before the incident), or a short circuit in a lamp (reported by some news sources). Because it has not been definitively determined, we put a “?” after each cause, and join them with “OR”. The fuel source for the initiation has been reported as curtains. Flammable liquid was also a potential cause but has been ruled out by the police chief.
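The combination of AND (heat, fuel and oxygen are all required) and OR (either candidate heat source would be enough) can be sketched as follows. This is an illustration of the logic only; the function names are hypothetical, and both heat sources are marked unconfirmed because neither had been definitively determined.

```python
def heat_present(exposed_wiring, lamp_short_circuit):
    """Either candidate heat source is sufficient, so they are joined with OR."""
    return exposed_wiring or lamp_short_circuit


def fire_possible(heat, fuel, oxygen):
    """Initiation needs all three: heat AND fuel AND oxygen."""
    return heat and fuel and oxygen


def box_label(description, confirmed):
    """Unconfirmed causes carry a trailing question mark on the map."""
    return description if confirmed else description + "?"


heat_box = " OR ".join(
    box_label(description, confirmed=False)
    for description in ("Exposed wiring", "Short circuit in a lamp")
)

wiring_only = fire_possible(heat_present(True, False), fuel=True, oxygen=True)
lamp_only = fire_possible(heat_present(False, True), fuel=True, oxygen=True)
no_heat_source = fire_possible(heat_present(False, False), fuel=True, oxygen=True)

print(heat_box)  # Exposed wiring? OR Short circuit in a lamp?
print(wiring_only, lamp_only, no_heat_source)  # True True False
```

Either heat source completes the set of three on its own, so a fix that addresses only one of them leaves the other path open.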

A burning fire provides heat, so it will continue to burn as long as oxygen and fuel are present. The fuel that allowed the rapid spread of the fire is flammable cladding used as siding. This siding is made of two thin pieces of aluminum surrounding a foam core. Foam cores made primarily of polyethylene are highly flammable. This type of cladding is used because it is considered to provide a modern look, allows dust to be rinsed off during rains, and is relatively simple and cheap to install. While the foam core can be made of flame-resistant materials, this was not required for this building. After a similar fire in Dubai in 2012, new regulations banned the use of flammable material as cladding, but existing buildings (including this hotel) were not required to be retrofitted. The cladding was installed continuously, which allowed the fire to rapidly climb up the side of the building.

While electrical faults that can act as heat sources should be repaired as quickly as possible, the flammability of the materials used on high-rise buildings with multiple potential heat and fuel sources (and a nearly unlimited supply of oxygen) has raised significant concern, not only about this hotel or Dubai, but about buildings with similar cladding around the world. Says Peter Rau, the chief officer of Melbourne, Australia’s Metropolitan Fire Brigade (where a similar fire broke out in November 2014), “You know you’ve only got to step back a little bit further and say: ‘What does it mean for Australia and what does it mean (when) you’re talking to me from Dubai?’ This is a significant issue worldwide, I would suggest . . . There is no question this is a game changer.”

Landslide of construction debris buries town, kills dozens

By ThinkReliability Staff

Shenzhen, China has been growing fast. After a dump site closed in 2013, construction debris from the rapid expansion was being dumped everywhere. In an effort to contain the waste, a former rock quarry was converted to a dump site. Waste at the site reached 100 meters high, despite environmental assessments warning about the potential for erosion. On December 20, 2015, the worries of residents, construction workers and truckers came true when the debris slipped from the quarry, covering 380,000 square meters (or about 60 football fields) with thick soil as much as 4 stories high.

A Cause Map can be built to analyze this issue. One of the steps in the Cause Mapping process is to determine how the issue impacted the overall goals. In this case, the landslide severely impacted multiple goals. Primarily, the safety goal was impacted due to a significant number of deaths. 58 have been confirmed dead, and at least 25 are missing. The environmental goal and customer service goal were impacted due to the significant area covered by construction waste. The regulatory goal is impacted because 11 have been detained as part of an ongoing criminal investigation. The property goal is impacted by the 33 buildings that were destroyed. The labor goal is also impacted, as more than 10,600 people are participating in the rescue effort.

The Cause Map is built by visually laying out the cause-and-effect relationships that contributed to the landslide. Beginning with the impacted goals and asking “Why” questions develops the cause-and-effect relationships. The deaths and missing persons resulted from being buried in construction waste. Additionally, the confusion over the number of missing results from the many unregistered migrants in the rapidly growing area. The area was buried in construction waste when waste spread over a significant area, due to the landslide.

The landslide resulted from soil and debris that was piled 100 meters high, and unstable ground in a quarry. The quarry was repurposed as a waste dump in order to corral waste, which had previously been dumped anywhere after the closure of another dump. Waste and debris were piled so high because of the significant construction debris in the area. There was heavy construction in the area because of the rapid growth, resulting in a lot of debris. Incentives (dumpsite operators make money on each load dumped) encourage a high amount of waste dumping. Illegal dumping also adds to the total.

While an environmental impact report warned of potential erosion, and the workers and truck drivers at the dump registered concerns about the volume of waste, these warnings weren’t heeded. Experts point to multiple recent industrial accidents in China (such as the warehouse fire/explosion in Tianjin in August, the subject of a previous blog) as evidence of the generally lax enforcement of regulations. Heavy rains contributed to ground instability, as did the height of the debris, and the use of the site as a quarry prior to being a waste dump.

Actions taken in other cities in similar circumstances include charging more for dumping debris in an effort to encourage the reuse of materials and monitoring dump trucks with GPS to minimize illegal dumping. These actions weren’t implemented in Shenzhen prior to the landslide, but this accident may prompt their implementation in the future. Before any of that can happen, Shenzhen has a long way to go cleaning up the construction debris covering the city.

Celebrating with a bit of bubbly? Read this first . . .

By ThinkReliability Staff

What better day than New Year’s Eve to pop open a bottle of champagne (or its non-French sibling, sparkling wine)? Great thought, but turns out there’s a right way to open a bottle of bubbly, and “pop” has nothing to do with it.

Your initial thought may be who cares? What possible difference could it make how I open a bottle? Well, assuming your goal is to celebrate an enjoyable evening with friends, family, or maybe a date, using an improper opening procedure could impact the safety goal, by injuring yourself or others. It can also affect your reputation by failing to impress those with whom you’ve chosen to celebrate (as well as anyone else in the vicinity). The lost champagne is an impact to the property goal, and the potential for clean-up impacts the labor goal (and is clearly not what you want to be spending your New Year’s Eve doing).

A study claims that 900,000 injuries per year result from champagne. Injuries typically result from corks hitting faces, especially eyes. The pressure inside a bottle of champagne can be as high as 90 pounds per square inch, resulting in a cork traveling at speeds of up to 50 miles an hour. Injuries resulting from slips on spilled champagne also fall into this category.

Both spills and flying corks can be prevented by using a proper procedure to open a bottle of champagne. The preparation starts far before the party does. The first step is to ensure that the champagne is cooled properly. This is not only for taste, but also for safety. Another study found that cooling the bottle to 39 degrees F (4 degrees C) reduces the speed at which the cork leaves the bottle. (The cork travels only 3/4 of the speed of that from a room temperature, or 64 degrees F, bottle.)
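The effect of the 3/4 speed figure is larger than it sounds, because a cork’s kinetic energy scales with the square of its speed. The short calculation below works this through. Applying the ratio to the 50 mph figure is an assumption made purely for illustration, since that figure and the cooling result come from different sources.

```python
# Ratio reported above: a chilled cork travels at 3/4 of the room-temperature speed.
SPEED_RATIO_CHILLED = 3 / 4

# Kinetic energy is (1/2) * mass * speed**2, so for the same cork
# the energy ratio is the speed ratio squared.
kinetic_energy_ratio = SPEED_RATIO_CHILLED ** 2

# Assumption for illustration only: treat the "up to 50 mph" figure
# as the speed from a room-temperature bottle.
ASSUMED_WARM_SPEED_MPH = 50
chilled_speed_mph = ASSUMED_WARM_SPEED_MPH * SPEED_RATIO_CHILLED

print(kinetic_energy_ratio)  # 0.5625
print(chilled_speed_mph)     # 37.5
```

Under these assumptions, a properly chilled cork carries a little over half the energy of a warm one.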

Once you’re ready to serve the champagne, grab the bottle, glasses, and a kitchen towel. Check to see if there’s a tab on the foil covering the neck. If not, you’ll also need a knife. (One thing you won’t need? A corkscrew.) Remove the foil from the neck, by pulling the tab if one is present or by cutting with a knife, and then peeling it off. From this point until you start pouring, keep the bottle pointed at a 45 degree angle, and away from people, breakable objects, walls and ceilings. Untwist the wire tab, or key, and remove the wire cage, and hold your thumb over the cork. Cover the cork and neck of the bottle with the kitchen towel, and grab both the towel and cork with one hand. With the other hand, gently and slowly twist the bottle until the cork slides out. (This will be not with a pop, but more of a whimper.) Do not shake the bottle!

Hold champagne flutes at an angle and pour champagne in on the side to preserve the bubbles. Repeat as necessary. If you’ll need to leave the location at which you are drinking, please do it as a passenger, or wait until you’ve sobered up. For an average person, that means waiting about an hour for every 5 ounces of wine/champagne consumed. (The drink size of other kinds of alcohol is defined differently, and your weight will impact the time it takes for alcohol to leave your system.)

If you or someone else forgets these rules and ends up getting hit in or near the eye with a champagne cork, take a trip to the ophthalmologist right away. (Because it’s New Year’s Eve, you may have to hit the emergency room first.) Says ophthalmologist Andrew Iwach, MD, “The good news is that as long as we can see these patients in a timely fashion, then there’s so many things we can do to help these patients preserve their vision.”

To view a visual diagram of the proper champagne-opening procedure, click on “Download PDF” above.

Newly Commissioned USS Milwaukee Breaks Down at Sea

By ThinkReliability Staff

On December 11, 2015, just 20 days after commissioning, the USS Milwaukee completely lost propulsion and had to be towed back to port. This obviously brought up major concerns about the reliability of the ship. Said Senator John McCain (R-Arizona), head of the Senate’s Armed Services Committee, “Reporting of a complete loss of propulsion on USS Milwaukee (LCS 5) is deeply alarming, particularly given this ship was commissioned just 20 days ago. U.S. Navy ships are built with redundant systems to enable continued operation in the event of an engineering casualty, which makes this incident very concerning. I expect the Navy to conduct a thorough investigation into the root causes of this failure, hold individuals accountable as appropriate, and keep the Senate Armed Services Committee informed.”

While very little data has been released, we can begin an investigation with the information that is known. The first step of a problem investigation is to define the problem. The “what, when and where” are captured in a problem outline, along with the impacts to the organization’s goals. In this case, the mission goal is impacted due to the complete loss of propulsion of the ship. The schedule/production goal is impacted by the time the ship will spend in the shipyard receiving repairs. (The magnitude and cost of the repairs have not yet been determined.) The property/equipment goal is impacted because metal filings were found throughout both the port and starboard engine systems. Lastly, the labor and time goal is impacted by the need for an investigation and repairs.

The next step of a problem investigation is the analysis. We will perform a visual root cause analysis, or Cause Map. The Cause Map begins with an impacted goal and asking “why” questions to diagram the cause-and-effect relationships that led to the incident. In this case the complete loss of propulsion was caused by the loss of use of the port shaft AND the loss of use of the starboard shaft. The ship has two separate propulsion systems, so in order for the ship to completely lose propulsion, the use of both shafts had to be lost. Because both causes were required, they are joined with an “AND”.
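Because the ship has two separate propulsion systems, the AND relationship can be checked by listing every combination of shaft states. The sketch below is a minimal truth table with hypothetical names; it models only the logic described above, not the ship’s actual systems.

```python
from itertools import product


def complete_loss_of_propulsion(port_lost, starboard_lost):
    """Propulsion is completely lost only if BOTH shafts are lost."""
    return port_lost and starboard_lost


# Keys are (port_lost, starboard_lost) for all four combinations.
truth_table = {
    (port_lost, starboard_lost): complete_loss_of_propulsion(port_lost, starboard_lost)
    for port_lost, starboard_lost in product([False, True], repeat=2)
}

for (port_lost, starboard_lost), total_loss in truth_table.items():
    print(port_lost, starboard_lost, total_loss)
```

Only one of the four combinations produces a complete loss. That is why losing both shafts at once, each with metal filings in its lube oil filter, suggests a cause shared by both systems rather than two independent failures.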

We continue the analysis by continuing to ask “why” questions of each branch. The loss of use of the port shaft occurred when it was locked as a precaution because of an alarm (the exact nature of the alarm was not released). Metal filings were found in the lube oil filter by engineers, though the cause is not known. We will end this line of questioning with a “?” for now, but determining how the metal filings got into the propulsion system will be a primary focus of the investigation. The loss of use of the starboard shaft occurred due to lost lube oil pressure in the combining gear. Metal filings were also found in the starboard lube oil filter. Again, it’s not clear how they got there, but it will be important to determine how the lube oil system of a basically brand new ship was able to obtain a level of contamination that necessitated full system shutdown.

While metal filings in the lube oil system are not a class-wide issue, it’s not the first time this class of ship has had problems. The USS Independence and USS Freedom, the first two ships of the class, suffered galvanic corrosion which caused a crack in the Freedom’s hull. The Freedom also suffered issues with its ship service diesel engines, a corroded cable, and a faulty air compressor.

Once all the causes of the breakdown are determined, engineers will have to determine solutions that will allow the ship to return to full capacity. Additionally, because of the number of problems with the class, the investigation will need to take a good look at the class design and manufacturing practices to see if there are issues that could impact the rest of the class going forward.

To view a one-page downloadable PDF with the beginning investigation, including the problem outline, analysis, and timeline, click “Download PDF” above.

Component Failure & Crew Response, Not Weather, Brought Down AirAsia Flight QZ8501

By Staff

Immediately following the December 28, 2014 crash of AirAsia flight QZ8501, severe weather in the area was believed to have been the cause of the loss of control of the plane. (See our previous blog on the crash.) However, recovery of the “black box” and a subsequent investigation determined that it was a component failure and the crew’s response to the upset condition that resulted in the crash and that weather was not responsible. This is an example of the importance of gathering evidence to support conclusions within an investigation.

Says Richard Quest, CNN’s aviation correspondent, “It’s a series of technical failures, but it’s the pilot response that leads to the plane crashing.” Because, as is common in these investigations, there is a combination of causes that resulted in the crash, it can help to lay out the cause-and-effect relationships. We will do this in a Cause Map, a visual form of root cause analysis. The Cause Map is built by beginning with an impact to the goals, such as the safety goal, and asking “why” questions.

The 162 deaths (all on board) resulted from the plane’s rapid (20,000 feet per minute) plunge into the sea. According to the investigation, the crash resulted from an upset/stall condition AND the crew’s inability to recover from that condition. Because both of these causes contributed to the crash, they are both connected to the effect (crash) and separated with an “AND”.

More detail can be added to each “leg” of the Cause Map by continuing to ask “why” questions. The prolonged stall/upset condition resulted from the aircraft being pushed beyond its limits. (It climbed 5,400 feet in about 30 seconds.) This occurred because of manual handling and because of the failure of the rudder travel limiter system, which is designed to restrict rudder movement to a safe range. The system failed due to a loss of electrical continuity from a cracked solder joint on a circuit board. Although maintenance records showed 23 complaints with the system in the year prior to the crash, it was not repaired. A former pilot and member of the investigation team stated it was considered “minor damage” and was “not a concern”.

The plane was being manually controlled because the autopilot and autothrust were disengaged. These systems were disengaged when a circuit breaker was reset (removed and replaced) to attempt to reset the system after a computer system failure (indicated by four alarms that sounded in the cockpit). While this is sometimes done on the ground, it shouldn’t be done in the air because it disengages the autopilot and autothrust systems. In addition, the crew had inadequate upset recovery training. According to the manual from the manufacturer, the aircraft is designed to prevent it from becoming upset, and therefore training is not necessary. The decision to manually place the plane in a steep climb is believed to have been an attempt to get out of the poor weather. Just prior to the crash, the less experienced co-pilot was at the controls.

The lack of crew training on upset conditions is also believed to have caused the crash. In addition, for at least some time prior to the crash, the pilot and co-pilot were working against each other by pushing their control sticks in opposite directions. The pilot was heard on the voice recorder calling for them to “pull down”, although “pulling” is used to bring the plane up.

The only recommendation that has so far been released is for commercial pilots to undergo flight simulator training for this type of emergency situation. AirAsia has already done so. The company, as well as the aviation industry as a whole, will hopefully look at the conclusions of the investigation report with a very critical eye towards improving safety.

Are Your Vehicle’s Tires Safe?

By ThinkReliability Staff

Four vehicle accidents between February and May of 2014 took 12 lives and injured 42 more. While the specifics of the accidents varied, all four were due to tread separations on tires. Later that year the National Transportation Safety Board (NTSB) hosted a Passenger Vehicle Tire Safety Symposium to address areas of concern regarding passenger vehicle safety due to tire issues. A special investigation report, which was adopted October 27, 2015, provides a summary of the issues and industry-wide recommendations to improve passenger vehicle safety.

There are multiple issues causing safety concerns with tires, and multiple recommendations to mitigate these safety risks. When dealing with a complex issue such as this, it can help to visually diagram the cause-and-effect relationships. We can do this in a Cause Map, or visual root cause analysis. This analysis begins with an impact to the organization’s goals. According to the NTSB report, tire-related accidents cause more than 500 deaths and 19,000 injuries every year in the US. Customer service (customers being members of the public who purchase and/or use tires) is impacted due to a lack of understanding of tire safety. The regulatory goal is impacted due to a lack of tire registration, and the production goal is impacted due to a low recall completion rate. Lastly, the property goal is impacted due to tires that are improperly maintained.

Cause-and-effect relationships are developed by beginning with an impacted goal (in this case, the deaths and injuries) and asking “why” questions. In this case, the deaths and injuries are due to tire-related accidents, of which there are about 33,000 every year in the US. Tire-related accidents include accidents that are due to tire issues (such as tread separation) caused by improper maintenance or an unrepaired manufacturing issue with a tire (specifically those resulting in a tire recall). While the NTSB is recommending the promotion of technology that may reduce the risk of tire-related accidents, they also made recommendations that can reduce the risk of these accidents in the near term.

From 2009-2013, there were 3.2 million tires recalled in 55 safety campaigns. However, 56% of recalled tires remain in use, because of very low recall work completion rates. In a typical tire recall, only about 20% of recalled tires are returned to the manufacturer. (In comparison, about 78% of recalled cars are repaired.) Many tires aren’t registered, and if they aren’t, it’s difficult to reach owners when there are recalls. Independent dealers and distributors, which sell 92% of tires in the US, aren’t required to register tires. While it is possible for consumers to look up their own tires to determine if they’ve been recalled, it’s difficult. The full tire identification number may not be printed in an accessible location, and the National Highway Traffic Safety Administration (NHTSA) website for tire recalls was found to be confusing.
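To put the recall percentages in terms of tire counts, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes the 56% figure applies to the 3.2 million tires recalled from 2009-2013 (the text does not say so explicitly), and because 3.2 million is itself a rounded number, the results are approximate. The variable names are illustrative.

```python
recalled_tires = 3_200_000   # tires recalled 2009-2013 (rounded figure from the text)
percent_still_in_use = 56    # percent of recalled tires remaining in use
campaigns = 55               # safety campaigns over the same period

# Assumption: the 56% applies to the 2009-2013 recalled tires.
still_in_use = recalled_tires * percent_still_in_use // 100
average_per_campaign = round(recalled_tires / campaigns)

print(f"Recalled tires still in use: about {still_in_use:,}")
print(f"Average tires per campaign: about {average_per_campaign:,}")
```

Under those assumptions, roughly 1.8 million recalled tires would still be on the road, and an average campaign would cover about 58,000 tires.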

The NTSB has recommended that tire manufacturers include the full tire identification number on both the inboard and outboard side walls of each tire so it can be more easily found by consumers. The NTSB has also recommended that the NHTSA, with the cooperation of the tire industry and Congress, if necessary, improve its recall site to allow search by identification number or brand and model, and improve registration requirements and the recall process.

Regarding improper maintenance, the report found that 23% of tire-related crashes involved tire aging and that 50% of drivers use the wrong tire inflation pressure, 69% have an underinflated tire, 63% don’t rotate their tires, and 12% have at least one bald tire. The report found that consumers have an inadequate understanding of tire aging and service life and recommends developing test and best practices related to tire aging, and developing better guidance for consumers related to tire aging, maintenance and service life.

The NTSB has issued its own Safety Alert for Drivers, which includes the following guidance:

– Register new tires with the manufacturer

– Check your tire pressure at least once a month

– Inflate your tires to the pressures indicated in your vehicle owner’s manual (not on the tire sidewall)

– When checking tire pressure, look for signs of damage

– Keep your spare tire properly inflated and check it monthly for problems

– Rotate, balance and align your tires in accordance with your vehicle owner’s manual

– If you hear an unusual sound coming from a tire, slow down and have your tires checked immediately

To view the Cause Map, including impacted goals and recommendations, click on “Download PDF” above. Or, click here to read the NTSB’s executive summary.