Tag Archives: Investigation

Extensive Contingency Plans Prevent Loss of Pluto Mission

By ThinkReliability Staff

On July 14, 2015, the New Horizons probe began sending photos of Pluto back to Earth, much to the delight of the world (and social media).  The New Horizons probe was launched more than 9 years earlier (on January 19, 2006) – so long ago that when it left, Pluto was still considered a planet. (It has since been downgraded to dwarf planet.)  A mission that long isn’t without a few bumps in the road.  Most notably, just ten days before New Horizons’ Pluto flyby, mission control lost contact with the probe.

Loss of communication with the New Horizons probe while it was nearly 3 billion miles away could have resulted in the loss of the mission.  However, because of contingency and troubleshooting plans built into the design of the probe and the mission, communication was restored and the New Horizons probe continued on to Pluto.

The potential loss of a mission is a near miss. Analyzing near misses can provide important information and improvements for future issues and response.  In this case, the mission goal is impacted by the potential loss of the mission (near miss).  The labor and time goals are impacted by the time for response and repair.  Because of the distance between mission control on Earth and the probe on its way to Pluto, the time required for troubleshooting was considerable, owing mainly to the communication delay: signals had to travel nearly 3 billion miles each way, a roughly 9-hour round trip.
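As a rough sanity check on that figure, the round-trip delay can be estimated from the distance and the speed of light. This is a back-of-the-envelope sketch; the 3 billion mile distance is approximate, not the exact figure at the time of the incident.

```python
# Back-of-the-envelope check of the communication delay quoted above.
# The ~3 billion mile distance is approximate; signals travel at light speed.
MILES_PER_SECOND = 186_282        # speed of light, in miles per second
distance_miles = 3.0e9            # approximate Earth-to-probe distance

one_way_hours = distance_miles / MILES_PER_SECOND / 3600
round_trip_hours = 2 * one_way_hours

print(f"one-way delay:    {one_way_hours:.1f} hours")     # ~4.5 hours
print(f"round-trip delay: {round_trip_hours:.1f} hours")   # ~9 hours
```

Every command sent during troubleshooting therefore took the better part of a working day to confirm, which is why the recovery stretched over several days.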

The potential loss of the mission was caused by the loss of communication between mission control and the probe.  Details on the error have not been released, but its description as a “hard to detect” error implies that it wasn’t noticed in testing prior to launch.  Because the particular command sequence that led to the loss of communication was not being repeated in the mission, once communication was restored there was no concern for a repeat of this issue.

Not all causes are negative.  In this case, the “loss of mission” became a “potential loss of mission” because communication with the probe was able to be restored.  This is due to the contingency and troubleshooting plans built into the design of the mission.  After the error, the probe automatically switched to a backup computer, per contingency design.  Once communication was restored, the spacecraft automatically transmitted data back to mission control to aid in troubleshooting.

Of the mission, Alice Bowman, the Mission Operations Manager, said, “There’s nothing we could do but trust we’d prepared it well to set off on its journey on its own.”  Clearly, they did.

Rollercoaster Crash Under Investigation

By ThinkReliability Staff

A day at a resort/theme park ended in horror on June 2, 2015 when a carriage filled with passengers on the Smiler rollercoaster crashed into an empty car in front of it. The 16 people in the carriage were injured, 5 seriously (including limb amputations). While the incident is still under investigation by the Health and Safety Executive (HSE), the information that is known can be collected in cause-and-effect relationships within a Cause Map, or visual root cause analysis.

The analysis begins with determining the impact to the goals. Clearly the most important goal affected in this case is the safety goal, impacted because of the 16 injuries. In addition to the safety impacts, customer service was impacted because passengers were stranded for hours in the air at a 45-degree angle. The HSE investigation and expected lawsuits are an impact to the regulatory goal. The park was closed completely for 6 days, at an estimated cost of £3 million. (The involved rollercoaster and others with similar safety concerns remain closed.) The damage to the rollercoaster and the response, rescue and investigation are impacts to the property and labor goals, respectively.

The Cause Map is built by laying out the cause-and-effect relationships starting with one of the impacted goals. In this case, the safety goal was impacted by the 16 injuries. The passengers were injured due to the force on the carriage in which they were riding. The force was due to the speed of the carriage (estimated at 50 mph) when it collided with an empty carriage. According to a former park employee, the collision resulted from both a procedural and a mechanical failure.

The passenger-filled carriage should not have been released while an empty car was still on the tracks, making a test run. It’s unclear what specifically went wrong to allow the release, but that information will surely be addressed in the HSE investigation and procedural improvements going forward. There is also believed to have been a mechanical failure. The former park employee stated, “Technically, it should be absolutely impossible for two cars to enter the same block, which is down to sensors run by a computer.” If this is correct, then it is clear that there was a failure with the sensors that allowed the cars to collide. This will also be a part of the investigation and potential improvements.

After the cause-and-effect relationships have been developed as far as possible (in this case, there is much information still to be added as the investigation continues), it’s important to ensure that all the impacted goals are included on the Cause Map. In this case, the passengers were stranded in the air because the carriage was stuck on the track due to the force upon it (as described above) and also due to the time required for rescue. According to data that has so far been released, it was 38 minutes before paramedics arrived on-scene, and even longer for fire crews to arrive with the necessary equipment to begin a rescue made very difficult by the design of the rollercoaster (the world record holder for most loops: 14). The park staff did not contact outside emergency services until 16 minutes after the accident – an inexcusably long time given the gravity of the incident. The delayed emergency response will surely be another area addressed by the investigation and continuing improvements.

Although the investigation is ongoing, the owners of the park are already making improvements, not only to the Smiler but to all its rollercoasters. In a statement released June 5, the owner group said “Today we are enhancing our safety standards by issuing an additional set of safety protocols and procedures that will reinforce the safe operation of our multi-car rollercoasters. These are effective immediately.” The Smiler and similar rollercoasters remain closed while these corrective actions are implemented.

Dr. Tony Cox, a former Health and Safety Executive (HSE) advisory committee chairman, hopes the improvements don’t stop there and issues a call to action for all rollercoaster operators. “If you haven’t had the accident yourself, you want all that information and you’re going to make sure you’ve dealt with it . . . They can just call HSE and say, ‘Is there anything we need to know?’ and HSE will . . . make sure the whole industry knows. That’s part of their role. It’s unthinkable that they wouldn’t do that.”

To view the information available thus far in a Cause Map, please click “Download PDF” above.

Live anthrax mistakenly shipped to as many as 24 labs

By Kim Smiley

The Pentagon recently announced that live anthrax samples were mistakenly shipped to as many as 24 laboratories in 11 different states and two foreign countries.  The anthrax samples were intended to be inert, but testing found that at least some of the samples still contained live anthrax.  There have been no reports of illness, but more than two dozen people have been treated for potential exposure.  Work has been disrupted at many labs during the investigation as testing and cleaning are performed to ensure that no unaccounted-for live anthrax remains.

The investigation is still ongoing, but the issues with anthrax samples appear to have been occurring for at least a year without being identified.  The fact that some of the samples containing live anthrax were transported via FedEx and other commercial shipping companies has heightened concern over possible implications for public safety.

Investigations are underway by both the Centers for Disease Control and the Defense Department to figure out exactly what went wrong and to determine the full scope of the problem. Initial statements by officials indicated that there may be problems with the procedure used to inactivate the anthrax.  Investigators have so far found that the work procedure was followed, but it may not have effectively killed 100 percent of the anthrax as intended.  Technicians believed that the samples were inert prior to shipping them out.

It may be tempting to call the issues with the work process used to inactivate the anthrax the “root cause” of this problem, but in reality more than one cause contributed to this issue, and more than one solution should be used to reduce the risk of future problems to acceptable levels.  Clearly, there is a problem if the procedure used to create inactive anthrax samples doesn’t kill all the bacteria present, and that will need to be addressed, but there is also a problem if there aren’t appropriate checks and testing in place to identify that live anthrax remains in samples.  When dealing with potentially deadly consequences, a work process should, wherever possible, be designed so that a single failure cannot create a dangerous situation.  An effective test for live anthrax prior to shipping the samples would have contained the problem to a single facility designed to handle live anthrax and drastically reduced the impact of the issue.  Additionally, another layer of protection could be added by requiring that a facility receiving anthrax samples test them upon receipt and handle them with additional precautions until they are determined to be fully inert.

Building in additional testing does add time and cost to a work process, but sometimes it is worth it to identify small problems before they become much larger ones.  If the issues with the process used to create inert anthrax samples had been identified the first time it failed to kill all the anthrax, they could have been dealt with long before the problem was headline news and people were unknowingly exposed to live anthrax. Testing both before shipping and after receipt of samples may be overkill in this case, but something more than just fixing the process for creating inert samples needs to be done, because inadvertently shipping live anthrax for more than a year indicates that issues are not being identified in a timely manner.

6/4/2015 Update: It was announced that anthrax samples suspected of inadvertently containing live anthrax were sent to 51 facilities in 17 states, DC and 3 foreign countries (Australia, Canada and South Korea). Ten samples in 9 states have tested positive for live anthrax, and the number is expected to grow as more testing is completed. 31 people have been preventatively treated for exposure to anthrax, but there are still no reports of illness. Click here to read more.

Deadly Train Derailment Near Philadelphia

By Kim Smiley

On the evening of May 12, 2015, an Amtrak train derailed near Philadelphia, killing 8 and injuring more than 200.  The investigation is still ongoing with significant information about the accident still unknown, but changes are already being implemented to help reduce the risk of future rail accidents and improve investigations.

Data collected from the train’s onboard event recorder shows that the train sped up in the moments before the accident until it was traveling 106 mph in a 50 mph zone where the train track curved.  The excessive speed clearly played a role in the accident, but there has been little information released about why the train was traveling so fast going into a curve.  The engineer controlling the train suffered a head injury during the accident and has stated that he has no recollection of the accident. The engineer was familiar with the route and appears to have had all required training and qualifications.

As a result of this accident and the difficulty determining exactly what happened, Amtrak has announced that cameras will be installed inside locomotives to record the actions of engineers.  While the cameras may not directly reduce the risk of future accidents, the recorded data will help future investigations be more accurate and timely.

The excessive speed at the time of the accident is also fueling the ongoing debate about how trains should be controlled and the implementation of positive train control (PTC) systems that can automatically reduce speed.  There was no PTC system in place at the curve in the northbound direction where the derailment occurred, and experts have speculated that one would have prevented the accident. In 2008, Congress mandated nationwide installation and operation of positive train control systems by 2015.  Prior to the recent accident, the Association of American Railroads stated that more than 80 percent of the track covered by the mandate will not have functional PTC systems by the deadline. The installation of PTC systems requires a large commitment of funds and resources, as well as communication bandwidth that has been difficult to secure in some areas, and some think the end-of-year deadline is unrealistic. Congress is currently considering two different bills that would address some of the issues.  The recent deadly crash is sure to be front and center in their debates.

In response to the recent accident, the Federal Railroad Administration ordered Amtrak to submit plans for PTC systems on the main Northeast Corridor (running between Washington, D.C. and Boston) at all curves where the speed limit is 20 mph less than that of the track leading to the curve.  Only time will tell how quickly positive train control systems will be implemented on the Northeast Corridor as well as the rest of the nation, and the debate on the best course of action will not be a simple one.

An initial Cause Map, a visual root cause analysis, can be created to capture the information that is known at this time.  Additional information can easily be incorporated into the Cause Map as it becomes available.  To view a high level initial Cause Map of this accident, click on “Download PDF”.

New Regulations Aim to Reduce Railroad Crude Oil Spills

By ThinkReliability Staff

The tragic train derailment in Lac-Mégantic, Quebec on July 6, 2013 (see our previous blog on this topic) ushered in new concerns about the transport of crude oil by rail in the US and Canada. Unfortunately, the increased attention has highlighted a growing problem: spills of crude oil transported via rail, which can result in fires, explosions, evacuations, and potentially deaths. (Luckily there have been no fatalities since the Lac-Mégantic derailment.) According to Steve Curwood of Living on Earth, “With pipelines at capacity the boom has led to a 4,000 percent increase in the volume of crude oil that travels by rail, and that brought more accidents and more oil spills in 2014 than over the previous 38 years.”

This follows a period of increases in railroad safety – according to the US Congressional Research Service, “From 1980 to 2012, railroads reduced the number of accidents releasing hazmat product per 100,000 hazmat carloads from 14 to 1.” From October 19, 2013 to May 6, 2015, there were at least 12 railcar derailments that resulted in crude oil spills. (To see the list of events, click on “Download PDF” and go to the second page.)

Says Sarah Feinberg, acting administrator of the Federal Railroad Administration (FRA), “There will not be a silver bullet for solving this problem. This situation calls for an all-of-the-above approach – one that addresses the product itself, the tank car it is being carried in, and the way the train is being operated.” All of these potential risk-reducing solutions are addressed by the final rule released by the FRA on May 1, 2015. (On the same day, the Canadian Ministry of Transport released similar rules.) In order to view how the various requirements covered by the rule impact the risk to the public as a result of crude oil spills from railcars, we can diagram the cause-and-effect relationships that lead to the risk, and include the solutions directly over the cause they control. (To view the Cause Map, or visual root cause analysis, of crude oil train car derailments, click on “Download PDF”.)

The product: Bakken crude oil (as well as bitumen) can be more volatile than other types of crude oil and has been implicated in many of the recent oil fires and explosions. In addition to being more volatile, its composition (and thus volatility) can vary. If a material is not properly sampled and characterized, proper precautions may not be taken. The May 1 rule incorporates a more comprehensive sampling and testing program to ensure the properties of unrefined petroleum-based products are known and provided to the DOT upon request.  (Note that in the May 6, 2015 derailment and fire in Heimdal, North Dakota, the oil had been treated to reduce its volatility, so this measure alone clearly isn’t a complete answer.)

The tank car: Older tank cars (known as DOT-111s) were involved in the Lac-Mégantic and other 2013 crude oil fires. An upgrade to these cars, known as CPC-1232, was intended to reduce these accidents. However, CPC-1232 cars have been involved in all of the issues since 2013. Cynthia Quarterman, former director of the Pipeline and Hazardous Materials Safety Administration, says that the recent accidents involving the newer tank cars “confirm that the CPC-1232 just doesn’t cut it.”

The new FRA rule establishes requirements for any “high-hazard flammable train” (HHFT) transported over the US rail network. An HHFT is a train with 20 or more loaded tank cars of a Class 3 flammable liquid (which includes crude oil and ethanol) in a continuous block, or 35 or more loaded tank cars of a Class 3 flammable liquid across the entire train. Tank cars used in HHFTs constructed after October 1, 2015 are required to meet DOT-117 design criteria, and existing cars must be retrofitted based on a risk-based schedule.
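To make the two thresholds concrete, here is a minimal sketch of the HHFT test as described above. The list-of-cars representation and the function name are illustrative assumptions for this blog, not taken from the regulatory text.

```python
def is_hhft(cars):
    """Illustrative sketch of the HHFT thresholds described above.

    `cars` is a list of booleans in train order: True means that car is a
    loaded tank car of a Class 3 flammable liquid (e.g. crude oil, ethanol).
    """
    if sum(cars) >= 35:          # 35 or more such cars across the entire train
        return True

    longest_block, current = 0, 0
    for flammable in cars:
        current = current + 1 if flammable else 0
        longest_block = max(longest_block, current)
    return longest_block >= 20   # 20 or more such cars in a continuous block

# Example: a unit train with 25 loaded crude oil cars in a row qualifies.
print(is_hhft([True] * 25))          # True
print(is_hhft([True, False] * 17))   # False: 17 scattered cars, no block of 20
```

The point of the two-part test is that either a long continuous block or a large total count of flammable-liquid cars is enough to trigger the rule's requirements.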

The way the train is being operated: The way the train is being operated includes not only the mechanics of operating the train, but also the route the train takes and the notifications required along the way. Because the risk for injuries and fatalities increases as the population density increases, the rule includes requirements to perform an analysis to determine the best route for a train. Notification of affected jurisdictions is also required.

Trains carrying crude oil tend to be very long (sometimes exceeding one mile in length). This can increase stopping distance as well as the risk of derailment if sudden stopping is required. To reduce these risks, HHFTs are restricted to 50 mph in all areas, and 40 mph in certain circumstances based on risk (one of the criteria is urban vs. rural areas). HHFTs are also required to have in place a functioning two-way end-of-train or distributed-power braking system. Advanced braking systems are required for trains including 70 or more loaded tank cars containing Class 3 flammable liquids and traveling at speeds greater than 30 mph, though this requirement will be phased in over decades.

It is important to note that this new rule does not address inspections of rails and tank cars. According to a study of derailments from 2001 to 2010, track problems were the most important causes of derailments (with broken rails or track welds accounting for 23% of total cars derailed). A final rule issued January 24, 2014 required railroads to achieve a specified track failure rate and to prioritize remedial action.

To view the May 1 rule regarding updates to crude-by-rail requirements, click here. To view the timeline of incidents and the Cause Map showing the cause-and-effect relationships leading to these incidents, click “Download PDF”.

Plane Narrowly Avoids Rolling into Bay

By ThinkReliability Staff

Passengers landing at LaGuardia airport in New York amidst a heavy snowfall on March 5, 2015, were stunned (and 23 suffered minor injuries) when their plane overran the runway and approached Flushing Bay.  The National Transportation Safety Board (NTSB) is currently investigating the accident to determine not only what went wrong in this particular case, but what standards can be implemented to reduce the risk of runway overruns in the future.

Says Steven Wallace, the former director of the FAA’s accident investigations office (2000-2008), “Runway overruns are the accident that never goes away.  There has been a huge emphasis on runway safety and different improvements, but landing too long and too fast can result in an overrun.”  Runway overruns are the most frequent type of accident (there are about 30 overruns due to wet or icy runways across the globe every year), and they are the primary cause of major damage to airliners.

Currently, the NTSB is collecting data (evidence) to aid in its investigation of the accident.  The plane is being physically examined, and the crew is being interviewed.  The data recorders from the flight are being downloaded and analyzed.  While little information can be verified or ruled out at this point, there is still value in organizing the questions related to the investigation in a logical way.

We can do this using the Cause Mapping method of root cause analysis, which organizes cause-and-effect relationships related to an incident.  We begin by capturing the impact to an organization’s goals.  In this case, 23 minor passenger injuries were reported, an impact to the safety goal.  There was a fuel leak of unknown quantity, which impacts the environmental goal.  Customer service was impacted due to a scary landing and evacuation from the aircraft via slides.  Air traffic at LaGuardia was shut down for 3 hours, impacting the production goal.  Both the airplane and the airport perimeter fence suffered major damage, which impacts the property/equipment goal.  The labor goal was also impacted due to the response and ongoing investigation.

By beginning with an impacted goal and asking “why” questions, we can begin to diagram the potential causes that may have resulted in an incident.  Potential causes are causes without evidence.  If evidence is obtained that supports a potential cause, it becomes a cause and is no longer followed by a question mark.  If evidence rules out a cause, it can be crossed out but left on the Cause Map.  This reduces uncertainty as to whether a potential cause has been considered and ruled out, or not considered at all.
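As an illustration of this bookkeeping (a hypothetical sketch only, not any actual Cause Mapping software), each cause on the map can be thought of as carrying a status that changes as evidence comes in:

```python
from dataclasses import dataclass, field

@dataclass
class Cause:
    """One box on a Cause Map (hypothetical sketch for illustration only).

    status: 'potential' (followed by a question mark on the map),
            'supported' (evidence supports it), or
            'ruled out' (crossed out but left on the map).
    """
    description: str
    status: str = "potential"
    evidence: list = field(default_factory=list)

    def add_evidence(self, note: str, supports: bool):
        self.evidence.append(note)
        self.status = "supported" if supports else "ruled out"

# Hypothetical example from this investigation: a potential cause awaiting evidence.
brakes = Cause("Brake system failure?")
brakes.add_evidence("Crew reported no deceleration with auto brakes set to max",
                    supports=True)
print(brakes.status)   # supported
```

Keeping ruled-out causes on the map, rather than deleting them, is what lets a reader see that an explanation was considered and rejected rather than simply overlooked.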

In this case, the NTSB will be looking into runway conditions, landing procedures, and the condition of the plane.   According to the airport, the runway was cleared within a few minutes of the plane landing, although the crew has said it appeared all white during landing.  The National Weather Service reported 7″ of snow in the New York area on the day of the overrun.  Procedures for closing runways or aborting landings are also being considered.  Just prior to the landing, other pilots who had recently landed reported braking conditions as good.

The crew has also reported that although the auto brakes were set to max, they did not feel any deceleration. The entire braking system will be investigated to determine if equipment failure was involved in the accident.  (Previous overruns have been due to brake system failures or the failure of reverse thrust from one of the engines, causing the plane to veer.)  The pilot also reported that the automatic spoilers did not deploy, though they were deployed manually.

Also being investigated are the landing speed and position, though there is no evidence to suggest that there was any issue with crew performance.  As more information is released, it can be added to the investigation.  When the cause-and-effect relationships are better determined, the NTSB can begin looking at recommendations to reduce future runway overruns.

Early Problems with Mark 14 Torpedoes

By Kim Smiley

The problems with Mark 14 torpedoes at the start of World War II are a classic example that illustrates the importance of robust testing.  The Mark 14 design included brand new, carefully guarded technology and was developed during a time of economic austerity following the Great Depression.  The desire to minimize costs and to protect the new exploder design led to such a limited test program that not a single live-fire test with a production model was done prior to deploying the Mark 14.

The Mark 14 torpedo design was a step change in torpedo technology. The new Mark VI exploder was a magnetic exploder designed to detonate under a ship, where there was little to no armor and where the damage would be greatest.  The new exploder was tested using specially instrumented test torpedoes, but never a standard torpedo. Not surprisingly, given the lack of testing, the torpedoes routinely failed to function as designed once deployed.

The Mark 14 torpedoes tended to run too deep and often failed to detonate near the target. One of the problems was that the live torpedoes were heavier than the test torpedoes so they behaved differently. There were also issues with the torpedo’s depth sensor.  The pressure tap for the sensor was in the rear cone section where the measured pressure was substantially less than the hydrostatic pressure when the torpedo was traveling through the water.  This meant that the depth sensor read too shallow and resulted in the torpedo running at deeper depths than its set point.  Eventually the design of the torpedo was changed to move the depth sensor tap to the mid-body of the torpedo where the readings were more accurate.
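A quick worked example shows why an under-reading pressure tap makes a torpedo run deep: if the sensor sees less pressure than the true hydrostatic pressure, it infers a shallower depth, and the depth-keeping control drives the torpedo down past its set point. The numbers below (including the 30 percent under-read) are illustrative assumptions, not measured figures from the Mark 14.

```python
# Illustrative numbers only: how a pressure tap that reads low makes the
# depth sensor report a shallower depth than the torpedo is actually at.
RHO_SEAWATER = 1025.0   # kg/m^3
G = 9.81                # m/s^2

def indicated_depth(measured_pressure_pa):
    """Depth inferred from measured pressure via p = rho * g * h."""
    return measured_pressure_pa / (RHO_SEAWATER * G)

actual_depth_m = 10.0
true_pressure = RHO_SEAWATER * G * actual_depth_m   # true hydrostatic pressure
measured_pressure = 0.7 * true_pressure             # assumed 30% under-read at the tap

print(indicated_depth(measured_pressure))           # ~7 m indicated vs. 10 m actual
# A depth-keeping controller chasing a 10 m set point would therefore drive
# the torpedo below 10 m, consistent with the "ran too deep" behavior.
```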

The Mark 14 design also had issues with premature explosions.  The magnetic exploder was intended to explode near a ship without actually contacting it.  It used small changes in the magnetic field to identify the location of a target. The magnetic exploder had been designed and tested at higher latitudes, and it wasn’t as accurate closer to the equator, where the Earth’s magnetic field is slightly different.

In desperation, many crews disabled the magnetic exploder on Mark 14 torpedoes even before official orders to do so came in July 1943.  Use of the traditional contact exploder revealed yet another design flaw in the Mark 14 torpedoes.  A significant number of torpedoes failed to explode even during a direct hit on a target.  The conventional contact exploder that was initially used on the Mark 14 torpedo had been designed for earlier, slower torpedoes.  The firing pin sometimes missed the exploder cap in the faster Mark 14 design.

The early technical issues of the Mark 14 torpedoes were eventually fixed and the torpedo went on to play a major role in World War II.  Mark 14 torpedoes were used by the US Navy for nearly 40 years despite the early issues.  But there is no doubt that it would have been far more effective and less painful to identify the technical issues during testing rather than in the field during wartime.  There are times when thorough testing may seem too expensive and time consuming, but having to fix a problem later is generally much more difficult.  No one wants to waste effort on unnecessary tests, but a reasonable test program that verifies performance under realistic conditions is almost always worth the investment.

To view a high level Cause Map of the early issues of the Mark 14 torpedoes, click “Download PDF”.

You can also learn more about the torpedoes by clicking here and here.

Fatal Bridge Collapse Near Cincinnati

By Kim Smiley

On the evening of January 19, 2015, an overpass on Interstate 75 near Cincinnati collapsed, killing one and injuring another.  The overpass was undergoing construction when it unexpectedly collapsed onto the road below it, which was still open to traffic.

This incident can be analyzed by building a Cause Map, a visual root cause analysis, to intuitively lay out the many causes that contributed to an accident by showing the cause-and-effect relationships.  Understanding all the causes that played a role, as opposed to focusing on a single root cause, expands the potential solutions that can be considered and can lead to better problem prevention.  A Cause Map is built by asking “why” questions and documenting the answers.

In this example, a construction worker was operating an excavator on the overpass when it collapsed.  When the bridge collapsed, the worker was crushed by the steel beams he was moving.  The additional weight of the excavator and steel beams on the overpass likely contributed to the collapse.  The overpass was being demolished as part of a project to remake this section of the Interstate, and a portion of the overpass had already been removed.  The work that had been done appears to have made the structure of the bridge unstable, but the construction company was not aware of the potential danger, so the worker was operating on top of the overpass and the road beneath it was still open to traffic.

A truck driver traveling under the overpass at the time of collapse suffered only minor injuries, but came within inches of being crushed by the bridge. It was simple luck that no other vehicles were involved.  Had the collapse happened earlier in the day when there was more traffic, the number of fatalities might very well have been higher.  As investigators review this accident, one of the things they will need to examine is the fact that the road below the bridge was open to traffic at the time of the collapse.  An additional relevant piece of information is that the construction company had a financial incentive to keep the road open as much as possible, because it would be fined for any amount of time that traffic was disrupted.

In addition to the safety impacts of this accident, the overpass collapse dramatically affected traffic on a busy road carrying an estimated 200,000 vehicles daily.  It took nearly a day to get all lanes of the interstate cleaned up and reopened to traffic.  No one wants to close roads unnecessarily, and the goal of minimizing traffic disruption is a worthwhile one, but it has to be balanced with safety.  The collapse of the overpass wasn’t an unforeseeable random accident, and the demolition needed to be done in a safe manner.

Passengers trapped in smoke-filled metro train

By Kim Smiley

A standard commute quickly turned into a terrifying ordeal for passengers on a metro train in Washington, DC the afternoon of January 12, 2015.  Shortly after leaving a station, the train abruptly stopped and then quickly filled with thick smoke. One passenger died as a result of the incident and 84 more were treated for injuries, predominantly smoke inhalation.

This incident can be analyzed by building a Cause Map, a visual root cause analysis.  A Cause Map visually lays out the cause-and-effect relationships to show all the causes that contributed to an issue.  The first step in the Cause Mapping process is to define the problem by filling in an Outline with the basic background information as well as documenting how the issue impacts the overall goals.  For this example, the safety goal is clearly impacted by the passenger death and injuries.  A number of other goals should also be considered such as the schedule goal which was impacted by significant metro delays.  (To view an Outline and initial Cause Map for this issue, click on “Download PDF” above.)

So why were passengers injured and killed?  Passengers were trapped on the train, and it filled with smoke.  It is unclear why the train wasn’t able to back up to the nearby station once the smoke formed, and investigators are working to learn more.  (Open issues can be documented on the Cause Map with a question mark to indicate that more evidence is needed.)  There are also questions about the time emergency workers took to reach the train to aid in evacuation of passengers, so this is another area that will require more information to fully understand. By some accounts, it took 40 minutes for firefighters to reach the trapped passengers.

Initial reports are that the smoke was caused by an electrical arcing event, likely from the cables supporting the high-voltage third rail used to power the trains. The specifics of what caused the arc are being investigated by the National Transportation Safety Board and will be released when the investigation is concluded.  What is known is that the arc caused significant smoke, but no fire.  There have also been reports of water near the rails that may have been a factor in the arcing.

Eyewitness accounts of this incident are horrifying.  People had little information and didn’t know whether there was fire nearby at first.  They were told to remain on the train and await rescue, but the rescue took some time, which surely felt longer to the scared passengers.  It won’t be clear what solutions need to be implemented to prevent similar problems in the future until the investigation is complete, but I think we can agree that metro officials need to work to ensure passenger safety going forward.

Bad Weather Believed to Have Brought Down AirAsia Flight QZ8501

By ThinkReliability Staff

AirAsia flight QZ8501, with 162 people on board, was lost on December 28, 2014 while flying through high-altitude thunderstorms. Because of a delay in finding the plane and continuing bad weather in the area, the black box, which contains data that will give investigators more detail on why the plane went down, has not yet been recovered. Even without the black box’s data, experts believe that the terrible weather in the area was a likely cause of the crash.

“From our data it looks like the last location of the plane had very bad weather and it was the biggest factor behind the crash. These icy conditions can stall the engines of the plane and freeze and damage the plane’s machinery,” says Edvin Aldrian, the head of Research at an Indonesian weather agency. Beyond the icing of engines, there are other theories on how weather-related issues may have brought down the plane.

Early speculation was that the plane was struck by lightning; while it may have been, experts say a lightning strike is unlikely to have brought the plane down, because modern planes are fairly well equipped to deal with direct lightning strikes. High levels of turbulence can also result in stalling due to a loss of airflow over the wings. There are also some who believe the plane (an Airbus A320) may have been pushed into a vertical climb past the limit for safe operation (to escape the weather), which resulted in a stall.

While the actual mechanism of how the weather (or an unrelated issue) brought the plane down is still to be determined, aviation safety organizations are already implementing some interventions to increase the safety of air travel in the area based on some specific areas of concern. (These areas of concern can be viewed visually in a Cause Map, or visual root cause analysis, by clicking on “Download PDF” above.)

AirAsia pilots relied on “self-briefings” regarding the weather. Pilots in other locations have expressed concern about the adequacy of weather information pilots obtain using this method. Direct pilot briefings with dispatchers based on detailed weather reporting are recommended to ensure that pilots have the information they need to safely traverse areas of poor weather (or stay out of them altogether).

Heavy air traffic in the area delayed approval to climb out of the storm. At 6:12 local time the flight crew requested to climb to a higher altitude to attempt to escape the storm. Air traffic control did not respond to the plane until 6:17, at which point it could no longer be contacted. Air traffic in the area was heavy, possibly because:

The plane did not have permission to fly the route it was on. AirAsia was licensed to fly the route it was taking at the time of the crash four days a week, but not the day of the crash. The takeoff airport used incorrect information in allowing the plane to take off in the first place (and the airline certainly used incorrect information in trying to fly the route as well). The selection of the route has been determined not to be a factor in the crash, but it certainly may have resulted in the overcrowding that led to the delayed response from air traffic control. It also resulted in the airline’s flights on that route being suspended.

It took almost three days to find the plane. The delay is renewing calls for universal tracking of aircraft or real-time streaming of flight data that were initially raised after the loss of Malaysia Airline flight MH370, which is still missing ten months after losing radar contact. (See our previous blog on the difficulties finding it.) Not only would this reduce the suffering of families while waiting to hear their loved ones’ fates, it would reduce resources required to find lost aircraft and, in cases where survival is possible, increase the chance of survival of those on the plane.