Tag Archives: Aviation

On February 9, 2014, a Royal Air Force Voyager was transporting 189 passengers and a crew of 9 towards Afghanistan when the plane suddenly entered a steep dive. Many passengers were unrestrained and were injured by striking the ceiling or other objects. Other passengers were injured by flying objects or spills of hot liquid. More than 30 passengers and crew members reported injuries, all considered minor. The Military Aviation Authority’s final report contains details of the impacts from the dive, the causes of the dive, and recommendations that would reduce the possibility of a similar issue in the future.

These impacts, the cause-and-effect relationships that led to them, and the recommended solutions can be captured within a Cause Map. The Cause Map process begins with filling in a Problem Outline, which captures the what, when and where of an incident, followed by the impacts to the goals. The problem covered by the report is the aircraft dive and resulting injuries which occurred on February 9, 2014 at about 1549 (3:49 PM) on an Airbus A330-243 Voyager tanker air transport flight. Things that were different, unusual or unique at the time of the incident are also captured. In this case, the plane had experienced prior turbulence, and the co-pilot was not in his seat at the time of the dive.

The next step is to capture the impacts to the goals on the Outline. In this case, the safety goal is impacted because of a significant potential for fatalities, as well as the more than 30 actual injuries. Customer service is impacted due to the steep dive of the plane, and the regulatory goal is impacted due to the court-martial of the pilot, as well as 10 lawsuits against the Ministry of Defence. Production was impacted because the plane was grounded for 12 days, the property goal is impacted because of the potential for the loss of the whole plane, and the labor goal is impacted by the investigation.

Beginning with an impact to the goals, all the cause-and-effect relationships that led to that impact are captured on the Cause Map. In this case, the potential for fatalities resulted from the potential loss of the plane. According to Air Marshal Richard Garwood, former director general of the UK’s Military Aviation Authority (MAA), “On this occasion, the A330 automatic self-protection systems likely prevented a disaster of significant scale. The loss of the aircraft was not an unrealistic possibility.” The potential for the loss of the plane resulted from the steep dive. The reason the plane was NOT lost (which makes this a significant near miss) is that the flight envelope protection system, functioning as designed, recovered the plane to level flight. (Although this is a positive, not a negative, it’s a cause all the same and should be included in the Cause Map.)

The steep dive resulted from the controller being forced forward without the input being counteracted. These are two separate causes that together resulted in the effect, so they are listed vertically and joined with an “AND” on the Cause Map. More detail should be provided about both causes. The input could not be counteracted because the co-pilot was not on the flight deck; he had been taking a break for several minutes before the incident. The investigation found that the controller was forced forward by a camera that had been placed between the seat and the controller. When the seat was then moved forward (as is normal during flight), the camera was pushed against the controller.
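
The “AND” means each cause was necessary: remove either one and the effect does not occur. As a minimal sketch (the function name and boolean inputs are hypothetical, not part of any Cause Mapping tool), the steep-dive node can be modeled as a logical AND of its two causes and checked against every combination:

```python
from itertools import product


def steep_dive(controller_forced_forward: bool, not_counteracted: bool) -> bool:
    """Hypothetical model of one AND node: the effect needs BOTH causes."""
    return controller_forced_forward and not_counteracted


# Enumerate every combination of the two causes and print the resulting effect.
for forced, uncountered in product([True, False], repeat=2):
    print(f"forced forward={forced}, not counteracted={uncountered} -> dive={steep_dive(forced, uncountered)}")
```

Only the first combination produces a dive, which is why a solution that removes either cause (for example, stowing loose articles so nothing can push the controller) breaks this part of the chain.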

The investigation found that, despite concerns raised for about a year before this incident, loose personal articles were not prohibited on the flight deck. While there was a requirement to stow loose articles, it was not referenced in the operational manual and instead became one of thousands of paragraphs provided as background, resulting in a lack of awareness that loose articles could interfere with the controller. The pilot was found to have been using the camera while on the flight deck, likely due to boredom on the highly automated plane. (Analysis of the camera and flight recordings provided evidence.) The pilot was court-martialed for “negligently performing a duty, perjury and making a false record”, presumably due at least in part to his use of a personal camera while alone on the flight deck.

The report provided many recommendations as a result of the investigation, including increasing seat belt use by passengers and crew during rest periods, which would have reduced some of the injuries caused by unrestrained personnel striking the ceiling of the aircraft. Recommendations also included ensuring manufacturer’s safety advice is included in operational documents, promoting awareness of the danger of loose articles, and maximizing use of storage for loose articles, all of which aim to reduce the risk of loose articles contacting control equipment. An additional recommendation is to manage low in-flight pilot workload in an attempt to combat the boredom that can be experienced on long flights.

To view the Problem Outline, Cause Map, and recommendations, please click “Download PDF” above. Or click here to read the Military Aviation Authority’s report.

Root Cause Analysis - Incident Investigation

Small fire leads to thousands of canceled flights

August 19, 2016 Kim Smiley

Starting August 8, 2016, thousands of travelers were stranded worldwide after widespread cancelations and delays of Delta Air Lines flights. The disruptions continued over several days and the impacts lingered even longer. The flight issues made headlines around the globe and the financial impact to the company was significant.

So what happened? What caused this massive headache for so many travelers? The short answer is a small fire in an airline data center, but a much longer answer is needed to understand what caused this incident. A Cause Map, a visual format for performing a root cause analysis, can be used to analyze this issue. In a Cause Map, all of the causes that contributed to an issue are laid out visually to show the cause-and-effect relationships intuitively. The Cause Map is built by asking “why” questions and adding the answers. For an effect with more than one cause, all of the causes that contributed to the effect are listed vertically and separated by an “and”. (Click on “Download PDF” to see an intermediate level Cause Map of this incident.)
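
As a minimal sketch of that structure (the dictionary layout and the `ask_why` function are hypothetical, not ThinkReliability software), each effect can map to the list of causes that together produced it, with multiple causes read as an “and”. The entries are limited to the causes described in this post:

```python
# Hypothetical Cause Map: each effect maps to the causes that together
# (an implicit "AND") produced it. Entries follow the Delta outage description.
CAUSE_MAP: dict[str, list[str]] = {
    "Flights canceled and delayed": [
        "System-wide computer outage",
        "Airline depends on computer systems",
    ],
    "System-wide computer outage": [
        "Some servers at the data center lost power",
        "System design let the failure cascade",
    ],
    "Some servers at the data center lost power": [
        "Loss of primary power",
        "Not all servers connected to backup power",
    ],
    "Loss of primary power": ["Transformer shut down"],
    "Transformer shut down": ["Small fire at data center"],
    "Small fire at data center": ["Electrical component failed"],
}


def ask_why(effect: str, depth: int = 0) -> list[str]:
    """Repeatedly ask 'why' from an effect, returning indented lines of the chain."""
    lines = ["    " * depth + effect]
    for i, cause in enumerate(CAUSE_MAP.get(effect, [])):
        if i > 0:
            # Sibling causes of the same effect are joined with AND.
            lines.append("    " * (depth + 1) + "AND")
        lines.extend(ask_why(cause, depth + 1))
    return lines


print("\n".join(ask_why("Flights canceled and delayed")))
```

Effects with no entry in the map (such as the failed electrical component) simply end a branch, mirroring the point where the released information stops.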

So why were so many flights canceled and delayed? There was a system-wide computer outage and the airline depends on computer systems for everything from processing check-ins to assigning crews and gates. Bottom line, no flights leave on time without working computer systems. The issues originated at a single data center, but the design of the system led to cascading computer issues that impacted systems worldwide. The airline has not released any specific details about why exactly the issue spread, but this is certainly an area investigators would want to understand in order to create a solution to prevent a similar cascading failure in the future.

In a statement, the company indicated that an electrical component failed, causing a small fire at the data center. (Again, the specifics about what type of component and what caused the failure haven’t been released.) The fire caused a transformer to shut down which resulted in a loss of primary power to the data center. A secondary power system did kick on, but not all servers were connected to backup power. No details have been released about why some servers were not powered by the secondary power supply.

Compounding the frustration for the impacted travelers is the fact that they were unable to get updated flight information. Flight status systems, including airport monitors, continued to show that all flights were on time during the period of the cancelations and delays.

Once a large number of flights are disrupted, it is difficult to return to a normal flight schedule. The rotation schedule for aircraft and pilots has to be redone, which can be time-consuming. Many commercial flights operate near capacity, so it can be difficult to find seats for all the passengers impacted by canceled and delayed flights. Delta has tried to compensate travelers impacted by this incident by offering refunds and $200 in travel vouchers to people whose flights were canceled or delayed at least three hours, but an incident of this magnitude will naturally impact customer confidence in the company.

This incident is a good reminder of the importance of building robust systems with functional backups; otherwise a small problem can spread and quickly become a big problem.

Uncategorized

How Did a Cold War Nuclear Bomb Go Missing?

June 10, 2016 ThinkReliability Staff

Is there a nuclear bomb lost just a few miles off the coast of Savannah, Georgia? It seems that we will never know, but theories abound. While it is easy to get caught up in the narrative of these theories, it is interesting to look at the facts of what actually happened to piece together the causes leading up to the event. This analysis may not tell us if the bomb is still under the murky Wassaw Sound waters, but it can tell us something about how the event happened.

Around 2 am on February 5, 1958, a training exercise was conducted off the coast of Georgia. This was during the most frigid period of the Cold War, and training was underway to practice attacking specific targets in Russia. During this particular training mission, Major Howard Richardson was flying a B-47 bomber carrying a Mark 15, Mod 0 hydrogen bomb containing 400 pounds of conventional explosives and some quantity of uranium.

The realistic training mission also included F-86 ‘enemy’ fighter jets. Unfortunately, one of those jets, piloted by Lt. Clarence Stewart, accidentally maneuvered directly into the B-47 because Stewart had not seen the bomber on his radar. The damage to both planes was extensive: the collision destroyed the fighter jet and severely damaged the bomber’s fuel tanks, engine, and control mechanisms. Fortunately, Stewart was able to eject safely from the fighter jet. Richardson faced a very difficult task: to get himself and his crew safely on the ground without detonating his payload in a heavily damaged aircraft. He flew to the closest airfield; however, the runway was under construction, making the landing even more precarious for the crew and for the local community that would have been affected had the bomb exploded upon landing. Faced with an impossible situation, Richardson returned to sea, dropped the bomb over the water, observed that no detonation took place, and returned to carefully land the damaged bomber.

The Navy searched for the bomb for over two months, but bad weather and poor visibility made the search difficult. On April 16, 1958, the search was ended without finding the bomb. The hypothesis was that the bomb was buried beneath 10 – 15 feet of silt and mud. Since then, other searches by interested locals and the government have still not located the bomb. In 2001, the Air Force released an assessment that made two interesting points. First, the bomb was never loaded with a ‘detonation capsule’, making the bomb incapable of a nuclear explosion. (Until this time, conventional wisdom suggested that the detonation capsule was included with the bomb.) Second, the report concluded that it would be more dangerous to try to move the bomb than to leave the bomb in its resting place.

While we may never learn the location of the bomb, we can learn from the incident itself. Using a Cause Map, we can document the causes and effects resulting in this incident, providing a visual root cause analysis. Beginning with several ‘why’ questions, we can create a cause-effect chain. In the simplest Cause Map, the safety goal was impacted because of the danger to the pilots and to nearby communities from a potential nuclear bomb explosion. This risk was caused by the bomb being jettisoned from the plane, which was a result of the collision between the fighter jet and the bomber. The planes collided because they were performing a training mission that simulated a combat scenario.

More details are uncovered as this event is broken down further to include more information and to document the impacts to other goals. The property goal is impacted by the loss of an aircraft and of the bomb. The bomb is missing because it was jettisoned from the bomber AND because it was never found during the search. The bomb was jettisoned because the pilot was worried that it might break loose during landing, which in turn was because the planes had collided. The planes collided because the F-86 descended onto the top of the B-47 AND because they were in the midst of a training exercise. The fighter jet crashed into the bomber because the bomber was not on its pilot’s radar. The planes were performing the exercise to simulate bombing a Russian target, which they were practicing because it was the middle of the Cold War. The search was unsuccessful because the bomb is probably buried deep in the mud AND because the weather and visibility were bad during the search.

Finally, the ‘customer service’ goal is impacted by the fact that the residents in nearby communities are nervous about the potential danger of explosion/radiation exposure. This nervousness is caused by the fact that the bomb is still missing AND the fact that the bomb contained radioactive material, which was due to routine protocol at the time.

Evidence boxes are a helpful way to add information to the Cause Map that was discovered during the investigation. For example, an evidence box stating the evidence from the 2001 Air Force report that the bomb had no detonation capsule has been added to the Cause Map. A Cause Map is a useful tool to help separate the facts from the theories. Click on “Download PDF” above to see the full, detailed Cause Map.

Airplane Emergency Instructions: How do you make a work process clear?

May 12, 2016 ThinkReliability Staff

What’s wrong with the process above?

This process provides instructions on how to remove the over-wing exit door on an airplane during an emergency. However, imagine performing this process in an actual emergency. During the time you spend opening the door, there will probably be people crowded behind you, frantic to get off the plane. Step 4 indicates that after the door is detached from the plane wall, you should turn around and set the door (which is about 4’ by 2’ and can weigh more than 50 pounds) on the seats behind you. In most cases, this will be impossible. This is why emergency exit doors open towards the outside; in an emergency, a crush against the door will make opening the door IN impossible.

Even if it would be possible to place the door on the seat in the emergency exit row, it would likely reduce the safety of passengers attempting to exit. As discussed, the exit door is fairly large and heavy. It is likely to be displaced while passengers are exiting the airplane and may end up falling on a passenger, or blocking the exit path.

However, when this process was tested in training, it probably worked fine. Why? Because it wasn’t an actual emergency, and there probably wasn’t a plane full of passengers who really wanted to get out. This is just another reason that procedures need to be tested in conditions as close to actual situations as possible. At the very least, any scenario under which the process is to be performed should be replicated as nearly as possible.

Now take a look at this procedure:

It’s slightly better: it no longer tells us to put the removed door on the seat behind us, but it doesn’t tell us what to do with the door at all. Keep in mind that the “training” of the person performing this procedure likely consisted of a 30-second conversation with a flight attendant, and that in all probability the first time he or she performs the task will be during an emergency. When testing a procedure, it’s also helpful to have someone who is not familiar with it perform the procedure, with instructions to do only what the procedure says. In this case, that person would end up removing the door . . . and then potentially attempting to climb out of the exit with the door in their hands. This is also not a safe or efficient method of emergency escape.
This procedure provides a much better description of what should be done with the door. The picture clearly indicates that the door should be thrown out of the plane, where it is far less likely to block the exit or cause passenger injury.

The first two procedures were presumably clear to the person who created them. But had they been tested by people with a variety of experience levels (particularly important in this case, because people of various experience levels may be required to open the doors in an emergency), the steps that really weren’t so clear may have been brought to light.

Reviewing procedures with a fresh eye (or asking someone to perform the procedure under safe conditions based only upon the written procedure) may help to identify steps that aren’t clear to everyone, even if they were to the writer. This can improve both the safety, and the effectiveness, of any procedure used in your organization.

8 Injured by Arresting Cable Failure on Aircraft Carrier

May 5, 2016 ThinkReliability Staff

An aircraft carrier is a pretty amazing thing. Essentially, it can launch planes from anywhere. But even though aircraft carriers are huge, they aren’t big enough for planes to take off or land in the normal way. The USS Dwight D. Eisenhower (CVN 69) has about 500′ for landing planes. So that planes can land successfully in that distance, it is equipped with an arresting wire system, which can stop a 54,000 lb. aircraft travelling 150 miles per hour in only two seconds and within a 315′ landing area. This system consists of 4 arresting cables, which are made of wire rope wound around a hemp core. These cables are very thick and heavy and pose a significant risk to personnel safety if they part or detach.
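
A quick worked check of what those figures imply, assuming for simplicity a constant deceleration (the real arrestment profile is not uniform) and taking g ≈ 32.2 ft/s²:

```latex
\begin{aligned}
v_0 &= 150\ \text{mph} = \frac{150 \times 5280\ \text{ft}}{3600\ \text{s}} = 220\ \text{ft/s} \\
\bar{a} &= \frac{v_0}{t} = \frac{220\ \text{ft/s}}{2\ \text{s}} = 110\ \text{ft/s}^2 \approx 3.4\,g \\
d &= \frac{v_0\, t}{2} = \frac{(220\ \text{ft/s})(2\ \text{s})}{2} = 220\ \text{ft} < 315\ \text{ft}
\end{aligned}
```

Under that simplifying assumption, the aircraft averages roughly 3.4 g of deceleration and stops in about 220 ft, inside the 315′ landing area. Forces of that size on the cable are why a parted or detached cable is so dangerous to anyone nearby.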

This is what happened on March 18, 2016 while attempting to land an E-2C Hawkeye. An arresting cable came unhooked from the port side of the ship and struck a group of sailors on deck. At least 8 were injured, several of whom had to be airlifted off the ship for treatment. We will examine the details of this incident within a Cause Map, a visual form of root cause analysis.

The first step in any problem investigation is to define the problem. We capture the what, when, and where within a problem outline. Additionally, we capture the impacts to the goals. The injuries, as well as the potential for death or even more serious injuries, are impacts to the safety goal. Flight operations were shut down for two days, impacting both the mission and production/schedule goals. The potential loss of (or serious damage to) the plane is an impact to the property goal. (In a testament to the skill of Navy pilots, the plane returned to Naval Station Norfolk without any injuries to the flight crew or significant damage to the plane.) The response and investigation are an impact to the labor goal. It’s also useful to capture the frequency of these types of incidents. The Virginian-Pilot reports that there have been three arresting-gear related deaths and 12 major injuries since 1980.

The next step in the problem-solving process is to determine the cause-and-effect relationships that led to the impacted goals. Beginning with the safety goal, the injuries to the sailors resulted from being struck by an arresting cable. When a workplace injury occurs, it’s also important to capture the personal protective equipment (PPE) that may have affected the severity of the injuries. In this case, all affected sailors were wearing appropriate PPE, including heavy-duty helmets and eye and ear protection. The PPE belongs on the Cause Map because, had the sailors NOT been wearing it, the injuries would almost certainly have been much more severe, or fatal.

The arresting cable struck the sailors because it came unhooked from the port side of the ship. The causes of the detachment have not been conclusively determined; however, a material failure results from a force on the material that is greater than the strength of the material. Here, the force on the arresting cable comes from the landing plane, and the pilot reported that the plane “hit the cable all at once”, which could have produced more force than is typical. The strength of the cable and connection may have been reduced by age or use. However, arresting cables are designed to “catch” and slow planes at full power and are only used for a specific number of landings before being replaced.

Other impacted goals can be added to the Cause Map where appropriate (additional relationships may result). In this case, the potential damage to the plane resulted from the landing failure, which was caused by the detachment of the arresting cable AND because the arresting cable is needed to safely land a plane on an aircraft carrier.

The last step of the Cause Mapping process is to determine solutions to reduce the risk of the incident recurring. More investigation is needed to ensure that the cable and connection were correctly installed and maintained. If it is determined that there were issues with the connection and cable, the processes that led to the errors will be improved. However, if it is determined that the cable and connection met design criteria and the detachment resulted from the plane landing at an unusual angle, there may be no changes as a result of this investigation.

It may seem unusual that the investigation of an incident that caused 8 injuries would result in no action items. However, solutions are based on achieving an appropriate level of risk. The acceptable level of risk in the military is necessarily higher than it is in most civilian workplaces in order to achieve desired missions. Returning to the frequency from the outline, these types of incidents are extremely rare. The US Navy currently has ten operational aircraft carriers (and an eleventh is on the way). These carriers launch thousands of planes each year, yet over the last 36 years there have been only 3 deaths and 12 major injuries associated with arresting gear failures, even though sailors are performing a dangerous task in a dangerous environment. Additionally, in this case, PPE was successful in ensuring that all sailors survived and in limiting their injuries.
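
Putting the Virginian-Pilot figures on a per-year basis gives a rough sense of the frequency. This sketch counts casualties rather than separate incidents, and it cannot be turned into a per-landing rate because the text gives only “thousands” of launches per year:

```latex
\frac{3\ \text{deaths} + 12\ \text{major injuries}}{(2016 - 1980)\ \text{years}} = \frac{15}{36} \approx 0.42\ \text{per year}
```

That is roughly one serious casualty every two to three years across the entire carrier fleet.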

To view the outline and Cause Map of this event, click on “Download PDF” above.

The year Christmas almost wasn’t

December 21, 2015 Kim Smiley

The movie Elf, starring Will Ferrell as Buddy the elf, tells the story of a Christmas that nearly disappointed children worldwide. On Christmas Eve night, as Santa made his magical trip to deliver his bag of Christmas gifts, his sleigh crashed in Central Park in New York City. Only quick thinking by Buddy and his friends got Santa airborne again and saved the holiday.

A Cause Map, a visual root cause analysis, can be built to analyze the crash of Santa’s sleigh. A Cause Map is built by visually laying out all the cause-and-effect relationships that contributed to the issue. The first step in the Cause Mapping process is to fill in an outline with the basic background information as well as the impacts to the goals. Nearly every problem impacts more than one goal, and listing all the impacts helps convey the full scope of the issue.

In this example, there is potential risk of damage to the sleigh and injury to the big guy himself which would be an impact to the equipment goal and safety goal respectively. There was a delay in the present delivery schedule while Santa’s sleigh was on the ground, but the biggest concern was the impact to the customer service goal because millions of children had the potential to wake up to a Christmas morning without gifts, certainly something Santa and his elves desperately wanted to avoid. Once the Outline is completed, the Cause Map itself is built by starting at one impacted goal and asking “why” questions.

So why did Santa’s sleigh crash into Central Park? Santa’s sleigh crashed because it was high above the ground and it lost propulsion. Flying is the sleigh’s typical mode of operation because Santa needs a speedy, magical mode of transportation to do his job. The sleigh lost propulsion because both the primary and secondary propulsion systems failed.

Originally, Santa’s sleigh was powered purely by Christmas cheer, but levels of Christmas cheer have been steadily declining in modern times and a secondary system, a Kringle 3000, 500 Reindeer-Power jet engine, had to be added in the 1960s to keep the sleigh flying. On the Christmas in question, the level of Christmas cheer hit an all-time low and the strain on the jet engine mount was too great and it broke off. Without the jet engine, Santa’s sleigh crashed. Luckily, Buddy had told his friends that “the best way to spread Christmas cheer is singing loud for all to hear” and they were able to inspire enough folks to sing along with carols that Santa’s sleigh flew back into action and the children got their presents.

One would hope that the design of the jet engine was improved after this accident, but just to be safe and ensure that there are no sleigh crashes this year, make sure you sing plenty of Christmas carols loudly for all your friends and families to hear! And if you are concerned about Santa’s progress and want assurances that all is well, you can monitor his progress around the world at the NORAD Santa tracker.

Component Failure & Crew Response, Not Weather, Brought Down AirAsia Flight QZ8501

December 11, 2015 ThinkReliability Staff

By Staff

Immediately following the December 28, 2014 crash of AirAsia flight QZ8501, severe weather in the area was believed to have been the cause of the loss of control of the plane. (See our previous blog on the crash.) However, recovery of the “black box” and a subsequent investigation determined that it was a component failure and the crew’s response to the upset condition that resulted in the crash and that weather was not responsible. This is an example of the importance of gathering evidence to support conclusions within an investigation.

Says Richard Quest, CNN’s aviation correspondent, “It’s a series of technical failures, but it’s the pilot response that leads to the plane crashing.” Because, as is common in these investigations, a combination of causes resulted in the crash, it can help to lay out the cause-and-effect relationships. We will do this in a Cause Map, a visual form of root cause analysis. The Cause Map is built by beginning with an impact to the goals, such as the safety goal, and asking why questions.

The 162 deaths (all on board) resulted from the plane’s rapid (20,000 feet per minute) plunge into the sea. According to the investigation, the crash resulted from an upset/ stall condition AND the crew’s inability to recover from that condition. Because both of these causes contributed to the crash, they are both connected to the effect (crash) and separated with an “AND”.

More detail can be added to each “leg” of the Cause Map by continuing to ask “why” questions. The prolonged stall/upset condition resulted from the aircraft being pushed beyond its limits. (It climbed 5,400 feet in about 30 seconds.) This occurred because of manual handling and because of the failure of the rudder travel limiter system, which is designed to restrict rudder movement to a safe range. The system failed due to a loss of electrical continuity from a cracked solder joint on a circuit board. Although maintenance records showed 23 complaints about the system in the year prior to the crash, it had not been repaired. A former pilot and member of the investigation team stated it was considered “minor damage” and was “not a concern”.
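
For scale, the climb and descent figures quoted in this post convert as follows (approximate, since the climb time is given only as “about 30 seconds”):

```latex
\begin{aligned}
\text{climb rate} &\approx \frac{5400\ \text{ft}}{30\ \text{s}} = 180\ \text{ft/s} = 10{,}800\ \text{ft/min} \\
\text{final descent rate} &\approx 20{,}000\ \text{ft/min} \approx 333\ \text{ft/s}
\end{aligned}
```

The reported final descent was therefore nearly twice as fast as the already aggressive climb that preceded the stall.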

The plane was being manually controlled because the autopilot and autothrust were disengaged. These systems were disengaged when a circuit breaker was reset (pulled and pushed back in) in an attempt to reset the system after a computer system failure (indicated by four alarms that sounded in the cockpit). While this is sometimes done on the ground, it shouldn’t be done in the air because it disengages the autopilot and autothrust systems. In addition, the crew had inadequate upset recovery training: according to the manufacturer’s manual, the aircraft is designed to prevent it from becoming upset, so such training was not considered necessary. The decision to manually place the plane in a steep climb is believed to have been an attempt to get out of the poor weather. Just prior to the crash, the less experienced co-pilot was at the controls.

The lack of crew training on upset conditions is also believed to have contributed to the crew’s inability to recover. In addition, for at least some time prior to the crash, the pilot and co-pilot were working against each other by pushing their control sticks in opposite directions. The pilot was heard on the voice recorder calling for them to “pull down”, although “pulling” is used to bring the plane up.

The only recommendation that has so far been released is for commercial pilots to undergo flight simulator training for this type of emergency situation. AirAsia has already done so. The company, as well as the aviation industry as a whole, will hopefully look at the conclusions of the investigation report with a very critical eye towards improving safety.

Crash of Germanwings flight 9525 Leads to Questions

April 7, 2015 Angela Griffith

By ThinkReliability Staff

On March 24, 2015, Germanwings flight 9525 crashed into the French Alps, killing all 150 onboard. Evidence available thus far suggests the copilot deliberately locked the pilot out of the cockpit and intentionally crashed the plane. While evidence collection is ongoing, because of the magnitude of this catastrophe, solutions to prevent similar recurrences are already being discussed and, in some cases, implemented.

What is known about the crash can be captured in a Cause Map, a visual form of root cause analysis. Visually diagramming all the cause-and-effect relationships makes it possible to address all related causes, leading to a larger number of potential solutions. The analysis begins by capturing the impacted goals in the problem outline. In this case, the loss of 150 lives (everybody aboard the plane) is an impact to the safety goal and of primary concern in the investigation. Also impacted are the property goal, due to the loss of the plane, and the labor goal, due to the recovery and investigation efforts (which are particularly difficult in this case because the crash site is so hard to reach).

Asking “Why” questions from the impacted goals develops cause-and-effect relationships. In this case, the deaths resulted from the crash of the plane into the mountains of the French Alps. So far, available information appears to support the theory that the co-pilot deliberately crashed the plane. Audio recordings of the pilot requesting re-entry into the cockpit, the normal breathing of the co-pilot, and the manual increase of speed of the descent while crash warnings sounded all suggest that the crash was deliberate. Questions have been raised about the co-pilot’s fitness for duty. Some have suggested increased psychological testing for pilots, but the trade group Airlines for America says that the current system (at least in the US) is working: “All airlines can and do conduct fitness-for-duty testing on pilots if warranted. As evidenced by our safety record, the U.S. airline industry remains the largest and safest aviation system in the world as a result of the ongoing and strong collaboration among airlines, airline employees, manufacturers and government.”

Some think that technology is the answer. The cockpit voice recorder captured alarms indicating an impending crash, but the copilot simply ignored them. If flight guidance software were able to take over for an incapacitated pilot (or one who deliberately ignores these warnings), disasters like this one could be avoided. Former Department of Transportation Inspector General Mary Schiavo says, “This technology, I believe, would have saved the flight. Not only would it have saved this flight and the Germanwings passengers, it would also save lives in situations where it is not a suicidal, homicidal pilot. It has implications literally for safer flight across the industry.”

Others say cockpit procedures should be able to prevent an issue like this. According to aviation lawyers Brian Alexander & Justin Green, in a blog for CNN, “If Germanwings had implemented a procedure to require a second person in the cockpit at all times – a rule that many other airlines followed – he would not have been able to lock the pilot out.”

After 9/11, cockpit doors were reinforced to prevent any forced entry (according to the Federal Aviation Administration, they should be strong enough to withstand a grenade blast). The doors have 3 settings – unlock, normal, and lock. Under normal settings, the cockpit can be unlocked by crewmembers with a code after a delay. But under the lock setting (to be used, for example, to prevent hijackers who have obtained the crew code from entering the cockpit), no codes will allow access. (The lock setting has to be reset every 5 minutes.) Because of the possibility a rogue crewmember could lock out all other crewmembers, US airlines instituted the rule that there must always be two people in the cockpit. (Of course, if only a three-person crew is present, this can cause other issues, such as when a pilot became locked in the bathroom while the only other two flight crew members onboard were locked in the cockpit, nearly resulting in a terror alert. See our previous blog on this issue.)

James Hall, the former chairman of the National Transportation Safety Board, agrees. He says, “The flight deck is capable of accommodating three pilots and there shouldn’t ever be a situation where there is only one person in the cockpit.” In response, many airlines in Europe and Canada, including Germanwings’ parent company Lufthansa, have since instituted a rule requiring at least two people in the cockpit at all times. Other changes to increase airline safety may be implemented after more details regarding the crash are discovered.


March 27, 1977: Two Jets Collide on Runway, Killing 583

March 27, 2015 Angela Griffith

By ThinkReliability Staff

March 27, 1977 was a difficult day for the aviation industry. Just after noon, a bomb exploded at the Las Palmas passenger terminal in the Canary Islands. Five large passenger planes were diverted to the Tenerife-Norte Los Rodeos Airport, where they completely covered the taxiway of the one-runway regional airport. Less than five hours later, when the planes were finally given permission to take off, two collided on the runway, killing 583. This made it the deadliest aviation accident at the time; today its death toll is exceeded only by the September 11, 2001 attacks in the US.

With the benefit of nearly 40 years of hindsight, it is possible to review the causes of the accident, as well as look at the solutions implemented after this accident, which are still being used in the aviation industry today. First we look at the impact to the goals as a result of this tragedy. The deaths of 583 people (out of a total of 644 on both planes) are an impact to the safety goal. The compensation to families of the victims (paid by the operating company of one of the planes) is an impact to the customer service goal. The property goal was impacted due to the destruction of both the planes, and the labor goal was impacted by the rescue, response, and investigation costs that resulted from the accident.

Beginning with one of the impacted goals, we can ask why questions to diagram the cause-and-effect relationships related to the incident. The deaths of the 583 people onboard were due to the runway collision of two planes. The collision occurred when one plane was taking off on the runway, and the other was taxiing to takeoff position on the same runway (called backtracking).

Backtracking is not common (most airports have separate runways and taxiways), but was necessary in this case because the taxiway was unavailable for taxiing. The taxiway was blocked by the three other large planes parked at the airport. A total of five planes were diverted to Tenerife which, having only one runway and a parallel taxiway, was not built to accommodate this number of planes. There were four turnoffs from the runway to the taxiway; the taxiing plane had been instructed to turn off at the third turnoff (the first one that was not blocked by other planes). For unknown reasons, it did not, and the collision occurred between the third and fourth turnoffs. (Experts disagree on whether the plane would have been able to successfully make the sharp turn at the third turnoff.)

One plane was attempting to take off when it struck the second plane on the runway. The plane taking off was unaware of the presence of the taxiing plane. There was no ground radar and the airport was under heavy fog cover, so the control tower was relying on positions reported by radio. When the taxiing plane reported its position, the plane taking off was discussing takeoff plans with the control tower; the simultaneous radio transmissions interfered with each other, rendering most of the conversation inaudible. The pilot of the plane taking off believed he had clearance because of confusing communication between the plane and the air traffic control tower. Not only did the flight crews and control tower speak different languages, but the word “takeoff” was also used during a conversation that was not intended to provide clearance for takeoff. Based on conversations among the crew of the plane taking off, investigators believed, but were not able to definitively determine, that other crew members may have questioned the clearance for takeoff, but not strongly enough for the pilot to ask the control tower for clarification or to delay the takeoff.

After the tragedy, the airport was upgraded to include ground radar. Solutions that affected the entire aviation industry included adopting English as the official control language (to be used when communicating between aircraft and control towers) and prohibiting the use of the word “takeoff” except when approving or revoking takeoff clearance. The possibility that action by one of the other crew members could have prevented the collision contributed to the development of Crew Resource Management, which aims to ensure that all flight crew members can and will speak up when they have questions related to the safety of the plane.

Although this remains by far the deadliest runway collision, runway collisions are still a concern. In 2011, the wing of an Airbus A380 clipped the tail of a Bombardier CRJ (see our previous blog). Los Angeles International Airport (LAX) experienced 21 runway incursions in 2007, after which officials redesigned the runways and taxiways so that they wouldn’t intersect and installed radar-equipped warning lights to give pilots a visual warning of potential collisions (see our previous blog).



Plane Narrowly Avoids Rolling into Bay

March 11, 2015 Angela Griffith

By ThinkReliability Staff

Passengers landing at LaGuardia airport in New York amidst a heavy snowfall on March 5, 2015, were stunned (and 23 suffered minor injuries) when their plane overran the runway and approached Flushing Bay. The National Transportation Safety Board (NTSB) is currently investigating the accident to determine not only what went wrong in this particular case, but what standards can be implemented to reduce the risk of runway overruns in the future.

Says Steven Wallace, the former director of the FAA’s accident investigations office (2000-2008), “Runway overruns are the accident that never goes away. There has been a huge emphasis on runway safety and different improvements, but landing too long and too fast can result in an overrun.” Runway overruns are the most frequent type of accident (there are about 30 runway overruns on wet or icy runways across the globe every year), and they are the primary cause of major damage to airliners.

Currently, the NTSB is collecting data (evidence) to aid in its investigation of the accident. The plane is being physically examined, and the crew is being interviewed. The data recorders on the flight are being downloaded and analyzed. While little can be verified or ruled out at this point, there is still value in organizing the questions related to the investigation in a logical way.

We can do this using the Cause Mapping method of root cause analysis, which organizes cause-and-effect relationships related to an incident. We begin by capturing the impact to an organization’s goals. In this case, 23 minor passenger injuries were reported, an impact to the safety goal. There was a fuel leak of unknown quantity, which impacts the environmental goal. Customer service was impacted due to a scary landing and evacuation from the aircraft via slides. Air traffic at LaGuardia was shut down for 3 hours, impacting the production goal. Both the airplane and the airport perimeter fence suffered major damage, which impacts the property/equipment goal. The labor goal was also impacted due to the response and ongoing investigation.

By beginning with an impacted goal and asking “why” questions, we can begin to diagram the potential causes that may have resulted in an incident. Potential causes are causes without supporting evidence, and they are marked with a question mark. If evidence is obtained that supports a potential cause, it becomes a cause and is no longer followed by a question mark. If evidence rules out a cause, it is crossed out but left on the Cause Map. This removes any uncertainty as to whether a potential cause was considered and ruled out, or not considered at all.
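The evidence rules in the paragraph above amount to a small data model: each cause carries a status that controls how it is displayed, and a ruled-out cause changes status instead of disappearing. The Python sketch below is a hypothetical illustration only, not ThinkReliability’s software; the class and method names (`Cause`, `Status`, `support`, `rule_out`, `label`), the `~~` notation standing in for a crossed-out box, and the sample cause wording are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    POTENTIAL = "potential"  # no evidence yet: shown with a question mark
    SUPPORTED = "supported"  # evidence supports it: question mark removed
    RULED_OUT = "ruled out"  # evidence rules it out: crossed out but kept


@dataclass
class Cause:
    """One box on a Cause Map (hypothetical model for illustration)."""
    text: str
    status: Status = Status.POTENTIAL
    evidence: List[str] = field(default_factory=list)

    def support(self, evidence: str) -> None:
        """Record evidence that supports this cause."""
        self.evidence.append(evidence)
        self.status = Status.SUPPORTED

    def rule_out(self, evidence: str) -> None:
        """Record evidence that rules this cause out; the cause stays on the map."""
        self.evidence.append(evidence)
        self.status = Status.RULED_OUT

    def label(self) -> str:
        """Text as it would appear on the map (~~ stands in for crossing out)."""
        if self.status is Status.POTENTIAL:
            return f"{self.text}?"
        if self.status is Status.RULED_OUT:
            return f"~~{self.text}~~"
        return self.text


# Potential causes raised in the overrun discussion; no evidence recorded yet.
cause_map = [
    Cause("Runway contaminated by snow"),
    Cause("Braking system failed"),
    Cause("Landing speed too high"),
]

for cause in cause_map:
    print(cause.label())
```

Because `rule_out` changes only the status, a ruled-out cause remains in `cause_map`, which mirrors the practice of leaving crossed-out causes visible so that reviewers can see they were considered.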

In this case, the NTSB will be looking into runway conditions, landing procedures, and the condition of the plane. According to the airport, the runway had been cleared only minutes before the plane landed, although the crew has said it appeared all white during landing. The National Weather Service reported 7 inches of snow in the New York area on the day of the overrun. Procedures for closing runways or aborting landings are also being examined. Just prior to the landing, other pilots who had recently landed reported braking conditions as good.

The crew has also reported that although the auto brakes were set to max, they did not feel any deceleration. The entire braking system will be investigated to determine if equipment failure was involved in the accident. (Previous overruns have been due to brake system failures or the failure of reverse thrust from one of the engines, causing the plane to veer.) The pilot also reported that the automatic spoilers did not deploy, so they were deployed manually.

Also being investigated are the landing speed and position, though there is no evidence to suggest that there was any issue with crew performance. As more information is released, it can be added to the investigation. When the cause-and-effect relationships are better determined, the NTSB can begin looking at recommendations to reduce future runway overruns.
