Tag Archives: near miss

Extensive Contingency Plans Prevent Loss of Pluto Mission

By ThinkReliability Staff

Beginning July 14, 2015, the New Horizons probe started sending photos of Pluto back to Earth, much to the delight of the world (and social media). The probe was launched more than 9 years earlier (on January 19, 2006), so long ago that when it left, Pluto was still considered a planet. (It has since been reclassified as a dwarf planet.) A mission that long isn’t without a few bumps in the road. Most notably, just ten days before New Horizons’ Pluto flyby, mission control lost contact with the probe.

Loss of communication with the New Horizons probe while it was nearly 3 billion miles away could have resulted in the loss of the mission. However, because of contingency and troubleshooting plans built into the design of the probe and the mission, communication was restored, and the New Horizons probe continued on to Pluto.

The potential loss of a mission is a near miss. Analyzing near misses can provide important information and improvements for future issues and response. In this case, the mission goal is impacted by the potential loss of the mission (near miss). The labor and time goals are impacted by the time required for response and repair. Because of the distance between mission control on Earth and the probe on its way to Pluto, troubleshooting took considerable time, owing mainly to the communication delay: signals had to travel nearly 3 billion miles each way (a 9-hour round trip).
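The 9-hour figure can be checked with a quick calculation. The sketch below assumes the signal travels at the speed of light over a fixed distance of 3 billion miles; the function name and the rounded distance are illustrative choices, not mission data.

```python
# Rough check of the communication delay, assuming a radio signal
# travelling at the speed of light over about 3 billion miles each way.
MILES_TO_KM = 1.609344              # kilometers per statute mile
SPEED_OF_LIGHT_KM_S = 299_792.458   # kilometers per second


def light_time_hours(distance_miles: float) -> float:
    """Return the one-way signal travel time in hours."""
    distance_km = distance_miles * MILES_TO_KM
    return distance_km / SPEED_OF_LIGHT_KM_S / 3600


one_way = light_time_hours(3e9)
round_trip = 2 * one_way
print(f"one way: {one_way:.1f} h, round trip: {round_trip:.1f} h")
```

Under these assumptions, each command-and-response cycle takes close to 9 hours, which is why every troubleshooting step was so costly in time.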

The potential loss of the mission was caused by the loss of communication between mission control and the probe.  Details on the error have not been released, but its description as a “hard to detect” error implies that it wasn’t noticed in testing prior to launch.  Because the particular command sequence that led to the loss of communication was not being repeated in the mission, once communication was restored there was no concern for a repeat of this issue.

Not all causes are negative. In this case, the “loss of mission” became a “potential loss of mission” because communication with the probe was restored. This was due to the contingency and troubleshooting plans built into the design of the mission. After the error, the probe automatically switched to a backup computer, per contingency design. Once communication was restored, the spacecraft automatically transmitted data back to mission control to aid in troubleshooting.

Of the mission, Alice Bowman, the Mission Operations Manager, says, “There’s nothing we could do but trust we’d prepared it well to set off on its journey on its own.” Clearly, they did.

Planes Nearly Collide Over DC

By ThinkReliability Staff

Two planes came within seconds of a collision on July 31, 2012, when both were directed to the same airspace by controllers. Although no incident occurred, such near misses should be investigated thoroughly to prevent incidents in the future.

We can perform a root cause analysis of this incident in visual Cause Mapping form. We begin with the impacts to the goals. In the case of a near miss like this one, some of the impacts to the goals will be hypothetical, based on the potential of the incident actually occurring. For example, the safety goal is impacted because of the potential for death or injury to the passengers and crew on the planes. The property goal is also impacted due to the potential for damage to the planes. Even though this incident was considered a near miss, there were some actual impacts to the goals, such as the delay in landing of the inbound plane, which can be considered an impact to the customer service, schedule, and labor goals.

Once we have determined the impacts to the goals, we can begin the analysis by asking “why” questions. In this case, the safety and property goals were impacted due to the potential collision of two planes. These planes could have collided because they were on a collision course. One plane was taking off directly towards another plane that was trying to land. The landing plane was approaching from the opposite direction from usual (from the South instead of from the North) in order to avoid high winds from an incoming storm. The plane taking off was cleared to take off towards the incoming plane (towards the South) by a different controller who was unaware that incoming planes were arriving from a different direction. The change in incoming flights was not communicated to all controllers in the area and, although no details are available, it appears that the procedure used by the controllers when changing the flow towards the airport was inadequate.

There are thousands of recorded errors by air traffic controllers every year, and Reagan National (where this incident occurred) has had some particularly high-profile incidents involving air traffic controllers, such as when a controller fell asleep (see previous blog). On August 10, 2012, two aircraft clipped each other at another Washington, DC area airport, although it is unclear if controllers were involved. (See the article here.) A congressional and FAA investigation is underway, and will hopefully address some needed improvements in air safety.

To view the Outline and Cause Map, please click “Download PDF” above.

Pilot Locked in Bathroom Nearly Results in Terror Alert

By Kim Smiley

In order for a flight to take off and land safely, many complex mechanical systems have to work for the plane to function properly.  Additionally, pilots need to be properly trained and proficient at their jobs.  Airline processes also have to work in order to smoothly ticket, security screen and board all the passengers.

The number of things that have to work for a successful commercial airline flight is impressive.  A recent incident highlighted that even the smallest hiccup, a broken bathroom lock for example, has the potential to cause big issues in the complex world of commercial flights.

On November 18, 2011, a pilot accidentally got locked inside a bathroom just prior to landing at LaGuardia. This incident almost resulted in an emergency being declared and a terrorist alert being issued. In order to understand this incident, a Cause Map can be built. A Cause Map is a visual root cause analysis that illustrates the cause-and-effect relationships between all the Causes that contribute to an event.

In this example, the copilot considered declaring an emergency because the pilot was gone from the cockpit longer than expected and an unknown man with an accent knocked on the cockpit door. The copilot was concerned that this might be a hijacking attempt. His concern was heightened by the intended destination being NYC and the 9/11 attacks that had occurred there 10 years earlier.

The pilot was taking longer than normal because the bathroom door lock had jammed when he tried to exit after a bathroom break. The unknown man was a well-intentioned passenger who had heard the pilot calling for help. The pilot had given him the password to access the cockpit because all other crew members were inside the cockpit. There were two reasons that all other crew members were inside the cockpit. First, regulations require that at least 2 crew members be inside the cockpit at all times. Second, this was a small airplane staffed with only 3 crew members. If the pilot or copilot needed to use the restroom, the only flight attendant had to enter the cockpit to meet the rules.

Luckily, the pilot was eventually able to free himself from the bathroom and return to the cockpit before anything too exciting happened.  The plane landed as scheduled.  The FBI and Port Authority cops met the plane, but after briefly talking to the passenger involved it was quickly determined that nothing suspicious had occurred.
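The cause-and-effect relationships in this incident can also be written down as a simple data structure. The sketch below is only one possible reading of the account above, not the published Cause Map: the wording of each cause is paraphrased, and the names `CAUSE_MAP`, `TOP_EFFECT` and `leaf_causes` are hypothetical.

```python
# Each key is an effect; its list holds the causes that contributed to it.
# The entries paraphrase the article and are an illustrative assumption.
TOP_EFFECT = "copilot considered declaring an emergency"

CAUSE_MAP = {
    TOP_EFFECT: ["copilot feared a hijacking attempt"],
    "copilot feared a hijacking attempt": [
        "pilot away from cockpit longer than expected",
        "unknown man knocked on cockpit door",
        "destination was NYC, ten years after 9/11",
    ],
    "pilot away from cockpit longer than expected": [
        "bathroom door lock jammed",
    ],
    "unknown man knocked on cockpit door": [
        "passenger heard pilot calling for help",
        "pilot gave passenger the cockpit password",
    ],
    "pilot gave passenger the cockpit password": [
        "all other crew members were inside the cockpit",
    ],
    "all other crew members were inside the cockpit": [
        "rule: at least 2 crew members in cockpit",
        "small plane with only 3 crew members",
    ],
}


def leaf_causes(effect, cause_map):
    """Keep asking "why" until reaching causes with no recorded cause."""
    causes = cause_map.get(effect)
    if not causes:
        return [effect]
    leaves = []
    for cause in causes:
        for leaf in leaf_causes(cause, cause_map):
            if leaf not in leaves:
                leaves.append(leaf)
    return leaves


for cause in leaf_causes(TOP_EFFECT, CAUSE_MAP):
    print("-", cause)
```

Listing the end of each "why" chain shows that several independent causes had to line up, and each of them is a place where a solution could be applied.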

Air Traffic Controller Asleep On the Job

By Kim Smiley

At least three times over the past decade, air traffic controller fatigue has been investigated by the National Transportation Safety Board (NTSB) in near-miss airline accidents. Five years ago, controller fatigue was a significant factor in a Lexington, KY crash that killed 49, the last fatal crash related to this problem. Again last week, controller fatigue was in the news when two early-morning aircraft had uncontrolled landings at Reagan National Airport near Washington, D.C. The controller had 20 years of experience, most of them at Reagan, and was also a supervisor. But no level of experience can overcome the effects of fatigue. The relieved controller stated that he had worked the 10 p.m. to 6 a.m. shift four nights in a row.

Faced with harsh criticism over the latest incident, the FAA reacted by mandating a second controller at Reagan National Airport and reviewing traffic management policies at all single-person towers.  Regional radar controllers are now required to check in with single-person towers during night shifts to ensure controllers are prepared to handle incoming traffic.

Controller fatigue is a well-known problem, and multiple solutions have been suggested over the past two decades. It has been a part of the NTSB’s Most Wanted list since 1990. In 2007, following the Lexington crash, the NTSB urged the Federal Aviation Administration (FAA) to overhaul its controller schedules, claiming that the stressful work and hectic pace were putting passengers and crews at risk. The FAA responded, and is currently working with the National Air Traffic Controllers Association (NATCA) to develop “a science-based controller fatigue mitigation plan”.

In addition, from 2007 to 2011, more than 5,500 new air traffic controllers were hired. However, many of these simply replaced air traffic controllers who were retiring, resulting in no net gain in the pool of available labor. Air traffic controllers have a mandated retirement age of 56, with exceptions available up to age 61. Additionally, on-the-job training is extensive, requiring two to four years just to receive initial certification. Adding staff is therefore more difficult than it first appears.

Faced with an expected increase in air traffic and an aging infrastructure, the FAA has aggressively pursued a long-term modernization called NextGen.  With the proposed modernization and staffing, the 2011 FAA budget request is now $1.14B, a $275M or 31% increase from 2010.  While material and personnel changes are often necessary, sometimes simpler solutions are equally effective or quicker to implement.

The associated Cause Map reflects the multiple solutions suggested, and even implemented, to combat the problem of controller fatigue.  As discussed, the FAA, NTSB and NATCA have pursued multiple paths to overcome the issue of controller fatigue.  However, as the Cause Map shows, there are multiple contributing factors in this case.  Controller fatigue isn’t the only reason those planes had an uncontrolled landing, and controller fatigue wasn’t caused by just four night shifts in a row.  Because there are multiple reasons why this happened, it also means there are multiple opportunities to correct future problems.  The key isn’t eliminating all of the causes, but rather eliminating the right one.