Air Traffic Controller Asleep On the Job

By Kim Smiley

At least three times over the past decade, the National Transportation Safety Board (NTSB) has investigated air traffic controller fatigue as a factor in near-miss airline accidents.  Five years ago, controller fatigue was a significant factor in a crash in Lexington, KY, that killed 49 people; it was the most recent fatal crash related to this problem.  Last week, controller fatigue was in the news again when two aircraft made uncontrolled landings (landings without contact with the tower) early one morning at Reagan National Airport near Washington, D.C.  The controller on duty had 20 years of experience, most of them at Reagan, and was also a supervisor.  But no level of experience can overcome the effects of fatigue.  The controller, who was relieved of duty, stated that he had worked the 10 p.m. to 6 a.m. shift four nights in a row.

Faced with harsh criticism over the latest incident, the Federal Aviation Administration (FAA) reacted by mandating a second controller at Reagan National Airport and reviewing traffic management policies at all single-person towers.  Regional radar controllers are now required to check in with single-person towers during night shifts to confirm that the controllers there are prepared to handle incoming traffic.

Controller fatigue is a well-known problem, and multiple solutions have been suggested over the past two decades.  The issue has been on the NTSB’s Most Wanted list since 1990.  In 2007, following the Lexington crash, the NTSB urged the FAA to overhaul its controller schedules, warning that the stressful work and hectic pace were putting passengers and crews at risk.  The FAA responded and is currently working with the National Air Traffic Controllers Association (NATCA) to develop “a science-based controller fatigue mitigation plan”.

In addition, more than 5,500 new air traffic controllers were hired from 2007 to 2011.  However, many of them simply replaced retiring controllers, resulting in no net gain in the pool of available labor.  Air traffic controllers have a mandatory retirement age of 56, with exceptions available up to age 61.  On-the-job training is also extensive, requiring two to four years just to reach initial certification.  Increasing staffing is therefore more difficult than it first appears.

Faced with an expected increase in air traffic and an aging infrastructure, the FAA has aggressively pursued a long-term modernization program called NextGen.  Including the proposed modernization and staffing, the FAA’s 2011 budget request is $1.14B, a $275M (31%) increase from 2010.  While material and personnel changes are often necessary, simpler solutions are sometimes equally effective or quicker to implement.

The associated Cause Map reflects the multiple solutions that have been suggested, and even implemented, to combat controller fatigue.  As discussed, the FAA, NTSB and NATCA have pursued several paths to address the issue.  However, as the Cause Map shows, there are multiple contributing factors in this case.  Controller fatigue isn’t the only reason those planes made uncontrolled landings, and the controller’s fatigue wasn’t caused solely by four night shifts in a row.  Because there are multiple reasons this happened, there are also multiple opportunities to prevent future problems.  The key isn’t eliminating all of the causes, but rather eliminating the right one.

Issues at Fukushima Daiichi Unit 3

By ThinkReliability Staff

Many complex events are unfolding at some of Japan’s nuclear power plants as a result of the earthquake and tsunami on March 11, 2011.  Although the issues are still very much ongoing, it is possible to begin a root cause analysis.  To show one issue clearly, the analysis in this blog is limited to the issues affecting Fukushima Daiichi Unit 3.  This is not to minimize the issues at the other plants and units, but rather to clearly demonstrate the cause-and-effect relationships within one small piece of the overall picture.

The issues surrounding Unit 3 are extremely complex.  When many events contribute to an incident, it can be helpful to build a timeline.  A timeline of the events so far can be seen by clicking “Download PDF” above.  A timeline not only clarifies the order of contributing events but also helps in creating the Cause Map, or visual root cause analysis.  To show how the timeline events fit into the Cause Map, some entries are numbered, and the numbers match the same events on the Cause Map.  Because Cause Maps build from right to left going back in time, earlier events generally appear to the right of later ones.  For example, the earthquake caused the tsunami, so the earthquake is to the right of the tsunami on the map.  Many of the timeline events are causes, but some are solutions.  For example, venting the reactor is a solution to the high pressure.  (It also becomes a cause on the map.)

A similar analysis could be put together for all of the units affected by the earthquake, the tsunami and the resulting events.  Parts of this Cause Map could be reused, since many of the issues affecting the other plants and units are similar to those shown here.  It would also be possible to build a larger Cause Map that includes all impacts from the earthquake.

The impacts to the goals need to be determined before building a Cause Map.  As a direct result of the events at Unit 3, seven workers were injured, which is an impact to the worker safety goal.  The potential for health effects on the population is an impact to the public safety goal.  The environmental goal was impacted by the release of radioactivity into the environment.  The customer service goal was impacted by evacuations and by rolling blackouts.  The blackouts were caused by the loss of electrical production capacity, which is an impact to the production goal.  The loss of capacity was caused by catastrophic damage to the plant, which is an impact to the property goal.  Finally, the massive effort to cool the reactor is an impact to the labor goal.

The worker safety and property goals were impacted by a hydrogen explosion.  The explosion was caused by a buildup of pressure in the plant, which was in turn caused by increasing reactor temperature.  A nuclear reactor continues to generate heat even after it is shut down, as a natural part of the operating process, so it must continue to be cooled.  In this case, the normal cooling supply was lost when external power lines were knocked down by the tsunami (which was caused by the earthquake).  The tsunami also apparently damaged the diesel generators that powered the emergency cooling system.  The backup to the emergency cooling supply stopped automatically and could not be restarted, for reasons that are not yet known.
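To make the right-to-left structure concrete, the chain described above can be sketched as a small directed graph in Python. This is a hypothetical, heavily simplified illustration, not ThinkReliability’s tool or the actual Cause Map in the PDF: the node names paraphrase this post, and grouping the three cooling losses under a single “loss of cooling” node is our assumption. Walking from the hydrogen explosion toward the right of the map until no further causes are known yields the current leaf causes.

```python
from typing import Dict, List

# Hypothetical, simplified slice of the Unit 3 Cause Map.
# Each effect maps to the causes that produced it, i.e. the boxes to its right.
CAUSE_MAP: Dict[str, List[str]] = {
    "hydrogen explosion": ["pressure buildup in plant"],
    "pressure buildup in plant": ["increasing reactor temperature"],
    "increasing reactor temperature": ["heat generated after shutdown", "loss of cooling"],
    "loss of cooling": [
        "normal cooling lost",
        "emergency cooling lost",
        "backup emergency cooling stopped",
    ],
    "normal cooling lost": ["external power lines knocked down"],
    "external power lines knocked down": ["tsunami"],
    "emergency cooling lost": ["diesel generators damaged"],
    "diesel generators damaged": ["tsunami"],
    "tsunami": ["earthquake"],
    "backup emergency cooling stopped": ["reason not yet known"],
}


def leaf_causes(effect: str, cause_map: Dict[str, List[str]]) -> List[str]:
    """Walk right from an effect and return the causes with nothing known to their right."""
    seen = set()
    leaves = []
    stack = [effect]
    while stack:
        node = stack.pop()
        if node in seen:  # "tsunami" is reached along two paths; visit it once
            continue
        seen.add(node)
        causes = cause_map.get(node, [])
        if causes:
            stack.extend(causes)
        else:
            leaves.append(node)
    return sorted(leaves)


if __name__ == "__main__":
    for cause in leaf_causes("hydrogen explosion", CAUSE_MAP):
        print(cause)
```

An open leaf such as “reason not yet known” marks a place where the analysis can be extended as more information is released.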

The outline, timeline and Cause Map shown in the PDF are extremely simplified.  This is partly because the event is still ongoing and not all information is known or has been released.  As more information becomes available, it can be added to the analysis, or the analysis can be revised.

To learn more about the reactor issues at Fukushima Daiichi, view our video summary.  To see a blog about the impact of the fallout on the health of babies in the US, see our healthcare blog.

Two Killed in Barge/Tour Boat Collision

By ThinkReliability Staff

On July 7, 2010, a barge being propelled by a tug boat collided with a tour boat that had dropped anchor in the Delaware River.  As a result of the collision, two passengers on the tour boat were killed and twenty-six were injured.  The tour boat sank in 55 feet of water.

Details of the incident have just been released in an updated NTSB report.  We can use the information in this report to begin a Cause Map, or visual root cause analysis.  The report also points to important questions that remain to be answered to determine exactly what happened and, most importantly, how incidents like this can be prevented in the future.

In this case, the tour boat had dropped anchor to deal with mechanical problems.  According to the tour boat crew’s testimony and radio recordings, the crew attempted to contact the tug boat by yelling and by making radio calls.  Neither was answered or, apparently, noticed.  The barge being propelled by the tug boat crashed into the tour boat, resulting in deaths, injuries and the loss of the tour boat.

The lookout on the tug boat was inadequate: had it been adequate, the tug boat crew would have noticed the tour boat in time to avoid the collision.  According to the report, the tug boat master was off duty and below deck at the time of the collision.  According to cell phone records, the mate on lookout duty was on a phone call at the time of the collision and had made several calls during his watch.  The inadequate lookout, combined with the tour boat’s inability to make contact with the tug boat, resulted in the collision.
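The last sentence above describes an AND relationship: as described here, the collision required both the inadequate lookout and the tour boat’s failed attempts at contact. A minimal Python sketch, assuming each cause can be treated as a simple yes/no (a large simplification of a real investigation), shows why an AND relationship gives more than one place to intervene. The dictionary and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical yes/no model of the two causes named in this post.
ACTUAL_CAUSES: Dict[str, bool] = {
    "inadequate lookout on tug boat": True,
    "tour boat unable to make contact": True,
}


def collision_occurs(causes: Dict[str, bool]) -> bool:
    """AND relationship: the effect occurs only if every cause is present."""
    return all(causes.values())


def single_cause_fixes(causes: Dict[str, bool]) -> List[str]:
    """Return each cause whose removal, on its own, would have prevented the effect."""
    fixes = []
    for name in causes:
        trial = dict(causes)
        trial[name] = False  # imagine a solution that eliminates just this cause
        if not collision_occurs(trial):
            fixes.append(name)
    return fixes


if __name__ == "__main__":
    print(collision_occurs(ACTUAL_CAUSES))  # True
    print(single_cause_fixes(ACTUAL_CAUSES))
```

In a full Cause Map, each of these boxes would have its own causes to the right (the phone calls, the failed radio contact), and each of those is another potential point for a solution.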

Two areas of the Cause Map clearly need more detail to determine what led to the issues on the tug boat: why was the lookout on the cell phone, and why wasn’t the tour boat able to reach the tug boat by radio?  Because of the strict requirements for lookouts on marine duty, there is also an ongoing criminal investigation into the lookout’s actions.  These questions should be answered when the final NTSB report is issued and the criminal case is closed.  More detail can be added to this Cause Map as the analysis continues.  As with any investigation, the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.

San Francisco’s Stinking Sewers

By ThinkReliability Staff

The Golden Gate City is well known for its groundbreaking, environmentally friendly initiatives.  In 2007, San Francisco outlawed plastic bags at major grocery stores.  The city also mandated recycling and composting programs in 2009.  Both ordinances were the first laws of their kind in the nation, and both were criticized by some as overly aggressive.  Likewise, San Francisco’s latest initiative, which aims to reduce city water usage by encouraging the use of low-flow toilets, has faced harsh criticism.

Recently San Francisco began offering substantial rebates to homeowners and businesses that install high-efficiency toilets (HETs).  These toilets use 1.28 gallons per flush (gpf) or less, down from the 1.6 gpf required today by federal law and the 3.4 gpf of toilets from decades ago.  That means an average home user will save between 3,800 and 5,000 gallons of water per person per year.  In dollars, that’s a savings of $90 annually for a family of four.  Since a toilet is expected to last 20 years, these savings can easily justify the cost of a new commode.
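The savings figures above can be checked with simple arithmetic. The sketch below assumes five flushes per person per day; that rate is our assumption, not a figure from the city. Under that assumption, replacing an old 3.4 gpf toilet with a 1.28 gpf HET saves about 3,870 gallons per person per year, near the low end of the quoted range, while replacing a 1.6 gpf toilet saves far less. The $300 net installed cost used for the payback calculation is likewise hypothetical; the $90 annual savings and 20-year life come from the figures above.

```python
# Flush volumes in gallons per flush (gpf), from the figures above.
OLD_GPF = 3.4       # toilets from decades ago
FEDERAL_GPF = 1.6   # current federal requirement
HET_GPF = 1.28      # high-efficiency toilet

FLUSHES_PER_DAY = 5  # ASSUMPTION: flushes per person per day


def annual_gallons_saved(old_gpf: float, new_gpf: float, flushes_per_day: float) -> float:
    """Water saved per person per year by switching from old_gpf to new_gpf."""
    return (old_gpf - new_gpf) * flushes_per_day * 365


def payback_years(net_cost: float, annual_savings: float) -> float:
    """Years of savings needed to recover the net cost of the new toilet."""
    return net_cost / annual_savings


if __name__ == "__main__":
    print(round(annual_gallons_saved(OLD_GPF, HET_GPF, FLUSHES_PER_DAY)))      # 3869
    print(round(annual_gallons_saved(FEDERAL_GPF, HET_GPF, FLUSHES_PER_DAY)))  # 584
    # HYPOTHETICAL $300 net cost against the quoted $90/year household savings.
    print(round(payback_years(300, 90), 1))  # 3.3
    print(90 * 20)  # 1800: household savings over a 20-year toilet life
```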

Aside from cost savings, reduced water use has obvious environmental benefits.  The city undertook the HET rebate initiative to decrease both the city’s overall water use and the amount of wastewater requiring treatment.  It was successful, and water usage decreased.  In fact, the city’s Public Utilities Commission stated that San Francisco residents reduced their water consumption by 20 million gallons last year.  San Francisco used approximately 215 million gallons per day last year.  The initiative also met other city goals, such as reducing costs to consumers.  Unintentionally, though, the HET rebate initiative impacted a different goal: customer service.

As shown on the associated Cause Map, reduced water flow had a series of other effects.  While water consumption, and presumably the volume of wastewater, shrank significantly, the amount of waste produced has remained constant.  Despite $100M in sewer system upgrades over the past five years, current flow rates are not high enough to keep waste moving through the system.  As a result, sewage sludge builds up in the sewer lines.  As bacteria break down the organic matter in the sludge, they release hydrogen sulfide, which is known for its characteristic “rotten egg” smell.

This creates an unfortunate situation.  No one wants to walk through smelly streets, and slow-moving sewage also means a buildup of potentially harmful bacteria.  However, everyone agrees that San Francisco should strive to conserve water, which is a scarce and increasingly expensive resource in California.  What’s the next step in solving the stinking sewer problem?

San Francisco is not the first city to deal with this issue.  There is substantial debate over the city’s current plan to purchase $14M worth of bleach to clean up the smell.  Many parties are concerned about potential environmental impacts and possible contamination of drinking water.  Environmental activists have proposed other solutions, but these may have financial ramifications.

Cause Maps can help all parties come to agreement because they focus problem solvers on the goals, not the details of the problem.  In this case, all parties are trying to protect the environment and reduce costs to city residents.  Based on those goals and the Cause Map, potential solutions have been developed and placed with their corresponding causes.  The next step is to proactively consider how these new actions might affect the stakeholders’ goals.  Perhaps other goals could be impacted, such as the safety of drinking water and potential contamination of San Francisco Bay.  Financial goals will surely be impacted to varying degrees with each solution.  Revising the Cause Map can help identify the pros and cons of each approach and narrow down which solution best satisfies all parties.