
Not all McDonald’s franchise owners “lovin” the new menu

By Kim Smiley

Are you “lovin’ it” now that McDonald’s offers breakfast all day? If so, you are not alone because McDonald’s has stated that extended breakfast hours had been the number one request by customers. After recent declines in sales, McDonald’s is hoping that all-day breakfast will boost profits, but some franchise owners are concerned that extending breakfast hours will actually end up hurting their businesses.

Offering breakfast all day is not as simple as it may sound because McDonald’s restaurants are now required to offer breakfast in addition to their regular fare. Cooking only hash browns in the fryers is inherently simpler than figuring out how to cook both hash browns and fries at the same time. Basically, attempting to prepare breakfast simultaneously with traditional lunch and dinner items creates a more complicated workflow in the kitchen. Complication generally slows things down, which can be a major problem for a fast food restaurant.

If customers get annoyed at increased wait times, they may choose to visit one of the many other fast food restaurants, rather than McDonald’s, for their next meal out. Many franchisees are investing in more kitchen equipment and increasing staffing to support extended breakfast hours, both of which can quickly eat into the bottom line. Increased profits from offering all-day breakfast will need to balance out the cost required to support it or franchise owners will lose money.

Franchise owners have also expressed concern that customers may spend less money now that breakfast is an option after 11 am.  Breakfast items in general are less expensive than other fare and if customers choose to order an egg-based sandwich for lunch rather than a more expensive hamburger it could potentially cut into profits.  It all depends on the profit margin on each individual menu item, but restaurants need to make sure they aren’t offering items that will compete with their more profitable offerings.

The changing menu also has the potential to frustrate customers (and frustrated customers will generally find somewhere else to buy their next lunch). The addition of all-day breakfast has resulted in menu changes at many McDonald’s and more menu variability between franchises. The larger the menu offered, the more difficult it is to create cheap food quickly, so some less popular items like wraps have been cut at many McDonald’s locations to make room for breakfast. If you are a person who loves wraps and doesn’t really want an egg muffin, this move is pretty annoying. The other potential problem is that most McDonald’s are only offering either the English muffin-based sandwiches or the biscuit-based sandwiches (but not both) after the traditional breakfast window. So depending on the McDonald’s, you may be all fired up for an all-day breakfast Egg McMuffin only to be told that you still need to get there before 10:30 am to order one, since about 20 percent of McDonald’s have chosen to go with biscuit-based breakfast sandwiches instead.


There are multiple issues that need to be considered to really understand the impacts of switching to all-day breakfast.  Even seemingly simple “problems” like this can quickly get complicated when you start digging into the details.  A Cause Map, a visual root cause analysis, can be used to intuitively lay out the potential issues from adding all-day breakfast to menus at McDonald’s.  A Cause Map develops cause-and-effect relationships so that the problem can be better understood.  To view a Cause Map for this example, click on “Download PDF” above.

Studies have found that at least one quarter of American adults eat fast food every day (which could be its own Cause Map…) so there are a lot of dollars being spent at McDonald’s and its competitors. Only time will tell if all-day breakfast will help McDonald’s gobble up a bigger market share of the fast food pie, but fast food restaurants will certainly continue trying to outdo each other as long as demand remains high.

Invasive Pythons Decimating Native Species in the Everglades

By Kim Smiley

Have you ever dreamed of hunting pythons?  If so, Florida is hosting the month-long 2016 Python Challenge and all you need to do to join in is to pay a $25 application fee and pass an online test to prove that you can distinguish between invasive pythons and native snake species.

The idea behind the python hunt is to reduce the population of Burmese pythons in the Florida Everglades. As the number of pythons has increased, there has been a pronounced decline in native species’ populations, including several endangered species. Researchers have found that 99% of raccoons and opossums and 88% of bobcats have vanished, with declines in nearly every other species as well. Pythons are indiscriminate eaters and consume everything from small birds to full-grown deer. The sheer number of these invasive snakes in the Florida Everglades is having a huge environmental impact.

The exact details of how pythons were released into the Everglades aren’t known, but genetic testing has confirmed that the population originated from pet snakes that were either released or escaped into the wild. Once the pythons were introduced into the Everglades, their number quickly grew as the python population thrived.  The first Burmese python was found in the Florida Everglades in 1979 and now there are estimated to be as many as 100,000 of the snakes in the area.

There are many factors that have led to the rapid growth in the python population.  They are able to live in the temperate Florida climate, have plentiful food available, and are successfully reproducing.  Pythons produce a relatively large number of eggs (an average of 40 eggs about every 2 years) and the large female python protects them.  Hatchling pythons are also larger than most hatchling snakes, which increases their chance of surviving into adulthood.  There are very few animals that prey on adult pythons.  Researchers have found that alligators occasionally eat pythons, but that the relationship between these two top predators can go both ways and pythons have occasionally eaten alligators up to 6 feet in length.  The only other real predators capable of taking down a python are humans and even that is a challenge.

Before a python can be hunted, it has to be found and that is often much easier said than done. Pythons have excellent camouflage and are ambush predators that naturally spend a large percentage of the day hiding.  They also are semi-aquatic and excellent climbers so they can be found in both the water and in trees.  Despite their massive size (they can grow as long as 20 feet and weigh up to 200 pounds), they blend in so well with the environment that researchers even have difficulty finding snakes with radio transmitters showing their locations.

The last python challenge was held about 3 years ago and 68 snakes were caught.  While that number may not sound large, it is more snakes than have been caught in any other month.  The contest also helped increase public awareness of the issue and hopefully discouraged any additional release of pets of any variety into the wild.  For the 2016 contest, officials are hoping to improve the outcome by offering prospective hunters on-site training with a guide who will educate them on swamps and show them areas where snakes are most likely to be found.

To view a Cause Map, a visual root cause analysis format, of this issue click on “Download PDF” above.  A Cause Map intuitively lays out the cause-and-effect relationships that contributed to the problem.

You can check out some of our previous blogs to view more Cause Maps for invasive species if you want to learn more:

Small goldfish can grow into a large problem in the wild

Plan to Control Invasive Snakes with Drop of Dead Mice

Runway Fire Forces Evacuation of Airplane

By ThinkReliability Staff

On September 8, 2015, an airplane caught fire during take-off from an airport in Las Vegas, Nevada. The pilot was able to stop the plane, reportedly in just 9 seconds after becoming aware of the fire. The crew then evacuated the 157 passengers, 27 of whom received minor injuries as a result of the evacuation by slide. Although the National Transportation Safety Board (NTSB) investigation is ongoing, information that is known, as well as potential causes that are under consideration, can be diagrammed in a Cause Map, or visual root cause analysis.

The first step of Cause Mapping is to define the problem by completing a problem outline. The problem outline captures the background information (what, when and where) of the problem, as well as the impact to the goals. In this case, the safety goal is impacted due to the passenger injuries. The evacuation of the airplane impacts the customer service goal. The NTSB investigation impacts the regulatory goal. The schedule goal is impacted by a temporary delay of flights in the area, and the property goal is impacted by the significant damage to the plane. The rescue, response and investigation are an impact to the labor goal.

The Cause Map is built by beginning with one of the impacted goals and asking “Why” questions to develop the cause-and-effect relationships that led to an issue.   In this case, the injuries were due to evacuation by slide (primarily abrasions, though some sources also said there were some injuries from smoke inhalation). These injuries were caused by the evacuation of the airplane. The airplane was evacuated due to an extensive fire. Another cause leading to the evacuation was that take-off was aborted.

The fact that take-off could be aborted, for which the pilot has been hailed as a hero, is actually a positive cause. Had it not been possible to abort the take-off, the result would likely have been far worse. In the case of the Concorde accident, a piece of debris on the runway ruptured a tire, which caused damage to the fuel tank, leading to a fire after the point where take-off could be aborted. Instead, the aircraft stalled and crashed into a hotel, killing all onboard the craft and 4 in the hotel. The pilot’s ability to quickly stop the plane almost certainly saved many lives.

The fire is thought to have been initiated by an explosion in the left engine due to a catastrophic uncontained explosion of the high-pressure compressor. This assessment is based on the compressor fragments that were found on the runway. This likely resulted from either a bird strike (as happened in the case of US Airways flight 1549), or a strike from other debris on the runway (as occurred with the Concorde), or fatigue failure of the engine components due to age. This is the first uncontained failure of this type of engine, so some consider fatigue failure to be less likely. (Reports of an airworthiness directive after cracks were detected in weld joints of compressors were in engines with different parts and a different compressor configuration.)

In this incident, the fire was unable to be put out without assistance from responding firefighters. This is potentially due to an ongoing leak of fuel if fuel lines were ruptured and the failure of the airplane’s fire suppression system, which reportedly deployed but did not extinguish the fire. Both the fuel lines and fire suppression system were likely damaged when the engine exploded. The engine’s outer casing is not strong enough to contain an engine explosion by design, based on the weight and cost of providing that strength.

The NTSB investigation is examining airplane parts and the flight data and cockpit voice recorders in order to provide a full accounting of what happened in the incident. Once these results are known, it will be determined whether this incident is an anomaly or whether all planes using a similar design and configuration need changes to prevent a similar event from recurring.

To view the initial investigation information on a one-page downloadable PDF, please click “Download PDF” above.


Extensive Contingency Plans Prevent Loss of Pluto Mission

By ThinkReliability Staff

Beginning July 14, 2015, the New Horizons probe started sending photos of Pluto back to earth, much to the delight of the world (and social media).  The New Horizons probe was launched more than 9 years ago (on January 19, 2006) – so long ago that when it left, Pluto was still considered a planet. (It’s been downgraded to dwarf planet now.)  A mission that long isn’t without a few bumps in the road.  Most notably, just ten days before New Horizons’ Pluto flyby, mission control lost contact with the probe.

Loss of communication with the New Horizons probe while it was nearly 3 billion miles away could have resulted in the loss of the mission.  However, because of contingency and troubleshooting plans built in to the design of the probe and the mission, communication was able to be restored, and the New Horizons probe continued on to Pluto.

The potential loss of a mission is a near miss. Analyzing near misses can provide important information and improvements for future issues and response. In this case, the mission goal is impacted by the potential loss of the mission (near miss). The labor and time goals are impacted by the time for response and repair. Because of the distance between mission control on Earth and the probe on its way to Pluto, the time required for troubleshooting was considerable, owing mainly to the delay in communications that had to travel nearly 3 billion miles (a 9-hour round trip).
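The 9-hour figure can be sanity-checked with a quick calculation. The sketch below is only a rough check: it assumes the signal travels at the speed of light in a vacuum (about 186,282 miles per second, a value that does not appear in the post) and uses the rounded distance of 3 billion miles. The function and constant names are illustrative.

```python
# Rough check of the round-trip signal delay quoted above.
# Assumption (not from the post): radio signals travel at the speed of
# light in a vacuum, about 186,282 miles per second.
SPEED_OF_LIGHT_MILES_PER_SEC = 186_282
DISTANCE_MILES = 3_000_000_000  # "nearly 3 billion miles", rounded


def round_trip_hours(distance_miles: float) -> float:
    """Return the hours needed for a signal to travel out and back."""
    one_way_seconds = distance_miles / SPEED_OF_LIGHT_MILES_PER_SEC
    return 2 * one_way_seconds / 3600


if __name__ == "__main__":
    print(f"Round trip: about {round_trip_hours(DISTANCE_MILES):.1f} hours")
```

Under these assumptions the result lands close to 9 hours, which means every troubleshooting command and its reply cost most of a working day.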

The potential loss of the mission was caused by the loss of communication between mission control and the probe.  Details on the error have not been released, but its description as a “hard to detect” error implies that it wasn’t noticed in testing prior to launch.  Because the particular command sequence that led to the loss of communication was not being repeated in the mission, once communication was restored there was no concern for a repeat of this issue.

Not all causes are negative. In this case, the “loss of mission” became a “potential loss of mission” because communication with the probe was able to be restored. This is due to the contingency and troubleshooting plans built in to the design of the mission. After the error, the probe automatically switched to a backup computer, per contingency design. Once communication was restored, the spacecraft automatically transmitted data back to mission control to aid in troubleshooting.

Of the mission, Alice Bowman, the Mission Operations Manager, says, “There’s nothing we could do but trust we’d prepared it well to set off on its journey on its own.” Clearly, they did.

Trading Suspended on the NYSE for More Than 3 Hours

By ThinkReliability Staff

On July 8, 2015, trading was suspended on the New York Stock Exchange (NYSE) at 11:32 AM. According to the NYSE president Tom Farley, “the root cause was determined to be a configuration issue.” This still leaves many questions unanswered. This issue can be examined in a Cause Map, a visual form of root cause analysis.

There are three steps to the Cause Mapping problem-solving method. First, the problem is defined with respect to the impact to the goals. The basic problem information is captured – the what, when, and where. In a case such as this, where the problem unfolded over hours, a timeline can be useful to provide an overview of the incident. Problems with the NYSE began when a system upgrade to meet timestamp requirements began on the evening of July 7. As traders attempted to connect to the system early the next morning, communication issues were found and worsened until the NYSE suspended trading. The system was restarted and full trading resumed at 3:10 PM.
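When building a timeline like this, it helps to compute durations rather than estimate them. The snippet below is a minimal sketch using only the two times reported above; the variable and function names are illustrative.

```python
from datetime import datetime

# Times as reported in the post for July 8, 2015.
suspended = datetime(2015, 7, 8, 11, 32)  # trading suspended at 11:32 AM
resumed = datetime(2015, 7, 8, 15, 10)    # full trading resumed at 3:10 PM


def outage_minutes(start: datetime, end: datetime) -> int:
    """Return the whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


if __name__ == "__main__":
    minutes = outage_minutes(suspended, resumed)
    print(f"Trading was suspended for {minutes // 60} h {minutes % 60} min")
```

The result, 3 hours 38 minutes, is consistent with the "more than 3 hours" in the headline.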

The impacts to the goals are also documented as part of the basic problem information. In this case, there were no impacts to safety or the environment as a result of this issue. Additionally, there was no impact to customers, whose trades automatically shifted to other exchanges. However, an investigation by the Securities & Exchange Commission (SEC) and political hearings are expected as a result of the outage, impacting the regulatory goal. The outage itself is an impact to the production goal, and the time spent on response and repairs is an impact to the labor/time goal.

The cause-and-effect relationships that led to these impacts to the goals can be developed by asking “why” questions. This can be done even for positive impacts to the goals. For example, in this case customer service was NOT impacted adversely because customers were able to continue making trades even through the NYSE outage. This occurred because there are 13 exchanges, and current technology automatically transfers the trades to other exchanges. Because of this, the outage was nearly transparent to the general public.

In the case of the outage itself, as discussed above, the NYSE has stated it was due to a configuration issue. Specifically, the gateways were not loaded with the proper configuration for the upgrade that was rolled out July 7. However, information about what exactly the configuration issue was or what checks failed to result in the improper configuration being loaded is not currently available. (Although some have said that the chance of this failure happening on the same date as two other large-scale outages could not be coincidental, the NYSE and government have ruled out hacking.) According to NYSE president Tom Farley, “We found what was wrong and we fixed what was wrong and we have no evidence whatsoever to suspect that it was external. Tonight and overnight starts the investigation of what exactly we need to change. Do we need to change those protocols? Absolutely. Exactly what those changes are I’m not prepared to say.”

Another concern is the backup plan in place for these types of issues. Says Harvey Pitt, SEC Chairman 2001 to 2003, “This kind of stuff is inevitable. But if it’s inevitable, that means you can plan for it. What confidence are we going to have that this isn’t going to happen anymore, or that what did happen was handled as good as anyone could have expected?” The backup plan in place appeared to be shifting operations to a disaster recovery center. This was not done because it was felt that requiring traders to reconnect would be disruptive. Other backup plans (if any) were not discussed. This has led some to question the oversight role of the SEC and its ability to prevent issues like this from recurring.

To view the investigation file, including the problem outline, Cause Map, and timeline, click on “Download PDF” above. To view the NYSE statement on the outage, click here.

Small goldfish can grow into a large problem in the wild

By Kim Smiley

Believe it or not, the unassuming goldfish can cause big problems when released into the wild.  I personally would have assumed that a goldfish set loose into the environment would quickly become a light snack for a native species, but invasive goldfish have managed to survive and thrive in lakes and ponds throughout the world.  Goldfish will keep growing as long as the environment they are in supports it.  So while goldfish kept in an aquarium will generally remain small, without the constraints of a tank, goldfish the size of dinner plates are not uncommon in the wild. These large goldfish both compete with and prey on native species, dramatically impacting native fish populations.

This issue can be better understood by building a Cause Map, a visual format of root cause analysis, which intuitively lays out the cause-and-effect relationships that contributed to the problem.  A Cause Map is built by asking “why” questions and recording the answers as a box on the Cause Map.  So why are invasive goldfish causing problems?  The problems are occurring because there are large populations of goldfish in the wild AND the goldfish are reducing native fish populations.  When there are two causes needed to produce an effect like in this case, both causes are listed on the Cause Map vertically and separated by an “and”.   Keep asking “why” questions to continue building the Cause Map.
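For readers who think in code, the structure described above can be sketched as a small tree in which each box lists the causes that answer "why?" for it, and sibling causes are joined by AND. This is only an illustrative model, not the Cause Mapping software or its file format; the `Cause` class and `render` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on the map; `causes` holds the answers to "why?" for it."""
    text: str
    causes: list["Cause"] = field(default_factory=list)  # all required (AND)


def render(cause: Cause, depth: int = 0) -> list[str]:
    """Return indented lines, prefixing every sibling after the first with AND."""
    lines = ["  " * depth + cause.text]
    for i, child in enumerate(cause.causes):
        child_lines = render(child, depth + 1)
        if i > 0:
            child_lines[0] = "  " * (depth + 1) + "AND " + child.text
        lines.extend(child_lines)
    return lines


goldfish_map = Cause(
    "Invasive goldfish are causing problems",
    [
        Cause("Large populations of goldfish in the wild"),
        Cause("Goldfish are reducing native fish populations"),
    ],
)

if __name__ == "__main__":
    print("\n".join(render(goldfish_map)))
```

Each further "why" question simply adds entries to the `causes` list of an existing box, which is how the map grows.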

So why are there large populations of goldfish in the wild?  Goldfish are being introduced to the wild by pet owners who no longer want to care for them and don’t want to kill their fish.  The owners likely don’t understand the potential environmental impacts of dumping non-native fish into their local lakes and ponds.  Goldfish are also hardy and some may survive being flushed down a toilet and end up happily living in a lake if a pet owner chooses to try that method of fish disposal.

Why do goldfish have such a large impact on native species? Goldfish can grow larger than many native species and they compete with them for the same food sources. In addition, goldfish eat small fish as well as eggs from native species. Invasive goldfish can also introduce new diseases into bodies of water that can spread to the native species. The presence of a large number of goldfish can also change the environment in a body of water. Goldfish stir up mud and other matter when they feed, which causes the water to be cloudier, impacting aquatic plants. Some scientists also believe that large populations of goldfish can lead to algae blooms because goldfish feces is a potential food source for the algae.

Scientists are working to develop the most effective methods to deal with the invasive goldfish.  In some cases, officials may drain a lake or use electroshocking to remove the goldfish.  As an individual, you can help the problem by refraining from releasing pet fish into the wild.  It’s an understandable impulse to want to free an unwanted pet, but the consequences can be much larger than might be expected. You can contact local pet stores if you need to get rid of aquarium fish; some will allow you to return the fish.

To view a Cause Map of this problem, click on “Download PDF” above.

Indian Point Fire and Oil Leak

By Sarah Wrenn

At 5:50 PM on May 9, 2015, a fire ignited in one of two main transformers for the Unit 3 Reactor at Indian Point Energy Center. These transformers carry electricity from the main generator to the electrical grid. While the transformer is part of an electrical system external to the nuclear system, the reactor is designed to automatically shut down following a transformer failure. This system functioned as designed and the reactor remains shut down while the investigation is ongoing. Concurrently, oil (dielectric fluid) spilled from the damaged transformer into the plant’s discharge canal and some amount was also released into the Hudson River. On May 19, Fred Dacimo, vice president for license renewal at Indian Point, and Bill Mohl, president of Entergy Wholesale Commodities, stated the transformer holds more than 24,000 gallons of dielectric fluid. Inspections after the fire revealed that 8,300 gallons had been collected or were combusted during the fire. As a result, investigators are working to locate the remaining roughly 16,000 gallons of oil. Based on estimates from the Coast Guard supported by NOAA, up to approximately 3,000 gallons may have gone into the Hudson River.

The graphic located here provides details regarding the event, facility layout and response.

Step 1. Define the Problem

There are a few problems in this event. Certainly, the transformer failure and fire are major problems. The transformer is an integral component to transfer electricity from the power plant to the grid. Without the transformer, production has been halted. In addition, there is an inherent risk of injury with the fire response. The site’s fire brigade was dispatched to respond to the fire and while there were no injuries, there was a potential for injury. In addition, the release of dielectric fluid and fire-retardant foam into the Hudson River is a problem. A moat around the transformer is designed to contain these fluids if released, but evidence shows that some amounts reached the Hudson River.

As shown in the timeline and noted on our problem outline, the transformer failure and fire occurred at 5:50 PM and the fire was officially declared out 2.25 hours later.

As for anything out of the ordinary or unusual when this event occurred, Unit 3 had just returned to operations after a shutdown on May 7 to repair a leak of clean steam from a pipe on the non-nuclear side of the plant. Also, it was noted that this is the 3rd transformer failure in the past 8 years. This frequency of transformer failures is considered unusual. The Wall Street Journal reported that the transformer that failed earlier this month replaced another transformer that malfunctioned and caught fire in 2007. Another transformer, which had been in operation for four years, failed in 2010.

Multiple organizational goals were negatively impacted by this event. As mentioned above, there was a risk of injury related to the fire response. There was also a negative impact to the environment due to the release of dielectric fluid and fire-retardant foam. The negative publicity from the event impacts the organization’s customer service goal. A notification to the NRC of an Unusual Event (the lowest of 4 NRC emergency classifications) is a regulatory impact. For production/schedule, Unit 3 was shut down May 9 and remains shut down during the investigation. There was a loss of the transformer, which needs to be replaced. Finally, there is labor/time required to address and contain the release, repair the transformer, and investigate the incident.

Step 2. Identify the Causes (Analysis)

Now that we’ve defined the problem in relation to how the organization’s goals were negatively impacted, we want to understand why.

The Safety Goal was impacted due to the potential for injury. The risk of injury exists because of the transformer fire.



The Regulatory Goal was impacted due to the notification to the NRC. This was because of the Unit 3 shutdown, which also impacts the Production/Schedule Goal. Unit 3 shut down because this is the designed response to the emergency. This is the designed response because of the loss of the electrical transformer, which also impacts the Property/Equipment Goal. Why was the electrical transformer lost? Because of the transformer fire.

For the other goals impacted, the Customer Service Goal was affected by the negative publicity, which was caused by the containment, repair, and investigation time and effort. This time and effort impacts the organization’s Labor/Time Goal. This time and effort was required because of the dielectric fluid and fire-retardant foam release. Why was there a release? Because the fluid and foam were able to access the river.

Why did the fluid and foam access the river?

The fire-retardant foam was introduced because the sprinkler system was ineffective. The transformer is located outside in the transformer yard which is equipped with a sprinkler system. Reports indicate that the fire was originally extinguished by the sprinklers, but then relit. Fire responders introduced fire-retardant foam and water to more aggressively address the fire. Some questions we would ask here include why was the sprinkler system ineffective at completely controlling the fire? Alternatively, is the sprinkler system designed to begin controlling the fire as an immediate response such that the fire brigade has time to respond? If this is the case, then did the sprinkler perform as expected and designed?

The transformer moat is designed to catch fluids and was unable to contain the fluid and the foam. When a containment is unable to hold the amount of fluid that is introduced, this means that either there is a leak in the containment or the amount of fluid introduced is greater than the capacity of the containment. We want to investigate the integrity of the containment and if there are any leak paths that would have allowed fluids to escape the moat. We also want to understand the volume of fluid that was introduced. The moat is capable of holding up to 89,000 gallons of fluid. A transformer contains approximately 24,000 gallons of dielectric fluid. What we don’t know is how much fire-retardant foam was introduced. If this value plus the amount of transformer fluid is greater than the capacity of the moat, then the fluid will overflow and can access the river. If this is the case, we also would want to understand if the moat capacity is sufficient, should it be larger? Also, is the moat designed such that an overflow will result in accessing the discharge canal and is this desired?
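The capacity question above comes down to simple arithmetic. The sketch below uses the two figures quoted in the post and assumes, purely for illustration, that the moat was intact and empty before the event and that all of the transformer fluid drained into it. The function name and the 70,000-gallon input are hypothetical, since the actual volume of foam and water applied is not known.

```python
# Figures quoted in the post.
MOAT_CAPACITY_GAL = 89_000
TRANSFORMER_FLUID_GAL = 24_000  # approximate


def overflow_gallons(fluid_gal: float, foam_and_water_gal: float,
                     capacity_gal: float = MOAT_CAPACITY_GAL) -> float:
    """Return the gallons that exceed the moat capacity (0 if everything fits)."""
    return max(0.0, fluid_gal + foam_and_water_gal - capacity_gal)


# Volume of foam and water at which an intact, initially empty moat is full.
threshold = MOAT_CAPACITY_GAL - TRANSFORMER_FLUID_GAL

if __name__ == "__main__":
    print(f"Moat is full after about {threshold:,} gallons of foam and water")
    # Hypothetical input for illustration only: 70,000 gallons applied.
    print(f"Overflow: {overflow_gallons(TRANSFORMER_FLUID_GAL, 70_000):,.0f} gallons")
```

If the volume applied turns out to be well under that threshold, the capacity explanation becomes less likely and the investigation would shift toward leak paths in the containment.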

Finally, dielectric fluid accessed the river because the fluid was released from the transformer. Questions we would ask here are: Why was the fluid released and why does a transformer contain dielectric fluid? Dielectric fluid is used to cool the transformers. Other cooling methods, such as fans, are also in place. The causes of the fluid release and transformer failure are still being investigated, but in addition to determining these causes, we would also ask how the transformers are monitored and maintained. The Wall Street Journal provided a statement from Jerry Nappi, a spokesman for Entergy. Nappi said both of Unit 3’s transformers passed extensive electrical inspections in March. Transformers at Indian Point get these intensive inspections every two years. Aspects of the devices also are inspected daily.

Lastly, we want to understand why there was a transformer fire. The transformer fire occurred because there was some heat source (ignition source), fuel, and oxygen. We want to investigate what the heat source was: was there a spark, a short in the wiring, a static electricity build-up? Also, where did the fuel come from and is it expected to be there? The dielectric fluid is flammable, but are there other fuel sources that exist?

Step 3. Select the Best Solutions (Reduce the Risk)

What can be done? With the investigation ongoing, a lot of facts still need to be gathered to complete the analysis. Once that information is gathered, we want to consider what is possible to reduce the risk of having this type of event occur in the future. We would want to evaluate what can be done to address the transformer, implementing solutions to better maintain, monitor, and/or operate it, with a focus on solutions that will minimize the risk of failure and fire. However, if a failure does occur, we want to consider solutions so that the failure and fire do not result in a release. Further, we can consider the immediate response: do these steps adequately contain the release? Identifying specific solutions to the causes identified will reduce the risk of similar events in the future.


This Cause Map was built using publicly available information from the following resources.

De Avila, Joseph. “New York State Calls for Tougher Inspections at Indian Point” Published 5/20/2015. Accessed 5/20/2015

“Entergy’s Response to the Transformer Failure at Indian Point Energy Center” Accessed 5/19/2015

“Entergy Plans Maintenance Shutdown of Indian Point Unit 3” Published 5/7/2015. Accessed 5/19/2015

“Indian Point Unit 3 Safely Shutdown Following Failure of Transformer” Published 5/9/2015. Accessed 5/19/2015

“Entergy Leading Response to Monitor and Mitigate Potential Impacts to Hudson River Following Transformer Failure at Indian Point Energy Center” Published 5/13/2015. Accessed 5/19/2015

“Entergy Continues Investigation of Failed Transformer, Spilled Dielectric Fluid at Indian Point Energy Center” Published 5/15/2015. Accessed 5/19/2015

McGeehan, Patrick. “Fire Prompts Renewed Calls to Close the Indian Point Nuclear Plant” Published 5/12/2015. Accessed 5/19/2015

Screnci, Diane. “Indian Point Transformer Fire” Accessed 5/19/2015

Crash of Germanwings flight 9525 Leads to Questions

By ThinkReliability Staff

On March 24, 2015, Germanwings flight 9525 crashed into the French Alps, killing all 150 onboard. Evidence available thus far suggests the copilot deliberately locked the pilot out of the cockpit and intentionally crashed the plane. While evidence collection is ongoing, because of the magnitude of this catastrophe, solutions to prevent similar recurrences are already being discussed and, in some cases, implemented.

What is known about the crash can be captured in a Cause Map, or visual form of root cause analysis. Visually diagramming all the cause-and-effect relationships makes it possible to address all related causes, leading to a larger number of potential solutions. The analysis begins by capturing the impacted goals in the problem outline. In this case, the loss of 150 lives (everybody aboard the plane) is an impact to the safety goal and of primary concern in the investigation. Also impacted are the property goal, due to the loss of the plane, and the labor goal, due to the recovery and investigation efforts (which are particularly challenging in this case because the crash site is so hard to access).

Asking “Why” questions from the impacted goals develops cause-and-effect relationships. In this case, the deaths resulted from the crash of the plane into the mountains of the French Alps. So far, available information appears to support the theory that the co-pilot deliberately crashed the plane. Audio recordings of the pilot requesting re-entry into the cockpit, the normal breathing of the co-pilot, and the manual increase in the speed of the descent while crash warnings sounded all suggest that the crash was deliberate. Questions have been raised about the co-pilot’s fitness for duty. Some have suggested increased psychological testing for pilots, but the trade group Airlines for America says that the current system (at least in the US) is working: “All airlines can and do conduct fitness-for-duty testing on pilots if warranted. As evidenced by our safety record, the U.S. airline industry remains the largest and safest aviation system in the world as a result of the ongoing and strong collaboration among airlines, airline employees, manufacturers and government.”

Some think that technology is the answer. The flight voice recorder captured cockpit alarms indicating an impending crash, but these were simply ignored by the co-pilot. If flight guidance software were able to take over for an incapacitated pilot (or one who deliberately ignores these warnings), disasters like this one could be avoided. Former Department of Transportation Inspector General Mary Schiavo says, “This technology, I believe, would have saved the flight. Not only would it have saved this flight and the Germanwings passengers, it would also save lives in situations where it is not a suicidal, homicidal pilot. It has implications literally for safer flight across the industry.”

Others say cockpit procedures should be able to prevent an issue like this. According to aviation lawyers Brian Alexander & Justin Green, in a blog for CNN, “If Germanwings had implemented a procedure to require a second person in the cockpit at all times – a rule that many other airlines followed – he would not have been able to lock the pilot out.”

After 9/11, cockpit doors were reinforced to prevent any forced entry (according to the Federal Aviation Administration, they should be strong enough to withstand a grenade blast). The doors have three settings – unlock, normal, and lock. Under the normal setting, the cockpit can be unlocked by crewmembers with a code after a delay. But under the lock setting (to be used, for example, to prevent hijackers who have obtained the crew code from entering the cockpit), no codes will allow access. (The lock setting has to be reset every five minutes.) Because of the possibility that a rogue crewmember could lock out all other crewmembers, US airlines instituted the rule that there must always be two people in the cockpit. (Of course, if only a three-person crew is present, this can cause other issues, such as when a pilot became locked in the bathroom while the only other two flight crew members on board were locked in the cockpit, nearly resulting in a terror alert. See our previous blog on this issue.)

James Hall, the former chairman of the National Transportation Safety Board, agrees. He says, “The flight deck is capable of accommodating three pilots and there shouldn’t ever be a situation where there is only one person in the cockpit.” In response, many airlines in Europe and Canada, including Germanwings’ parent company Lufthansa, have since instituted a rule requiring at least two people in the cockpit at all times. Other changes to increase airline safety may be implemented after more details regarding the crash are discovered.

March 27, 1977: Two Jets Collide on Runway, Killing 583

By ThinkReliability Staff

March 27, 1977 was a difficult day for the aviation industry. Just after noon, a bomb exploded at the Las Palmas passenger terminal in the Canary Islands. Five large passenger planes were diverted to the Tenerife-Norte Los Rodeos Airport, where they completely covered the taxiway of the one-runway regional airport. Less than five hours later, when the planes were finally given permission to take off, two collided on the runway, killing 583, making this the worst aviation accident at the time (and now second only to the September 11, 2001 attacks in the US).

With the benefit of nearly 40 years of hindsight, it is possible to review the causes of the accident, as well as look at the solutions implemented after this accident, which are still being used in the aviation industry today.  First we look at the impact to the goals as a result of this tragedy.  The deaths of 583 people (out of a total of 644 on both planes) are an impact to the safety goal.  The compensation to families of the victims (paid by the operating company of one of the planes) is an impact to the customer service goal.  The property goal was impacted due to the destruction of both the planes, and the labor goal was impacted by the rescue, response, and investigation costs that resulted from the accident.

Beginning with one of the impacted goals, we can ask “why” questions to diagram the cause-and-effect relationships related to the incident. The deaths of the 583 people on board were due to the runway collision of two planes. The collision occurred when one plane was taking off on the runway, and the other was taxiing to takeoff position on the same runway (called backtracking).

Backtracking is not common (most airports have separate runways and taxiways), but was necessary in this case because the taxiway was unavailable for taxiing. The taxiway was blocked by the three other large planes parked at the airport. A total of five planes were diverted to Tenerife which, having only one runway and a parallel taxiway, was not built to accommodate this number of planes. There were four turnoffs from the runway to the taxiway; the taxiing plane had been instructed to turn off at the third turnoff (the first that was not blocked by other planes). For unknown reasons, it did not, and the collision occurred between the third and fourth turnoffs. (Experts disagree on whether the plane would have been able to successfully make the sharp turn at the third turnoff.)

One plane was attempting takeoff when it ran into the second plane on the runway. The crew of the plane taking off was unaware of the presence of the taxiing plane. There was no ground radar and the airport was under heavy fog cover, so the control tower was relying on positions reported by radio. At the time the taxiing plane reported its position, the first plane was discussing takeoff plans with the control tower, resulting in interference that rendered most of the conversation inaudible. The pilot of the plane taking off believed he had clearance, due to confusing communication between the plane and the air traffic control tower. Not only did the flight crews and control tower speak different languages, the word “takeoff” was used during a conversation that was not intended to provide clearance for takeoff. Based on discussions between the pilot and flight crew on the plane taking off, investigators believed, but were not able to definitively determine, that other crew members may have questioned the clearance for takeoff, but not to the extent that the pilot asked the control tower for clarification or delayed the takeoff.

After the tragedy, the airport was upgraded to include ground radar. Solutions that impacted the entire aviation industry included the use of English as the official control language (to be used when communicating between aircraft and control towers) and a prohibition on the use of the word “takeoff” unless approving or revoking takeoff clearance. The possibility that action by one of the other crew members could have saved the flights contributed to the development of Crew Resource Management, which aims to ensure that all flight crew members can and will speak up when they have questions related to the safety of the plane.

Though this is by far the runway collision with the greatest impact to human life, runway collisions are still a concern. In 2011, an Airbus A380 clipped the wing of a Bombardier CRJ (see our previous blog). Los Angeles International Airport (LAX) experienced 21 runway incursions in 2007, after which officials redesigned the runways and taxiways so that they wouldn’t intersect, and installed radar-equipped warning lights to provide planes with a visual warning of potential collisions (see our previous blog).


Working Conditions Raise Concerns at Fukushima Daiichi

By ThinkReliability Staff

The nearly 7,000 workers toiling to decommission the reactors at Fukushima Daiichi after they were destroyed by the earthquake and tsunami on March 11, 2011 face a daunting task (described in our previous blog). Recent events have led to questions about the working conditions and safety of these workers.

On January 16, 2015, the local labor bureau instructed the utility that owns the plants to reduce industrial accidents. (The site experienced 23 accidents in fiscal year 2013 and 55 so far this fiscal year.) Three days later, on January 19, a worker fell into a water storage tank and was taken to the hospital. He died the next day, as did a worker at Fukushima Daini when his head got caught in machinery. (Fukushima Daini is nearby and was less impacted by the 2011 event. It is now being used as a staging site for the decommissioning work at Fukushima Daiichi.)

Although looking at all industrial accidents will provide the most effective solutions, often digging into just one in greater detail will provide a starting point for site improvements. In this case, we will look at the January 19 fall at Fukushima Daiichi to identify some of the challenges facing the site that may be leading to worker injuries and fatalities.

A Cause Map, or visual form of root cause analysis, begins by determining the organizational impacts of an incident. In this case the worker fall impacted the safety goal due to the death of the worker. The environmental goal was not impacted. (Although the radiation levels at the site still require extensive personal protective equipment, the incident was not radiation-related.) Workers on site have noted difficult working conditions, which are thought to be at least partially responsible for the rise in incidents, as is the large number of workers at the site (itself an impact to the labor/time goal). Lastly, local organizations have raised regulatory concerns due to the high number of incidents at the site.

An analysis of the issues begins with one impacted goal. In this case, the worker death resulted from a fall into a ten-meter empty tank. The worker was apparently not found immediately (though specific timeline details and whether or not that impacted the worker’s outcome have not been released) because it appears he was working alone, likely due to the massive manpower needs at the site. Additionally, the face masks worn by all workers (due to the high radiation levels still present) limit visibility.

The worker was checking for leaks at the top of the tank, which is being used to store water used to cool the reactors at the site. There is general concern about workers’ lack of knowledge (many have been hired recently with little or no experience doing the types of tasks they are now performing), though again, it’s unclear whether this was applicable in this case. Of more concern is the ineffective safety equipment – apparently the worker did not securely fasten his safety harness.

The unfastened harness, and the fall itself, are likely due to worker fatigue or lack of concentration. Workers at the site face difficult conditions doing difficult work all day (or night) long, and have to travel far afterwards, as the surrounding area is still evacuated. Reports of mental health issues and fatigue in these workers have led to the opening of a new site providing meals and rest for them.

These factors are likely contributing to the increase in accidents, as is the number of workers at the site, which doubled from December 2013 to December 2014. Local organizations are still calling for action to reduce these accidents. “It’s not just the number of accidents that has been on the rise. It’s the serious cases, including deaths and serious injuries that have risen, so we asked Tokyo Electric to improve the situation,” says Katsuyoshi Ito, a local labor standards inspector.

In addition to improving working conditions, the site is implementing improved worker training – and looking at discharging wastewater instead of storing it, which would reduce the amount of equipment that must be monitored and maintained. Improvements must be made, because decades of work remain before decommissioning at the site is complete.

Click here to sign up for our FREE webinar “Root Cause Analysis Case Study: Fukushima Daiichi” at 2:00 pm EDT on March 12 to learn more about how the earthquake and tsunami on March 11, 2011 impacted the plant.