Small fire leads to thousands of canceled flights

By Kim Smiley

Starting August 8, 2016, thousands of travelers were stranded worldwide after widespread cancelations and delays of Delta Air Lines flights. The disruptions continued over several days and the impacts lingered even longer.  The flight issues made headlines around the globe and the financial impact to the company was significant.

So what happened? What caused this massive headache for so many travelers? The short answer is a small fire in an airline data center, but a much longer answer is needed to understand what caused this incident. A Cause Map, a visual format for performing a root cause analysis, can be used to analyze this issue. A Cause Map lays out all of the causes that contributed to an issue so that the cause-and-effect relationships are easy to see. The Cause Map is built by asking “why” questions and adding the answers. For an effect with more than one cause, all of the causes that contributed to the effect are listed vertically and separated by an “and”. (Click on “Download PDF” to see an intermediate level Cause Map of this incident.)

So why were so many flights canceled and delayed? There was a system-wide computer outage and the airline depends on computer systems for everything from processing check-ins to assigning crews and gates.  Bottom line, no flights leave on time without working computer systems.  The issues originated at a single data center, but the design of the system led to cascading computer issues that impacted systems worldwide.  The airline has not released any specific details about why exactly the issue spread, but this is certainly an area investigators would want to understand in order to create a solution to prevent a similar cascading failure in the future.

In a statement, the company indicated that an electrical component failed, causing a small fire at the data center. (Again, the specifics about what type of component and what caused the failure haven’t been released.) The fire caused a transformer to shut down which resulted in a loss of primary power to the data center.  A secondary power system did kick on, but not all servers were connected to backup power.  No details have been released about why some servers were not powered by the secondary power supply.
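The cause-and-effect chain described so far can be sketched as a small data structure. This is a minimal illustration, not the published Cause Map: the dictionary layout, the function names, and the wording of each cause are assumptions made for the example, and the "some servers lost power" step is inferred from the statement rather than quoted from it.

```python
# Each effect maps to the causes that jointly produced it (joined by "and").
# Wording of the causes is paraphrased from the article for illustration.
CAUSE_MAP = {
    "flights canceled and delayed": [
        "system-wide computer outage",
        "airline depends on computer systems",
    ],
    "system-wide computer outage": [
        "some servers lost power",  # inferred intermediate step
        "system design allowed issues to cascade",
    ],
    "some servers lost power": [
        "loss of primary power to data center",
        "not all servers connected to backup power",
    ],
    "loss of primary power to data center": ["transformer shut down"],
    "transformer shut down": ["small fire at data center"],
    "small fire at data center": ["electrical component failed"],
}


def ask_why(cause_map, effect, depth=0):
    """Return the map as indented lines, one level deeper per 'why'."""
    lines = ["  " * depth + effect]
    for cause in cause_map.get(effect, []):
        lines.extend(ask_why(cause_map, cause, depth + 1))
    return lines


def open_questions(cause_map, effect):
    """Return the causes that have no recorded answer to 'why' yet."""
    causes = cause_map.get(effect)
    if not causes:
        return [effect]
    unanswered = []
    for cause in causes:
        unanswered.extend(open_questions(cause_map, cause))
    return unanswered


if __name__ == "__main__":
    print("\n".join(ask_why(CAUSE_MAP, "flights canceled and delayed")))
    print(open_questions(CAUSE_MAP, "flights canceled and delayed"))
```

The causes returned by `open_questions` are the ends of each branch. In this sketch they include the component failure, the servers without backup power, and the cascading design, which are the same areas where the airline has not released details.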

Compounding the frustration, impacted travelers were unable to get updated flight information. Flight status systems, including airport monitors, continued to show that all flights were on time during the period of the cancelations and delays.

Once a large number of flights are disrupted, it is difficult to return to a normal flight schedule. The rotation schedule for aircraft and pilots has to be redone, which can be time-consuming. Many commercial flights operate near capacity, so it can be difficult to find seats for all the passengers impacted by canceled and delayed flights. Delta has tried to compensate travelers impacted by this incident by offering refunds and $200 in travel vouchers to people whose flights were canceled or delayed at least three hours, but an incident of this magnitude will naturally impact customer confidence in the company.

This incident is a good reminder of the importance of building robust systems with functional backups; otherwise a small problem can spread and quickly become a big problem.

The Solution to America’s Most Unexpectedly Dangerous Mammal

By ThinkReliability Staff

It’s hard to imagine that the mammal responsible for over 200 human deaths in America each year is the cute, cuddly... deer. These beautiful and seemingly harmless animals are hardly malicious. Instead, they are in the wrong place at the wrong time, resulting in more than one million deer / vehicle collisions each year. While drivers bear partial responsibility for these collisions, it seems that changes in the food chain have also contributed to this situation.

In the 1800s, cougars (also called pumas or mountain lions) could be found roaming across the United States and Canada.  However, beginning in the early 1900s, states began implementing bounty programs enticing hunters to kill cougars.  The goal was to protect livestock and humans from these seemingly dangerous animals.  By the 1950s, the cougar population was primarily limited to areas west of the Rocky Mountains.  As the food chain predicts, the absence of a predator resulted in the overpopulation of its prey.  As the deer population increased, the probability for deer / vehicle collisions also increased.  

Expensive solutions have been considered to help decrease the collision rate, including deer culling, contraception and highway crossings.  However, it seems that nature may now be working towards its own natural solution.  As the bounty programs were removed in the 1960s and 1970s, the cougars have slowly begun migrating back towards the east.  A recent study published in Conservation Letters suggests that repopulation of cougars in the Eastern portion of the US could prevent 708,600 deer / vehicle collisions and 155 deaths over the next 30 years.   (The original fear of cougars attacking humans seems unfounded.  According to The Cougar Network, “Cougars are a retreating animal and very wary of people. Within the United States and Canada since 1890, there have been less than 100 attacks on humans, with about 20 fatalities. Encountering a cougar, let alone being attacked, is incredibly rare.”) 
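To put the study's 30-year totals in the same per-year terms as the figures quoted earlier, a rough back-of-the-envelope calculation helps. The sketch below assumes the prevented collisions and deaths are spread evenly over the 30 years, which the study does not claim. It also compares against the nationwide figures from this article ("more than one million" collisions and "over 200" deaths), even though the study covers only the eastern US, so the percentages are only an approximate upper bound.

```python
YEARS = 30
COLLISIONS_PREVENTED = 708_600   # study total over 30 years
DEATHS_PREVENTED = 155           # study total over 30 years

# Nationwide yearly figures quoted in the article, treated as lower bounds.
COLLISIONS_PER_YEAR_NOW = 1_000_000
DEATHS_PER_YEAR_NOW = 200


def yearly_average(total, years=YEARS):
    """Average per year, assuming an even spread (a simplification)."""
    return total / years


def share_of_baseline(per_year, baseline):
    """Prevented amount as a percentage of the current yearly figure."""
    return 100 * per_year / baseline


if __name__ == "__main__":
    collisions = yearly_average(COLLISIONS_PREVENTED)
    deaths = yearly_average(DEATHS_PREVENTED)
    print(f"collisions prevented per year: {collisions:,.0f}")
    print(f"deaths prevented per year: {deaths:.1f}")
    print(f"share of collisions: {share_of_baseline(collisions, COLLISIONS_PER_YEAR_NOW):.1f}%")
    print(f"share of deaths: {share_of_baseline(deaths, DEATHS_PER_YEAR_NOW):.1f}%")
```

Under these assumptions the averages come to about 23,600 collisions and about 5 deaths prevented per year, or between 2 and 3 percent of the current yearly totals.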

A Cause Map is a helpful tool to dissect the cause-and-effect relationships contributing to a problem or situation.   Starting with the goals that were impacted, the causes and effects can be linked to create a chain.   For this situation, we begin with the safety goal that is impacted by the many fatalities each year.  Asking ‘Why’ questions, we can dig deeper to understand what causes are behind the impacted goal.   

In this case, the fatalities are a result of car collisions with deer.  The collisions are due to two factors: the deer unexpectedly crossing the road and the fact that the driver didn’t see the deer in time to stop.  We can trace each of these causes one at a time, revealing more causes.  The deer unexpectedly crosses the road because deer are moving to new areas.  This is because deer are overcrowded and need to expand their habitat.  The overcrowding is due to the growing deer population, which is due to the decrease in natural deer predators.  This decrease is caused by the decline in the cougar population, which is a result of the bounty programs that were implemented in the early 1900s.  These bounty programs were motivated by fear that the cougars would endanger humans or livestock.   

Going back to the driver’s role in the situation, there are two possible reasons the driver did not see the deer in time. Lighting may have been poor, because deer often travel at dawn or dusk, and the driver may not have been paying close enough attention, perhaps because of a distraction. A second goal, property, was also impacted in this situation because the vehicles are damaged or destroyed as a result of the collisions.

The Cause Map is also helpful in that it allows us to document the evidence and potential solutions directly on the causes that they can impact.   For example, the statistics about the number of collisions each year, fatalities each year, and cougar population changes are included right below the causes that they support.   Similarly, possible solutions are added right above the causes that they can impact.  In this case, deer culling and contraception could help control the deer overcrowding, and special deer highway crossings could help mitigate the deer crossing the road unexpectedly.  However, nature’s solution seems to fit further back in the chain by impacting the cause that is the decrease in the cougar population.   Time will tell if this solution will, in fact, reduce the number of collisions and injuries as predicted. 
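The idea of documenting evidence and solutions directly on the causes can be sketched in code. The `Cause` class and `solutions_by_cause` function below are hypothetical names chosen for this illustration, and the map is a simplified version of the chain described in this article rather than the published Cause Map.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on the map, with its supporting evidence and possible solutions."""
    description: str
    evidence: list = field(default_factory=list)
    solutions: list = field(default_factory=list)
    causes: list = field(default_factory=list)  # answers to "why?"


def solutions_by_cause(node):
    """Walk the map and return (cause, solution) pairs in map order."""
    pairs = [(node.description, s) for s in node.solutions]
    for cause in node.causes:
        pairs.extend(solutions_by_cause(cause))
    return pairs


# Build the deer branch from the deepest cause up to the impacted goal.
bounty = Cause("bounty programs", causes=[Cause("fear for humans and livestock")])
cougars = Cause(
    "decline in cougar population",
    evidence=["by the 1950s, mostly limited to west of the Rocky Mountains"],
    solutions=["cougars migrating back east"],
    causes=[bounty],
)
predators = Cause("decrease in natural deer predators", causes=[cougars])
population = Cause("growing deer population", causes=[predators])
overcrowded = Cause(
    "deer are overcrowded",
    solutions=["deer culling", "contraception"],
    causes=[population],
)
moving = Cause("deer moving to new areas", causes=[overcrowded])
crossing = Cause(
    "deer unexpectedly crosses the road",
    solutions=["special deer highway crossings"],
    causes=[moving],
)
driver = Cause("driver didn't see the deer in time")
collisions = Cause(
    "car collisions with deer",
    evidence=["more than one million collisions each year"],
    causes=[crossing, driver],
)
fatalities = Cause(
    "safety goal impacted: fatalities",
    evidence=["over 200 human deaths each year"],
    causes=[collisions],
)

if __name__ == "__main__":
    for cause, solution in solutions_by_cause(fatalities):
        print(f"{solution} -> {cause}")
```

Listing the pairs this way shows where each solution acts in the chain: the highway crossings sit closest to the collision, while the cougar migration sits furthest back.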

To view the initial Cause Map of this issue, click on “Download PDF” above.