Tag Archives: root cause analysis

Not all McDonald’s franchise owners “lovin’” the new menu

By Kim Smiley

Are you “lovin’ it” now that McDonald’s offers breakfast all day? If so, you are not alone: McDonald’s has stated that extended breakfast hours has been the number one request from customers. After recent declines in sales, McDonald’s is hoping that all-day breakfast will boost profits, but some franchise owners are concerned that extending breakfast hours will actually end up hurting their businesses.

Offering breakfast during the day is not as simple as it may sound because McDonald’s restaurants are now required to offer breakfast items in addition to their regular fare. Cooking only hash browns in the fryers is inherently simpler than figuring out how to cook both hash browns and fries at the same time. Basically, attempting to prepare breakfast simultaneously with traditional lunch and dinner items creates a more complicated workflow in the kitchen. Complication generally slows things down, which can be a major problem for a fast food restaurant.

If customers get annoyed at increased wait times, they may choose to visit one of the many other fast food restaurants, rather than McDonald’s, for their next meal out. Many franchisees are investing in more kitchen equipment and increasing staffing to support extended breakfast hours, both of which can quickly eat into the bottom line. Increased profits from offering all-day breakfast will need to balance out the cost required to support it or franchise owners will lose money.

Franchise owners have also expressed concern that customers may spend less money now that breakfast is an option after 11 am. Breakfast items are generally less expensive than other fare, and if customers choose to order an egg-based sandwich for lunch rather than a more expensive hamburger, it could cut into profits. It all depends on the profit margin on each individual menu item, but restaurants need to make sure they aren’t offering items that will compete with their more profitable offerings.

The changing menu also has the potential to frustrate customers (and frustrated customers will generally find somewhere else to buy their next lunch). The addition of all-day breakfast has resulted in menu changes at many McDonald’s and more menu variability between franchises. The larger the menu, the more difficult it is to produce cheap food quickly, so some less popular items like wraps have been cut at many McDonald’s locations to make room for breakfast. If you are a person who loves wraps and doesn’t really want an egg muffin, this move is pretty annoying. The other potential problem is that most McDonald’s are only offering either the English muffin-based sandwiches or the biscuit-based sandwiches (but not both) after the traditional breakfast window. So depending on the McDonald’s, you may be all fired up for an all-day breakfast Egg McMuffin only to be told that you still need to get there before 10:30 am to order one, since about 20 percent of McDonald’s locations have chosen to offer biscuit-based breakfast sandwiches instead.


There are multiple issues that need to be considered to really understand the impacts of switching to all-day breakfast.  Even seemingly simple “problems” like this can quickly get complicated when you start digging into the details.  A Cause Map, a visual root cause analysis, can be used to intuitively lay out the potential issues from adding all-day breakfast to menus at McDonald’s.  A Cause Map develops cause-and-effect relationships so that the problem can be better understood.  To view a Cause Map for this example, click on “Download PDF” above.

Studies have found that at least one quarter of American adults eat fast food every day (which could be its own Cause Map…) so there are a lot of dollars being spent at McDonald’s and its competitors. Only time will tell if all-day breakfast will help McDonald’s gobble up a bigger share of the fast food pie, but fast food restaurants will certainly continue trying to outdo each other as long as demand remains high.

Invasive Pythons Decimating Native Species in the Everglades

By Kim Smiley

Have you ever dreamed of hunting pythons?  If so, Florida is hosting the month-long 2016 Python Challenge and all you need to do to join in is to pay a $25 application fee and pass an online test to prove that you can distinguish between invasive pythons and native snake species.

The idea behind the python hunt is to reduce the population of Burmese pythons in the Florida Everglades.  As the number of pythons has increased, there has been a pronounced decline in native species’ populations, including several endangered species.  Researchers have found that 99% of raccoons and opossums have vanished, along with 88% of bobcats, and nearly every other native species has declined.  Pythons are indiscriminate eaters and consume everything from small birds to full-grown deer.  The sheer number of these invasive snakes in the Florida Everglades is having a huge environmental impact.

The exact details of how pythons were released into the Everglades aren’t known, but genetic testing has confirmed that the population originated from pet snakes that were either released or escaped into the wild. Once the pythons were introduced into the Everglades, their number quickly grew as the python population thrived.  The first Burmese python was found in the Florida Everglades in 1979 and now there are estimated to be as many as 100,000 of the snakes in the area.

There are many factors that have led to the rapid growth in the python population.  The snakes are able to thrive in Florida’s warm climate, have plentiful food available, and are successfully reproducing.  Pythons produce a relatively large number of eggs (an average of 40 eggs about every 2 years) and the large female python protects them.  Hatchling pythons are also larger than most hatchling snakes, which increases their chance of surviving into adulthood.  There are very few animals that prey on adult pythons.  Researchers have found that alligators occasionally eat pythons, but the relationship between these two top predators can go both ways: pythons have occasionally eaten alligators up to 6 feet in length.  The only other real predators capable of taking down a python are humans, and even for them it is a challenge.

Before a python can be hunted, it has to be found and that is often much easier said than done. Pythons have excellent camouflage and are ambush predators that naturally spend a large percentage of the day hiding.  They also are semi-aquatic and excellent climbers so they can be found in both the water and in trees.  Despite their massive size (they can grow as long as 20 feet and weigh up to 200 pounds), they blend in so well with the environment that researchers even have difficulty finding snakes with radio transmitters showing their locations.

The last python challenge was held about 3 years ago and 68 snakes were caught.  While that number may not sound large, it is more snakes than have been removed in any other single month.  The contest also helped increase public awareness of the issue and hopefully discouraged any additional release of pets of any variety into the wild.  For the 2016 contest, officials are hoping to improve the outcome by offering prospective hunters on-site training with a guide who will educate them on swamps and show them areas where snakes are most likely to be found.

To view a Cause Map, a visual root cause analysis format, of this issue click on “Download PDF” above.  A Cause Map intuitively lays out the cause-and-effect relationships that contributed to the problem.

You can check out some of our previous blogs to view more Cause Maps for invasive species if you want to learn more:

Small goldfish can grow into a large problem in the wild

Plan to Control Invasive Snakes with Drop of Dead Mice

NTSB recommends increased oversight of DC Metro

By Kim Smiley

On September 30, 2015, the National Transportation Safety Board (NTSB) issued urgent safety recommendations calling for the Federal Railroad Administration to take over the task of overseeing the Washington, DC Metro system. The NTSB has determined that the body presently charged with overseeing it (the Tri-State Oversight Committee) doesn’t provide adequate independent safety oversight.  Specifically, the Tri-State Oversight Committee lacks enforcement authority: it doesn’t have the regulatory power to issue orders or levy fines.

The recommendations resulted from findings from the ongoing investigation into a smoke and electrical arcing accident in a Metro tunnel that killed one passenger and sent 86 others to the hospital.  (To learn more, read our previous blog “Passengers trapped in smoke-filled metro train”.) The severity of the damage to the components involved in the arcing incident has made it difficult to identify exactly what caused the arcing to occur, but the investigation uncovered problems with other electrical connections in the system that could potentially lead to similar issues if not fixed.

Investigators found that some electrical connections are at risk of short circuiting because they were improperly constructed and/or installed, allowing moisture and contaminants to get into them.  The issues with the electrical components were not identified prior to this investigation, which raises more questions about the Metro’s inspection and maintenance programs.  Although the final report on the incident has not been completed, the NTSB issued recommendations in June to address these electrical short circuit hazards because they required “immediate action” to ensure safety.

Investigators have found other issues with the aging DC Metro system such as leaks allowing significant water into the tunnels, issues with inadequate ventilation and questions about the adequacy of staff training.   The final report into the deadly arcing incident will include recommendations that go far beyond fixing one electrical issue on one run of track.

This example is a great illustration of how digging into the details of one specific problem will often reveal information about how to improve reliability across an organization. It may seem overwhelming to tackle organization-wide improvements, but often the best way to start is with an investigation into one issue and digging down into the details.

Volkswagen admits to use of a ‘defeat device’

By Kim Smiley

The automotive industry was recently rocked by Volkswagen’s acknowledgement that the company knowingly cheated on emissions testing of several models of 4-cylinder diesel cars starting in 2009.  The diesel cars in question include software “defeat devices” that turn on full emissions control only during emissions testing.  Full emissions control is not activated during normal driving conditions and the cars have been shown to emit as much as 40 times the allowable pollution.   Customers are understandably outraged, especially since many of them purchased a “clean diesel” car in an effort to be greener.

The investigation into this issue is ongoing and many details aren’t known yet, but an initial Cause Map, a visual format for performing a root cause analysis, can be created to document and analyze what is known.  The first step in the Cause Mapping process is to fill in a Problem Outline with the basic background information and how the issue impacts the overall organizational goals.  The “defeat device” issue is a complex problem and impacts many different organizational goals.  The increased emissions obviously impact the environmental goal, and the potential health effects of those emissions are an impact to the safety goal.  Some of the specific details are still unknown, like the exact amount of the fines the company will face, but we can safely assume the company will be paying significant fines (on the order of billions) as a result of this blatant violation of the law.  The Volkswagen stock price also took a major hit, dropping more than 20 percent following the announcement of the diesel emissions issues.  It is difficult to quantify how much the loss of consumer confidence will impact the company long-term, but being perceived as a dishonest company by many will certainly impact sales.   A large recall that will be both time-consuming and costly is also in Volkswagen’s future.  Depending on the investigation findings, there is also the potential for criminal prosecution because of the intentional nature of this issue.

Once the overall impacts to the goals are defined, the actual Cause Map can be built by asking “why” questions.  So why did these cars include “defeat devices” to cheat on emissions tests?  The simple answer is increased profits.  Designing cars that appeared to have much lower emissions than they did in reality allowed Volkswagen to market a car that was more desirable. Car design has always included a trade-off between emissions and performance.  Detailed information hasn’t been released yet, but it is likely that the car had improved fuel economy and improved driving performance during normal driving conditions when full emissions control wasn’t activated. Whoever was involved in the design of the “defeat device” also likely assumed the deception would never be discovered, which raises concern about how emissions testing is performed.

The “defeat device” is believed to work by taking advantage of unique conditions that exist during emissions testing. During normal driving, the steering column moves as the driver steers the car, but during emissions testing the wheels rotate while the steering column doesn’t move.  The “defeat device” software appears to have monitored the steering column and wheels to sense when the conditions indicated an emissions test was occurring.  When the wheels turned without corresponding steering column motion, the software turned the catalytic scrubber up to full power, reducing emissions and allowing the car to pass emissions tests. Details on how the “defeat device” was developed and approved for inclusion in the design haven’t been released, but hopefully the investigation into this issue will be insightful and help explain exactly how something this far over the line occurred.
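The detection behavior described above can be sketched in a few lines of Python. This is purely illustrative: the actual Volkswagen software has not been published, so the function names, signals and thresholds below are all invented assumptions based on the reported behavior.

```python
def emissions_test_suspected(wheel_speed_kmh, steering_change_deg):
    """Illustrative sketch: infer a dynamometer (emissions test) condition
    when the wheels are turning but the steering column is not moving.
    Both thresholds are invented for this example."""
    wheels_turning = wheel_speed_kmh > 5.0          # vehicle appears to be moving
    steering_static = abs(steering_change_deg) < 0.5  # column essentially still
    return wheels_turning and steering_static

def emissions_control_mode(wheel_speed_kmh, steering_change_deg):
    # Full emissions control only when a test is suspected;
    # reduced control (better performance, higher emissions) otherwise.
    if emissions_test_suspected(wheel_speed_kmh, steering_change_deg):
        return "full"
    return "reduced"
```

The point of the sketch is how little input the deception needed: two signals already available on the vehicle bus were enough to distinguish a test bench from the road.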

Only time will tell exactly how this issue impacts the overall health of the Volkswagen company, but the short-term effects are likely to be severe.  This issue may also have long-reaching impacts on the diesel market as consumer confidence in the technology is shaken.

To view an Outline and initial Cause Map of this issue, click on “Download PDF” above.

Runway Fire Forces Evacuation of Airplane

By ThinkReliability Staff

On September 8, 2015, an airplane caught fire during take-off from an airport in Las Vegas, Nevada. The pilot was able to stop the plane, reportedly in just 9 seconds after becoming aware of the fire. The crew then evacuated the 157 passengers, 27 of whom received minor injuries as a result of the evacuation by slide. Although the National Transportation Safety Board (NTSB) investigation is ongoing, information that is known, as well as potential causes that are under consideration, can be diagrammed in a Cause Map, or visual root cause analysis.

The first step of Cause Mapping is to define the problem by completing a problem outline. The problem outline captures the background information (what, when and where) of the problem, as well as the impact to the goals. In this case, the safety goal is impacted due to the passenger injuries. The evacuation of the airplane impacts the customer service goal. The NTSB investigation impacts the regulatory goal. The schedule goal is impacted by a temporary delay of flights in the area, and the property goal is impacted by the significant damage to the plane. The rescue, response and investigation is an impact to the labor goal.

The Cause Map is built by beginning with one of the impacted goals and asking “Why” questions to develop the cause-and-effect relationships that led to an issue.   In this case, the injuries were due to evacuation by slide (primarily abrasions, though some sources also said there were some injuries from smoke inhalation). These injuries were caused by the evacuation of the airplane. The airplane was evacuated due to an extensive fire. Another cause leading to the evacuation was that take-off was aborted.

The fact that take-off was able to be aborted, for which the pilot has been hailed as a hero, is actually a positive cause. Had the take-off been unable to be aborted, the result would likely have been far worse. In the case of the Concorde accident, a piece of debris on the runway ruptured a tire, which caused damage to the fuel tank, leading to a fire after the point where take-off could be aborted. The aircraft stalled and crashed into a hotel, killing all onboard the craft and 4 in the hotel. The pilot’s ability to quickly stop the plane almost certainly saved many lives.

The fire is thought to have been initiated by a catastrophic uncontained failure of the high-pressure compressor in the left engine. This assessment is based on the compressor fragments that were found on the runway. The failure likely resulted from either a bird strike (as happened in the case of US Airways flight 1549), a strike from other debris on the runway (as occurred with the Concorde), or fatigue failure of the engine components due to age. This is the first uncontained failure of this type of engine, so some consider fatigue failure to be less likely. (Reports of an airworthiness directive after cracks were detected in weld joints of compressors involved engines with different parts and a different compressor configuration.)

In this incident, the fire was unable to be put out without assistance from responding firefighters. This is potentially due to an ongoing leak of fuel if fuel lines were ruptured and the failure of the airplane’s fire suppression system, which reportedly deployed but did not extinguish the fire. Both the fuel lines and fire suppression system were likely damaged when the engine exploded. The engine’s outer casing is not strong enough to contain an engine explosion by design, based on the weight and cost of providing that strength.

The NTSB investigation is examining airplane parts and the flight data and cockpit voice recorders in order to provide a full accounting of what happened in the incident. Once these results are known, it will be determined whether this incident is considered an anomaly or whether all planes using a similar design and configuration need changes to prevent a similar event from recurring.

To view the initial investigation information on a one-page downloadable PDF, please click “Download PDF” above.


Waste Released from Gold King Mine

By Renata Martinez

On August 5, 2015, over 3 million gallons of waste was released from Gold King Mine into Cement Creek, which then flowed into the Animas River. The orange-colored plume moved over 100 miles downstream from Silverton, Colorado through Durango, reaching the San Juan River in New Mexico and eventually making its way to Lake Powell in Utah (although the EPA stated that the leading edge of the plume was no longer visible by the time it reached Lake Powell a week after the release occurred).

Some of the impacts were immediate.  No workers at the mine site were hurt in the incident but the collapse of the mine opening and release of water can be considered a near miss because there was potential for injuries. After the release, there were also potential health risks associated with the waste itself since it contained heavy metals.

Water sources along the river were impacted and there’s potential that local wells could be contaminated with the waste.   To mitigate the impacts, irrigation ditches that fed crops and livestock were shut down.  Additionally, the short-term impacts include closure of the Animas River for recreation (impacting tourism in Southwest Colorado) from August 5-14.

The long-term environmental impacts will be evaluated over time, but it appears that the waste may damage ecosystems in and along the plume’s path. There are ongoing investigations to assess the impact to wildlife and aquatic organisms, but so far the health effects from skin contact or incidental ingestion of contaminated river water are not considered significant.

“Based on the data we have seen so far, EPA and the Agency for Toxic Substances and Disease Registry (ATSDR) do not anticipate adverse health effects from exposure to the metals detected in the river water samples from skin contact or incidental (unintentional) ingestion. Similarly, the risk of adverse effects to livestock that may have been exposed to metals detected in river water samples from ingestion or skin contact is low. We continue to evaluate water quality at locations impacted by the release.”

The release occurred while the EPA was working to stabilize the existing adit (a horizontal shaft into a mine which is used for access or drainage). The pressure from the pool of waste built up inside the mine overcame the strength of the adit, releasing the water into the environment.  The EPA’s scope of work at Gold King Mine also included assessing the ongoing leaks from the mine to determine if the discharge could be diverted to retention ponds at the Red and Bonita sites.

The wastewater had been building up since the adit collapsed in 1995.  There are networks of tunnels that allow water to flow easily between the estimated 22,000 mine sites in Colorado.  As water flows through the sites, it reacts with pyrite and oxygen to form sulfuric acid.  When this untreated acidic water contacts naturally occurring minerals such as zinc, lead, cadmium, copper and aluminum, it leaches out these heavy metals, leaving tailings.  The mines involved in this incident were known to have been leaking waste for years.  In the 90s, the EPA agreed to postpone adding the site to the Superfund National Priorities List (NPL), so long as progress was made to improve the water quality of the Animas River.  Water quality improved until about 2005, at which point it was re-assessed.  Again in 2008, the EPA postponed efforts to include this area on the NPL.  From the available information, it’s unclear if this area and the waste pool would have been treated if the site had been on the NPL.

In response, the “EPA is working closely with first responders and local and state officials to ensure the safety of citizens to water contaminated by the spill.” Additionally, retention ponds have been built below the mine site to treat the water, and continued sampling is taking place to monitor the water.

So how do we prevent this from happening again?  Mitigation efforts to prevent the release were unsuccessful.  This may have been because the amount of water contained in the mine was underestimated.  Alternatively, if the amount of water in the mine was anticipated (and the risk more obvious) perhaps the excavation work could have been planned differently to mitigate the collapse of the tunnel.  As a local resident, I’m especially curious to learn more facts about the specific incident (how and why it occurred) and how we are going to prevent this from recurring.

The EPA has additional information available (photos, sampling data, historic mine information) for reference: http://www2.epa.gov/goldkingmine

Spider in air monitoring equipment causes erroneously high readings

By Kim Smiley

Smoke drifting north from wildfires in Washington state has raised concerns about air quality in Calgary, but staff decided to check an air monitoring station after it reported an alarming rating of 28 on a 1-10 scale.  What they found was a bug, or rather a spider, in the system that was causing erroneously high readings.

The air monitoring station measures the amount of particulate matter in air by shining a beam of light through a sample of air.  The less light that makes it through the sample, the higher the number of particulates in the sample and the worse the quality of air.  You can see the problem that would arise if the beam of light was blocked by a spider.
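The attenuation principle described above can be illustrated with a toy Beer-Lambert-style model. Real monitors use calibrated optics rather than this simplified formula, and the calibration constant below is invented for the sketch, but it shows why an obstruction in the beam reads as off-the-scale pollution.

```python
import math

def particulate_index(light_in, light_out, calibration=10.0):
    """Toy attenuation model: the less light that makes it through the
    sample, the higher the particulate reading. The calibration constant
    is invented for this illustration."""
    if light_out <= 0:
        return float("inf")  # beam fully blocked -> reading pegged off scale
    transmission = light_out / light_in
    return calibration * -math.log(transmission)

clean  = particulate_index(100.0, 95.0)  # mostly transmitted -> low reading
smoky  = particulate_index(100.0, 50.0)  # half blocked -> elevated reading
spider = particulate_index(100.0, 5.0)   # beam mostly obstructed -> extreme
```

In this toy model the "spider" case comes out near 30 on a scale meant to top out at 10: an obstruction is indistinguishable, to the optics, from catastrophically bad air.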

This example is a great reminder not to rely solely on instrument readings.  Instruments are obviously useful tools, but the output should always be run through the common sense check.  Does it make sense that the air quality would be so far off the scale?  If there is any question about the accuracy of readings, the instrument should probably be checked because the unexpected sometimes happens.

In this case, inaccurate readings of 10+ were reported by both Environment Canada and Alberta Environment before the issue was discovered and the air quality rating was adjusted down to a 4.  Ideally, the inaccurate readings would have been identified prior to posting potentially alarming information on public websites.  The timing of the spider’s visit was unfortunate because it coincided with smoky conditions that made the problem more difficult to identify, but extremely high readings should be verified before making them public if at all possible.

Adding an additional verification step when there are very high readings prior to publicly posting the information could be a potential solution to reduce the risk of a similar problem recurring.  A second air monitoring station could be added to create a built-in double check because an error would be more obvious if the monitoring stations didn’t have similar readings.
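One way such a publish-gate could work, assuming two nearby stations report comparable indices (the function name, scale and tolerance values are all hypothetical):

```python
def validated_reading(station_a, station_b, scale_max=10, tolerance=2.0):
    """Sketch of a verification gate: hold a reading for manual review
    when it is off the scale or the two stations disagree significantly.
    All thresholds here are illustrative, not real agency practice."""
    if station_a > scale_max or station_b > scale_max:
        return None  # off-scale reading: hold for manual verification
    if abs(station_a - station_b) > tolerance:
        return None  # stations disagree: hold for manual verification
    return (station_a + station_b) / 2  # plausible: publish averaged value
```

With this gate, the reported 28-on-a-1-to-10-scale reading would have been held for review instead of being posted, and a spider in one station would show up as a disagreement with its neighbor.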

Depending on how often insects and spiders crawl into the air monitoring equipment, the equipment itself could be modified to reduce the risk of a similar problem recurring in the future.

To view a Cause Map, a visual root cause analysis, of this issue, click on “Download PDF” above.

Power grid near Google datacenter struck by lightning 4 times

By Kim Smiley

A small amount of data was permanently lost at a Google datacenter after lightning struck the nearby power grid four times on August 13, 2015. About five percent of the disks in Google’s Europe-west1-b cloud zone datacenter were impacted by the lightning strikes, but nearly all of the data was eventually recovered with less than 0.000001% of the stored data not able to be recovered.

A Cause Map, or visual root cause analysis, can be built to analyze this issue. The first step in the Cause Mapping process is to fill in an Outline with the basic background information such as the date, time and specific equipment involved. The bottom of the Outline has a spot to list the impacted goals to help define the scope of an issue. The impacted goals, listed in red boxes, then become the first cause boxes on the Cause Map. “Why” questions are then asked to expand the Cause Map and visually lay out the cause-and-effect relationships.

For this example, the customer service goal was impacted because some data was permanently lost. Why did this happen? Data was lost because datacenter equipment failed, the data was stored on a less stable system, and it wasn’t duplicated in another location. Google has stated that the lost data was newly written data located on storage systems that were more susceptible to power failures. The datacenter equipment failed because the nearby power grid was struck by lightning four times and was damaged. Additionally, the automatic auxiliary power systems and backup battery were not able to prevent data loss after the lightning damage.

When more than one cause was required to produce an effect, all the causes are listed vertically and separated by an “and”. You can click on “Download PDF” above to see a high level Cause Map of this issue that shows how an “and” can be used to build a Cause Map. A more detailed Cause Map could be built that could include all the technical details of exactly why the datacenter equipment failed. This would be useful to the engineers developing detailed solutions.
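As a rough illustration (not ThinkReliability’s actual tooling), the structure described here can be thought of as a graph in which each effect maps to the list of causes that combine, with an “and”, to produce it. The cause text below paraphrases this example’s Cause Map:

```python
# Each effect maps to the causes that combine (with "and") to produce it.
cause_map = {
    "data permanently lost": [
        "datacenter equipment failed",
        "data stored on less stable system",
        "data not duplicated in another location",
    ],
    "datacenter equipment failed": [
        "power grid struck by lightning 4 times",
        "auxiliary power and backup battery did not prevent data loss",
    ],
}

def why(effect, depth=0):
    """Walk the chain of causes behind an effect by repeatedly asking 'why'."""
    for cause in cause_map.get(effect, []):
        print("  " * depth + "why? " + cause)
        why(cause, depth + 1)

why("data permanently lost")
```

Asking “why” is then just a walk down the graph, and an “and” relationship is simply an effect with more than one cause in its list.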

The final step in the Cause Mapping process is to develop solutions to reduce the risk of a problem recurring in the future. For this example, Google has stated that they are upgrading the datacenter equipment so that it is more robust in the event of a similar event in the future. Google also stated that customers should backup essential data so that it is stored in another physical location to improve reliability.

Few of us probably design datacenter storage systems, but this incident is a good reminder of the importance of having a backup. If data is essential to you or your business, make sure there is a backup that is stored in a physically separate location from the original. Similar to the “unsinkable” Titanic, it is always a good idea to include enough life boats or backups in a design just in case something you didn’t expect goes wrong. Sometimes lightning strikes four times so it’s best to be prepared just in case.

Explosions raise concern over hazardous material storage

By ThinkReliability Staff

On August 12, a fire began at a storage warehouse in Tianjin, China. More than a thousand firefighters were sent in to fight the fire. About an hour after the firefighters went in, two huge explosions registered on the earthquake measurement scale (2.3 and 2.9, respectively). Follow-on explosions continued and at least 114 firefighters, workers and area residents have been reported dead so far, with 57 still missing (at this point, most are presumed dead).

Little is known for sure about what caused the initial fire and continuing explosions. What is known is that the fire, explosions and release of hazardous chemicals that were stored on site have caused significant impacts to the surrounding population and rescuers. These impacts can be used to develop cause-and-effect relationships to determine the causes that contributed to an event. It’s particularly important in an issue like this – where so many were adversely affected – to find effective solutions to reduce the risk of a similar incident recurring in the future.

Even with so much information unavailable, an initial root cause analysis can identify many issues that led to an adverse event. In this case, the cause of the initial fire is still unknown, but the site was licensed to handle calcium carbide, which releases flammable gases when exposed to water. If the chemical was present on site, the fire would have continued to spread when firefighters attempted to fight it using water. Contract firefighters, who are described as being young and inexperienced, have said that they weren’t adequately trained for the hazards they faced. Once the fire started, it likely ignited explosive chemicals, including the 800 tons of ammonium nitrate and 500 tons of potassium nitrate stored on site.

Damage to the site released those and other hazardous chemicals. More than 700 tons of sodium cyanide were reported to be stored at the site, though it was only permitted 10 tons at a time. Sodium cyanide is a particular problem for human safety. Says David Leggett, a chemical risk consultant, “Sodium cyanide is a very toxic chemical. It would take about a quarter of teaspoon to kill you. Another problem with sodium cyanide is that it can change into prussic acid, which is even more deadly.”

But cleaning up the mess is necessary, especially because there are residents living within 2,000 ft. of the site, despite regulations that hazardous sites are a minimum of 3,200 ft. away from residential areas. Developers who built an apartment building within the exclusion zone say they were told the site stored only common goods. Rain could make the situation worse, both by spreading the chemicals and because of the potential that the released chemicals will react with water.

The military has taken over the response and cleanup. Major General Shi Luze, chief of the general staff of the military region, said, “After on-site inspection, we have found several hundred tons of cyanide material at two locations. If the blasts have ripped the barrels open, we neutralize it with hydrogen peroxide or other even better methods. If a large quantity is already mixed with other debris, which may be dangerous, we have built 1-meter-high walls around it to contain the material — in case of chemical reactions if it rains. If we find barrels that remain intact, we collect them and have police transport them to the owners.”

In addition to sending in a team of hazardous materials experts to neutralize and/or contain the chemicals and restricting public access to the area in hopes of limiting further impact to public safety, state media said authorities were attempting to prevent rain from falling, presumably using the same strategies developed to ensure clear skies for the 2008 Summer Olympics. Whether the effort worked hasn't been said, but it did rain on August 18, nearly a week after the blast, leaving a white foam that residents say creates a burning or itchy sensation on contact.

View an initial Cause Map of the incident by clicking on “Download PDF” above.

Legionnaires’ Disease Outbreak Blamed on Contaminated Cooling Towers

By ThinkReliability Staff

An outbreak of Legionnaires' disease has affected at least 115 people and killed 12 in the South Bronx area of New York City. While Legionnaires', a respiratory disease caused by breathing in vaporized Legionella bacteria, has struck the New York City area before, the magnitude of the current outbreak is catching the area by surprise. (Because vaporization is required for infection, drinking water is safe, as is home air conditioning.) It's also galvanizing calls for action to better regulate the causes of the outbreak.

It's important when dealing with an outbreak that affects public health to fully analyze the issue and determine all the causes that contributed to the problem. In the case of the current Legionnaires' outbreak, our analysis will be performed in the form of a Cause Map, or visual root cause analysis. We begin by capturing the basic information (what, when and where) about the issue in a problem outline. Because the issue unfolded over months, we will reference the timeline (to view the analysis including the timeline, click on "Download PDF") to describe when the incident occurred. Some important details to note: people with underlying medical conditions and smokers are at a higher risk from Legionnaires', and Legionella bacteria are resistant to chlorine. Infection results from breathing in contaminated mist, which has been determined to have come from South Bronx area cooling towers (which are part of the air conditioning and heating systems of some large buildings).

Next we capture the impact to the goals. The safety goal is impacted due to the 12 deaths, and 115 who have been infected. The customer service goal is impacted by the outbreak of Legionnaires’. The environmental and property goals are impacted because at least eleven cooling towers in the area have been found to be contaminated with Legionella. The issue is resulting in increased regulation, an impact to the regulatory goal, and testing and disinfection, which is being performed by at least 350 workers and is an impact to the labor goal.

The analysis begins by asking "why" questions from one of the impacted goals. In this case, the deaths resulted from an outbreak of Legionnaires' disease. The outbreak resulted from exposure to mist from one of the contaminated cooling towers. The design of some cooling towers allows exposure to the mist produced. It is common for water sources to contain Legionella (which, again, is resistant to chlorine), but certain conditions allow the bacteria to "take root": the damp, warm environment found in cooling towers and insufficient cleaning/disinfection. The cost of cleaning is believed to be an issue; studies have found that impoverished areas, like the one affected by this outbreak, are more prone to these types of outbreaks. Additionally, there are insufficient regulations regarding cooling towers. The city does not regularly inspect cooling towers. According to the mayor and the city's deputy commissioner for disease control, there just hasn't been enough evidence to indicate that cooling towers are a potential source of Legionnaires' outbreaks.

Evidence would indicate otherwise, however. A study that researched risk factors for Legionnaires’ in New York City from 2002-2011 specifically indicated that proximity to cooling towers was an environmental risk. A 2010 hearing on indoor air quality discussed Legionella after a failed resolution in 2000 to reduce outbreaks at area hospitals. New York City is no stranger to Legionnaires’; the first outbreak occurred in 1977, just after Legionnaires’ was identified. There have been two previous outbreaks of Legionnaires’ this year. Had there been a look at other outbreaks, such as the 2012 outbreak in Quebec City, cooling towers would have been identified as a definite risk factor.

For now, though the outbreak appears to be waning (no new cases have been reported since August 3), the city is playing catch-up. Though it is requiring all cooling towers to be disinfected by August 20 and plans to increase inspections, right now there isn't even a list of all the cooling towers in the city. Echoing the frustrations of many, Bill Pearson, a member of the committee that wrote standards to address the risk of Legionella in cooling towers, says, "Hindsight is 20-20, but it's not a new disease. And it's not like we haven't known about the risk of cooling towers, and it's not like people in New York haven't died of Legionnaires' before."

Ruben Diaz Jr., Bronx borough president, brings up a good point for other cities that may have Legionella risks from cooling towers: "Why, instead of doing a good job responding, don't we do a good job proactively inspecting?" Let's hope this outbreak will be a call for others to learn from these tragic deaths and take a proactive approach to protecting their citizens from Legionnaires' disease.