All posts by Holly Maher

I deliver workshops for analyzing, documenting, communicating and solving problems effectively and provide consulting services for incident investigation.

Root Cause Analysis - Incident Investigation

Make safeguards an automatic step in the process

June 10, 2015 Holly Maher

By Holly Maher

On the morning of May 13, 2015, a parent was following his normal morning routine on his way to work. He dropped off his older daughter at school and then proceeded to the North Quincy MBTA (Massachusetts Bay Transportation Authority) station where he boarded a commuter train headed to work. When he arrived, approximately 35 minutes later, he realized that he had forgotten to drop off his one-year-old daughter at her day care and had left her in his SUV in the North Quincy station parking lot. The frantic father called 911 as he boarded a train returning to North Quincy. Thankfully, the police and emergency responders were able to find and remove the infant from the vehicle. The child showed no signs of medical distress as a result of being in the parked car for over 35 minutes.

Had this incident resulted in an actual injury or fatality, I am not sure I would have had the heart to write about it. However, because the impact was only a potential injury or fatality, I think there is great value in understanding the details of what happened and, specifically, how we can learn from this incident. Unfortunately, this is not an isolated incident. According to kidsandcars.org, an average of 38 children die in hot cars annually. About half of those children were accidentally left in the vehicle by a parent, grandparent or caretaker. While some people want to talk about these incidents using the terms “negligence” or “irresponsibility”, in the cases identified as accidental it is clear the parents were not trying to forget their children. They often describe going into “autopilot” mode and simply forgetting. How many of us can identify with that statement?

On the morning this incident happened, the parent was following his typical routine. After dropping off his older child at school, he went into “autopilot” and drove directly to the North Quincy MBTA station, parked and left the vehicle to board the train. His one-year-old daughter was not visible to him at that point because she was in the back seat of the vehicle in a rear-facing car seat, as required by law. Airbags were originally introduced in the 1970s but became more widely available in the early 1990s. In 1998, all new cars were required to have airbags in both the driver and passenger positions. This safety improvement, which has surely reduced deaths related to vehicle accidents, had the unintended consequence of moving children in car seats to the back seat, where they are less visible to their parents. The number of hot car deaths has increased significantly since the early 1990s.

On the morning of the incident, the ambient conditions were relatively mild, about 59 degrees Fahrenheit. However, the temperature in a vehicle can quickly exceed the ambient temperature because of what is called the greenhouse effect: sunlight entering through the windows heats the interior, and the enclosed cabin traps that heat. Even with the windows down, the temperature in a vehicle can rise quickly, and about 80% of that rise occurs within the first 10 minutes.
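
To see what the 80% figure implies over time, here is a minimal Python sketch. It assumes, as a simplification that does not come from kidsandcars.org, that the cabin warms like a first-order system approaching a final temperature, and it uses only the "80% of the rise in 10 minutes" figure. It says nothing about absolute temperatures, which depend on sun, vehicle and weather.

```python
import math

# Figure quoted above: about 80% of the temperature rise happens in 10 minutes.
FRACTION_AT_10_MIN = 0.80

# ASSUMPTION: first-order warming, fraction(t) = 1 - exp(-t / tau).
# Solving 1 - exp(-10 / tau) = 0.8 gives tau = 10 / ln(5), about 6.2 minutes.
TAU_MIN = 10 / math.log(1 / (1 - FRACTION_AT_10_MIN))


def fraction_of_rise(minutes: float) -> float:
    """Fraction of the eventual in-car temperature rise reached after `minutes`."""
    return 1 - math.exp(-minutes / TAU_MIN)


for t in (5, 10, 20, 35):
    print(f"after {t:>2} min: {fraction_of_rise(t):.1%} of the eventual rise")
```

Under this assumed model, the car would have reached more than 99% of its eventual temperature rise before the 35-minute mark in this incident.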

When the parent arrived at his destination, approximately 35 minutes later, he realized he had forgotten the infant and boarded a train back to the North Quincy station. Thankfully, the parent also called 911, which expedited the rescue of the infant. Had he not called 911, she would obviously have been in the vehicle longer.

One other interesting detail about this incident is that the parent reported that he normally followed a “safeguard” procedure to make sure this didn’t happen, but he didn’t follow it on this particular day. It is unknown what the safeguard was or why it wasn’t followed. This makes an important point: we don’t follow safeguards when we know something is going to happen; we follow safeguards in case something happens. As I told my daughter (who didn’t want to wear her seat belt on the way home from school because it “wasn’t that far”), you don’t wear your seat belt because you know you are going to get into an accident; you wear it in case you get into an accident.

The solutions identified for this incident are taken directly from kidsandcars.org. They promote a consistent process that manages this risk not when you know you are going to forget, but in case you forget. Consider placing something you need (phone, shoe, briefcase, purse) on the rear floorboard so that you have to open the rear door of the vehicle. Always open the rear door when leaving your vehicle; this is the message of the “Look Before You Lock” campaign. Consider keeping a stuffed animal in the car seat; when the car seat is occupied, place the stuffed animal in the front seat as a visual reminder that the child is in the car. Finally, consider arranging for the day care or caretaker to call you if your child does not arrive when expected. This will minimize the amount of time a child might be left in the car.
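
The day care call-back safeguard works best as an automatic daily step rather than something someone has to remember. Below is a minimal Python sketch of the caretaker's check; the function name, the example times and the 15-minute grace period are hypothetical choices for illustration, not guidance from kidsandcars.org.

```python
from datetime import datetime, timedelta

# HYPOTHETICAL grace period before the caretaker calls; choose what suits the family.
GRACE_PERIOD = timedelta(minutes=15)


def should_call_parent(expected_arrival: datetime, now: datetime,
                       checked_in: bool,
                       grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True if the child has not been checked in and is past the grace period.

    The check runs every day, not only on days someone expects a problem:
    the safeguard works "in case you forget".
    """
    return not checked_in and now > expected_arrival + grace


# Hypothetical example times.
expected = datetime(2015, 1, 5, 8, 0)
print(should_call_parent(expected, datetime(2015, 1, 5, 8, 10), checked_in=False))  # False
print(should_call_parent(expected, datetime(2015, 1, 5, 8, 20), checked_in=False))  # True
print(should_call_parent(expected, datetime(2015, 1, 5, 8, 20), checked_in=True))   # False
```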

For more information about this topic, visit kidsandcars.org.

Explosion Causes Fatality During Hot Work at Fish Processing Plant

September 4, 2014 Holly Maher

On July 28, 2014, one contract worker was killed and another seriously injured when an explosion occurred inside a fish oil storage tank, blowing the lid off the 30-foot-high vessel. The contractors were on top of the tank, performing welding repairs. The storage tank contained approximately 8 inches of “stickwater”, a slurry of water and fish matter thought to be non-hazardous.

Although the official investigation of this incident continues with participation from both OSHA (Occupational Safety and Health Administration) and the CSB (Chemical Safety Board), we can use a Cause Map to visually lay out the cause and effect relationships known at this point. As information becomes available, additional causes can easily be added to the Cause Map.

The first step in the Cause Mapping process is to identify the problem by filling out the Outline. We clarify the date, time, location, and sometimes “what was different about this incident”, which at this point is unknown. The explosion occurred at about 9:30 a.m. on July 28, 2014, at a fish processing plant in Moss Point, Mississippi. The task being performed when the incident happened was welding on the storage tank. At the bottom of the Outline, we identify the impacts to the organization's goals, because although you may get many answers to “what is the problem?”, the impacts to the goals provide a common starting point for the investigation. In this case, the primary goal impacted was the safety goal, because of the fatality and serious injury caused by the explosion. The tank damage and facility downtime could also be captured; however, we have focused our discussion here on the safety goal impact.

Once we have identified the goals impacted, we can start the analysis by simply asking some “why” questions. Why was there one contractor fatality and one serious injury? Because there was an explosion. Why was there an explosion? Because there was an ignition source. Why was there an ignition source? Because contractors were welding on the tank. “Why” is a great way to get any investigation started, but we also want to expand the analysis to ensure all the causes are identified (the system of causes, if you will). In this case, the explosion is caused not just by the ignition source, but also the presence of fuel and the presence of oxygen (think fire triangle).
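
The fire triangle is an AND relationship: all three causes must be present together. A minimal Python sketch (labels simplified from the analysis above) shows why removing any single cause is enough to prevent the effect, which is what makes each cause a possible place for a solution.

```python
# The fire triangle: an explosion needs an ignition source, fuel AND oxygen.
FIRE_TRIANGLE = ("ignition source", "fuel", "oxygen")


def explosion_possible(present: set[str]) -> bool:
    """True only if every side of the fire triangle is present at once."""
    return all(cause in present for cause in FIRE_TRIANGLE)


# Conditions in the stickwater tank, simplified.
tank = {"ignition source", "fuel", "oxygen"}
print(explosion_possible(tank))                        # True
print(explosion_possible(tank - {"ignition source"}))  # False: no hot work
print(explosion_possible(tank - {"fuel"}))             # False: tank drained and cleaned
```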

The ignition source was the welding on the tank, which was being done because repairs were needed. The welding went ahead because the workers were unaware of the combustible atmosphere in the tank, and they were unaware because no atmospheric testing was done on the vapor space, since the stickwater was considered non-hazardous. Unlike the oil and gas industry, where the potential for flammable or combustible atmospheres is well known and managed through atmospheric testing, this potential is less well known in industries such as fish processing, where fluids containing decomposing organic material can release flammable gases. These gases create a risk during maintenance work that produces sparks or heat (hot work). The fuel for the explosion was methane and hydrogen sulfide released from the stickwater; a sample of the material sent to the lab after the explosion confirmed the presence of these off-gases. The flammable gases were present because there were 8 inches of stickwater in the tank.
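
The cause-and-effect chain described so far can be recorded as a simple data structure. The Python sketch below stores each effect with the causes that produced it, walks the map by repeatedly asking “why?”, and lists the causes that have no further “why” recorded yet. The node wording is condensed from the text, and treating every listed cause as jointly required is a simplifying assumption; this is an illustration, not the Cause Mapping tool itself.

```python
# Each effect maps to the causes that produced it (effect -> causes,
# read left to right on a Cause Map). Wording condensed from the text.
CAUSE_MAP: dict[str, list[str]] = {
    "1 fatality, 1 serious injury": ["explosion in tank"],
    "explosion in tank": ["ignition source", "fuel: methane and H2S", "oxygen"],
    "ignition source": ["welding on tank"],
    "welding on tank": ["repairs required", "workers unaware of combustible atmosphere"],
    "workers unaware of combustible atmosphere": ["no atmospheric testing of vapor space"],
    "no atmospheric testing of vapor space": ["stickwater considered non-hazardous"],
    "fuel: methane and H2S": ["gases released from stickwater"],
    "gases released from stickwater": ["8 inches of stickwater in tank"],
}


def print_map(effect: str, depth: int = 0) -> None:
    """Walk the map depth-first, answering 'why?' at each level."""
    print("  " * depth + effect)
    for cause in CAUSE_MAP.get(effect, []):
        print_map(cause, depth + 1)


def deepest_causes(effect: str) -> list[str]:
    """Causes with no further 'why' recorded: candidates for solutions or follow-up."""
    causes = CAUSE_MAP.get(effect, [])
    if not causes:
        return [effect]
    leaves: list[str] = []
    for cause in causes:
        leaves.extend(deepest_causes(cause))
    return leaves


print_map("1 fatality, 1 serious injury")
print(deepest_causes("1 fatality, 1 serious injury"))
```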

The Cause Mapping process allows us to identify all the causes related to an incident, with the goal of identifying the best solutions to mitigate future risk. Even with this initial analysis, we can start to identify potential solutions to reduce the risk of this incident occurring again. Clearly, the potential hazard from flammable atmospheres is not well known in industries that handle mixtures of water and organic material (e.g., fish processing, pulp processing, potato processing), so the lessons learned from this incident, along with others investigated by the CSB, would be worth sharing across these industries. In addition, requiring atmospheric testing before hot work would mitigate the potential for explosions during these types of maintenance activities. Another option would be to drain and clean the tank before welding. These solutions could have a significant, far-reaching impact across all types of hot work activities.

Impact of Gasoline Spending on US Household Budgets

June 12, 2014 Holly Maher

Because I have worked in an oil refinery for most of my career, the question “why is gasoline so expensive?” has been posed to me on more than a few occasions. It is usually asked with a great deal of frustration and sometimes with a bit of anger directed at the oil companies (and those who work for them). So, with the summer driving season officially kicked off, this seems like an appropriate time to tackle the issue.

If we ask the question “What is the problem?” we can expect to get different answers: the crude oil price is too high, oil companies are making too much profit, people are driving too many SUVs, etc. All of these answers reflect what different people view as the problem, which is subjective. So, to start the analysis, we have to identify how this issue impacts our goals. For the average American family, annual spending on gasoline impacts the household budget. In 2011, average household spending on gasoline was $2,655, or roughly 4% of average household gross income.

Once we have identified the impact to the goal we can begin the analysis. We start by asking “why” questions and documenting the answers to visually lay out all the causes that contributed to this impact. The cause-and-effect relationships lay out from left to right. The average annual spending on gasoline is caused by both the price of a gallon of gasoline, which in 2011 was $3.52/gal, as well as the annual consumption of gasoline (average household consumption was 754 gallons in 2011). Although the national discussion tends to focus on the price at the pump, the price alone does not create the impact to the household budget (you don’t see too many articles on the price of a gallon of milk, which, by the way, in 2011 was $3.57/gallon).
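
A quick Python check of the arithmetic: multiplying the 2011 price by the 2011 consumption gives about $2,654, and the small difference from the $2,655 quoted above is consistent with the per-gallon price having been rounded to the cent. The sketch also shows that the impact responds equally to either cause.

```python
def annual_spending(price_per_gallon: float, gallons_per_year: float) -> float:
    """Annual household gasoline spending = price x consumption."""
    return price_per_gallon * gallons_per_year


# 2011 averages quoted above.
PRICE_2011 = 3.52   # USD per gallon
GALLONS_2011 = 754  # gallons per household per year

baseline = annual_spending(PRICE_2011, GALLONS_2011)
print(f"${baseline:,.2f}")  # $2,654.08

# The impact depends on both causes: a 10% cut in either one lowers spending by 10%.
print(f"${annual_spending(PRICE_2011 * 0.9, GALLONS_2011):,.2f}")
print(f"${annual_spending(PRICE_2011, GALLONS_2011 * 0.9):,.2f}")
```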

The price of a gallon of gasoline is determined by four primary causes: the crude oil price (~68% of the price), state/local/federal taxes (~13%), transportation and marketing (~11%), and the cost of refining crude oil into usable products (~8%). The price of crude oil in 2011 was $94.87/bbl (barrel), compared with $27.39/bbl in 2000 and $23.19/bbl in 1990. This price is set by normal supply and demand economics, both internationally and domestically. Global demand for crude oil has shifted dramatically in recent years as countries in eastern Asia have entered their own “industrial revolution”. The global supply of crude oil is set not only by total oil well capacity, but also by transportation availability, OPEC targets, and political sanctions on oil-producing countries.
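
Applying the approximate shares above to the 2011 average price gives a rough dollar breakdown per gallon. This Python sketch assumes the shares hold for that year's average price; because the shares are rounded averages, the components are only indicative.

```python
def price_breakdown(price_per_gallon: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a pump price into components using approximate percentage shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: price_per_gallon * share for name, share in shares.items()}


# Approximate shares quoted above (assumed to apply to the 2011 average price).
SHARES = {
    "crude oil": 0.68,
    "taxes": 0.13,
    "transportation and marketing": 0.11,
    "refining": 0.08,
}

for name, dollars in price_breakdown(3.52, SHARES).items():
    print(f"{name:<30} ${dollars:.2f}")
```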

In addition to normal supply and demand economics, crude oil is traded as a commodity on futures exchanges, so its price is susceptible to fluctuation based on fear and speculation. Before 2000, the energy market and the trading of energy futures were regulated because of the significant impact they could have on the economy. In 2000, the Commodity Futures Modernization Act of 2000 exempted much of the over-the-counter trading of energy from that regulation.

The average annual household consumption of gasoline in 2011 was 754 gallons. This is caused by the annual miles driven per car (15,000 miles), the number of cars per household (1.95), and the fuel efficiency of the cars. The average mileage per car is caused by commute mileage, whether household members carpool, whether they use public transportation, and recreational miles driven (outside of work). The fuel efficiency of cars is determined by the types of cars driven, the fuel efficiency technology available, and the vehicle fuel efficiency standards required by law. In 2011, 50% of the household vehicles purchased were classified as light trucks. New fuel efficiency standards were introduced for vehicles in 2011, requiring passenger cars to meet 30.2 miles per gallon (mpg) and light trucks to meet 24.2 mpg. This was an increase of 2 mpg for each type of vehicle.

Once the analysis has been broken down into its causes, solutions can be identified to mitigate the impact to the goal. Even with this initial, basic analysis, solutions start to become visible. Household members could carpool more (with friends, co-workers, or their spouse). Household members could take public transportation, if available, and communities could work to make public transportation more available to residents. Households could purchase more fuel-efficient vehicles. The government could continue to increase fuel efficiency requirements. The government could pass a law re-regulating energy futures trading.

As with any incident or problem with significant impact to the goal(s), the analysis always reveals more than one single cause. Being able to see multiple causes gives us the opportunity to find more than one potential solution.

To view the Outline, Cause Map, and solutions please click “Download PDF” above.

Why Can’t the Missing Malaysia Airliner be Found?

April 25, 2014 Holly Maher

On March 8, 2014, Malaysia Airlines flight MH370 took off from Kuala Lumpur heading for Beijing, China, with 239 passengers and crew aboard. Less than 1 hour into the flight, communication and radar contact with the aircraft were lost. Forty-nine days later, the location and fate of the aircraft are still unknown despite a massive international effort to locate the missing airliner. The search has dominated the news for the last month, and the question remains: how, with today’s technology, can an entire aircraft go missing?

Since we may never know what happened to flight MH370, this analysis is intended to understand why we can’t find it and identify the causes required to produce this effect. This will allow us to identify many possible solutions for preventing it from happening again. We start by asking “why” questions and documenting the answers to visually lay out all the causes that contributed to this incident. The cause-and-effect relationships lay out from left to right.

In this example, the Customer Service Goal is impacted because 239 passengers and crew are missing. This is caused by the fact that we can’t locate Malaysia Airlines flight MH370. The inability to locate the aircraft is the result of a number of causes over the 49-day period. One reason is that 3 days were initially spent looking in the wrong location, along the original flight path from Kuala Lumpur to Beijing, in the Gulf of Thailand and the South China Sea. Those 3 days were spent in the wrong location because the aircraft had left the original flight path and officials were unaware of that fact. Why the aircraft left the original flight path is still unknown, but we can look at some of the causes that allowed it to leave undetected.

One reason the aircraft was able to leave the original flight path undetected was that air traffic control was unable to track it with radar. The transponder on board the aircraft, which transmits the aircraft's identity and altitude to air traffic control's secondary radar, was turned off less than one hour into the flight. We don’t know why the transponder was turned off; however, the fact that it is designed to be turned off manually is a cause of it being turned off. It is designed to be turned off manually to reduce risk in the event of a failure or fire, and to reduce radio traffic when the airplane is on the ground. After 9/11, when 3 of the 4 hijacked airplanes had their transponders turned off, the airline industry debated the manual on/off design, but aviation experts strongly supported the need for pilots to be able to turn off transponders, as needed, for the safety of the flight.

Another reason the aircraft left the original flight path undetected was that the cabin crew, working outside the cockpit, did not communicate distress or a change of route. This is because all communications from the airplane come from or through the cockpit; the aircraft is not equipped to allow communication, specifically distress communication, from outside the cockpit.

Days into the investigation, radar data was identified which showed the change of course of the aircraft. This changed the area of the search away from the original flight path. However, this radar detection was not identified in real time, as the plane was moving away from the original flight path. This is also a cause of the aircraft being able to leave the flight path undetected.

Once the search area moved west, the potential search area was incredibly large, another cause of being unable to locate the aircraft. At its largest, the search area was 2.96 million square miles, based on an analysis of how far the flight could have traveled with the fuel on board. Further analysis of satellite data, the automatic “handshakes” between a communications satellite and equipment on board the aircraft, continued to refine the search area.

Many people have asked why no one on the flight made cell phone calls indicating distress (if this was an act of terrorism). One reason is that cell phones do not work reliably above about 2,000 ft, because at that altitude the phone cannot maintain a connection to a cellular tower.

Another cause of being unable to locate MH370 is being unable to locate the black box. The black box is made of aluminum and is very heavy, designed to withstand significant forces in the event of a crash. As a result, the black box sinks instead of floating, which makes it difficult to locate. The depth of the ocean in the search area ranges from 4,000 to 23,000 ft, adding to the difficulty. Acoustic pings from the black box were last detected on April 8, 2014, 32 days into the search. This is because the battery life of the black box's locator beacon is ~30 days. Thirty days had been the design criterion for battery life prior to the Air France Flight 447 crash in 2009. Because it took nearly 2 years to locate the black box and wreckage from Flight 447, the design criterion for black box battery life was changed from 30 days to 90 days, giving search crews more time to locate the black box. However, MH370 still had a black box with a 30-day battery life.

Once the analysis has broken the incident down into its causes, solutions can be identified to mitigate the risk of a similar incident in the future.

To view the Outline and Cause Map, please click “Download PDF” above.

1 Dead and 27 Hospitalized from Carbon Monoxide at Restaurant

February 27, 2014 Holly Maher

On Saturday evening, February 22, 2014, 1 person died and 27 others were hospitalized due to carbon monoxide poisoning. The individuals were exposed to high levels of carbon monoxide that had built up in the basement of a restaurant. The restaurant was evacuated and subsequently closed until the location could be deemed safe and the water heater, located in the basement, was inspected and cleared for safe operation.

So what caused the fatality and 27 hospitalizations? We start by asking “why” questions and documenting the answers to visually lay out all the causes that contributed to the incident. The cause-and-effect relationships lay out from left to right.

In this example, the 1 fatality and 27 hospitalizations occurred because of exposure to high levels of carbon monoxide gas, which is poisonous. The exposure was caused not only by the presence of high levels of carbon monoxide gas, but also by the fact that the restaurant employees and emergency responders were unaware of it.

Let’s first ask why there were high levels of carbon monoxide present. This was due to carbon monoxide gas being released into the basement of the restaurant. The carbon monoxide gas was released into the basement because there was carbon monoxide in the water heater flue gas and because the flue gas pipe, intended to direct the flue gas to the outside atmosphere, was damaged. The carbon monoxide was present in the flue gas because of incomplete combustion in the water heater. At this point in the investigation, we don’t have any further information, which can be indicated as a follow-up point on the Cause Map using a question mark. We have also marked the reason for the flue gas pipe damage with a question mark, as we do not currently know the exact failure mechanism (physical damage, corrosion, etc.). What we can identify as one of the causes of the flue gas pipe failure is an ineffective inspection process. How do we know the inspection process was ineffective? Because it didn’t catch the failure before it happened, which is the whole point of requiring periodic inspections. This water heater had passed its annual inspection in March 2013 and was due again in March 2014.
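
Question marks on a Cause Map are follow-up items. As a minimal Python sketch (node wording condensed from the text; the `UNKNOWN` marker and the `open_questions` helper are hypothetical), the map can be stored so that open questions are easy to list and track until the investigation answers them.

```python
UNKNOWN = "?"  # follow-up point: cause not yet known

# Effect -> causes, condensed from the analysis above.
CAUSE_MAP: dict[str, list[str]] = {
    "1 fatality, 27 hospitalized": ["exposure to high CO levels"],
    "exposure to high CO levels": ["high CO levels present", "people unaware of CO"],
    "high CO levels present": ["CO released into basement"],
    "CO released into basement": ["CO in water heater flue gas", "flue gas pipe damaged"],
    "CO in water heater flue gas": ["incomplete combustion"],
    "incomplete combustion": [UNKNOWN],
    "flue gas pipe damaged": [UNKNOWN, "ineffective inspection process"],
    "people unaware of CO": ["CO colorless and odorless", "no CO detector installed"],
    "no CO detector installed": ["not required by state or local code"],
}


def open_questions(cause_map: dict[str, list[str]]) -> list[str]:
    """Effects whose causes still include a question mark, sorted for tracking."""
    return sorted(effect for effect, causes in cause_map.items() if UNKNOWN in causes)


print(open_questions(CAUSE_MAP))
```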

If we now ask why the employees were unaware of the high levels of carbon monoxide, we can identify that not only is carbon monoxide colorless and odorless, but also there was no carbon monoxide detector in the restaurant. No detector was installed because one is not legally required by state or local codes, which only require carbon monoxide detectors in residences or in businesses where people sleep, such as hotels.

Once all the causes of the fatality and hospitalizations have been identified, possible solutions to prevent the incident from happening again can be brainstormed. Although we still have open questions in this investigation, we can already see some possible ways to mitigate this risk going forward. One possible solution would be to legally require carbon monoxide detectors in restaurants, which would have alerted both employees and responders to the hazard. Another possible solution would be to require more frequent inspections of this type of combustion equipment.

To view the Outline and Cause Map, please click “Download PDF” above.

16-Day Government Shutdown Affects Economy

November 1, 2013 Holly Maher

On October 1, 2013, at 12:01 AM, the beginning of fiscal year 2014, the federal government shut down all non-essential operations when Congress could not pass a continuing resolution to allow spending at current levels. The shutdown lasted 16 days and, among other impacts, closed the National Parks system (see our blog about the park closures), furloughed 800,000 federal employees, threatened the payment of veterans’ benefits, and hurt the economy, both directly and indirectly.

So what caused the government shutdown? If you watched any TV during that 16-day period, you could certainly hear any number of experts (on both sides) explaining who was to blame. Consistent with the intent of the Cause Mapping methodology, this analysis does not try to identify the one person, the one group or the one reason to blame for the shutdown. Instead, we will identify all the causes required to produce this effect, which will allow us to identify many possible solutions for preventing it from happening again. We start by asking “why” questions and documenting the answers to visually lay out all the causes that contributed to the shutdown. The cause-and-effect relationships lay out from left to right.

In this example, the government shut down because Congress could not pass a continuing resolution bill, and it could not pass the bill because a line item defunding the Affordable Care Act (ACA) had been added to it and could not be agreed upon. A continuing resolution was required because the Constitution gives Congress the power to spend money, and since Congress had not passed a budget for fiscal year 2014, a continuing resolution was required to keep the government operating after October 1. Defunding the ACA was added to the continuing resolution bill because the ACA was about to go into effect and because such provisions can be added on a line item basis. Congress was unable to compromise to reach an agreement to pass the continuing resolution.

So why was Congress unable to reach an agreement? If the incentive to compromise had been greater than the incentive not to compromise, they would have compromised. So why was the incentive to compromise ineffective? One reason is that members of Congress continue to be paid when the government shuts down. Another reason is that there is a significant incentive to maintain a position aligned with the party (either left or right). The desire to get re-elected (there are no term limits in Congress), the need for support in the primaries to get re-elected (under the current primary system), and the need for campaign financing are all causes that reinforce the incentive to stay aligned with the party rather than compromise.

Once all the causes of the government shutdown have been identified, possible solutions to prevent a shutdown from happening again can be brainstormed. One possible solution would be to legally require a continuing resolution to be a “clean” bill, with no additional line items. This would make it less likely that future debates over hot-button items, such as the ACA, would block passage of a continuing resolution and cause a government shutdown. Another possible solution would be to stop pay for Congress during a government shutdown. Other, more global and systemic solutions might be to implement term limits in Congress or to provide government campaign financing to reduce dependence on party financial support.

To view the Outline and Cause Map, please click “Download PDF” above.