Category Archives: Root Cause Analysis – Incident Investigation

On February 9, 2014, a Royal Air Force Voyager was transporting 189 passengers and a crew of 9 towards Afghanistan when the plane suddenly entered a steep dive. Many passengers were unrestrained and were injured by striking the ceiling or other objects. Other passengers were injured by flying objects or spills of hot liquid. More than 30 passengers and crew members reported injuries, all considered minor. The Military Aviation Authority’s final report contains details of the impacts from the dive, the causes of the dive, and recommendations that would reduce the possibility of a similar issue in the future.

These impacts, the cause-and-effect relationships that led to them, and the recommended solutions can be captured within a Cause Map. The Cause Map process begins with filling in a Problem Outline, which captures the what, when and where of an incident, followed by the impacts to the goals. The problem covered by the report is the aircraft dive and resulting injuries which occurred on February 9, 2014 at about 1549 (3:49 PM) on an Airbus A330-243 Voyager tanker air transport flight. Things that were different, unusual or unique at the time of the incident are also captured. In this case, the plane had experienced prior turbulence, and the co-pilot was not in his seat at the time of the dive.

The next step is to capture the impacts to the goals on the Outline. In this case, the safety goal is impacted because of a significant potential for fatalities, as well as the more than 30 actual injuries. Customer service is impacted due to the steep dive of the plane, and the regulatory goal is impacted due to the court-martial of the pilot, as well as 10 lawsuits against the Ministry of Defence. Production was impacted because the plane was grounded for 12 days, the property goal is impacted because of the potential for the loss of the whole plane, and the labor goal is impacted by the investigation.

Beginning with an impact to the goals, all the cause-and-effect relationships that led to that impact are captured on the Cause Map. In this case, the potential for fatalities resulted from the potential loss of the plane. According to Air Marshal Richard Garwood, former director general of the UK’s Military Aviation Authority (MAA), “On this occasion, the A330 automatic self-protection systems likely prevented a disaster of significant scale. The loss of the aircraft was not an unrealistic possibility.” The potential for the loss of the plane resulted from the steep dive. The plane was NOT lost (making this a significant near miss) because the flight envelope protection system, which functioned as designed, recovered it to level flight. (Although this is a positive, not a negative, it’s a cause all the same and should be included in the Cause Map.)

The steep dive resulted from the controller being forced forward without being counteracted. These are two separate causes that resulted in the effect, and are listed vertically and joined with an “AND” on the Cause Map. More detail should be provided about both causes. The command could not be counteracted because the co-pilot was not on the flight deck; he had been taking a break for several minutes before the incident. The investigation found that the controller was forced forward by a camera that was pushed against the controller. The camera had been placed between the seat and the controller, and then the seat was moved forward (as normally occurs during flight).

The investigation found that, despite concerns raised about a year before this incident, loose personal articles were not prohibited on the flight deck. While there was a requirement to stow loose articles, it was not referenced in the operational manual and instead became one of thousands of paragraphs provided as background, resulting in a lack of awareness of controller interference from loose articles. The pilot was found to be using the camera while on the flight deck, likely due to boredom on the highly automated plane. (Analysis of the camera and flight recordings provided evidence.) The pilot was court-martialed for “negligently performing a duty, perjury and making a false record”, presumably at least partially due to the use of a personal camera while solo on the flight deck.

The report provided many recommendations as a result of the investigation, including increasing seat belt use by passengers and crew during rest periods, which would have reduced some of the injuries caused by unrestrained personnel striking the ceiling of the aircraft. Recommendations also included ensuring manufacturer’s safety advice is included in operational documents, promoting awareness of the danger of loose articles, and maximizing use of storage for loose articles, all of which aim to reduce the risk of loose articles contacting control equipment. An additional recommendation is to manage low in-flight pilot workload in an attempt to combat the boredom that can be experienced on long flights.

To view the Problem Outline, Cause Map, and recommendations, please click “Download PDF” above. Or click here to read the Military Aviation Authority’s report.

Root Cause Analysis - Incident Investigation

Small fire leads to thousands of canceled flights

August 19, 2016 Kim Smiley

Starting August 8, 2016, thousands of travelers were stranded worldwide after widespread cancelations and delays of Delta Air Lines flights. The disruptions continued over several days and the impacts lingered even longer. The flight issues made headlines around the globe and the financial impact to the company was significant.

So what happened? What caused this massive headache for so many travelers? The short answer is a small fire in an airline data center, but a much longer answer is needed to understand what caused this incident. A Cause Map, a visual format for performing a root cause analysis, can be used to analyze this issue. A Cause Map visually lays out all of the causes that contributed to an issue to intuitively show their cause-and-effect relationships. The Cause Map is built by asking “why” questions and adding the answers. For an effect with more than one cause, all of the causes that contributed to the effect are listed vertically and separated by an “and”. (Click on “Download PDF” to see an intermediate level Cause Map of this incident.)

So why were so many flights canceled and delayed? There was a system-wide computer outage and the airline depends on computer systems for everything from processing check-ins to assigning crews and gates. Bottom line, no flights leave on time without working computer systems. The issues originated at a single data center, but the design of the system led to cascading computer issues that impacted systems worldwide. The airline has not released any specific details about why exactly the issue spread, but this is certainly an area investigators would want to understand in order to create a solution to prevent a similar cascading failure in the future.

In a statement, the company indicated that an electrical component failed, causing a small fire at the data center. (Again, the specifics about what type of component and what caused the failure haven’t been released.) The fire caused a transformer to shut down which resulted in a loss of primary power to the data center. A secondary power system did kick on, but not all servers were connected to backup power. No details have been released about why some servers were not powered by the secondary power supply.
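A Cause Map is normally drawn as a diagram, but its structure (each effect linked to the causes that answer “why”, with causes listed together when all of them were needed) maps naturally onto a small tree. The minimal Python sketch below illustrates that idea using the causes reported so far for this outage. It is only an illustration, not a Cause Mapping tool: `Node`, `because`, `render`, and `leaf_causes` are hypothetical names, the box wording paraphrases the text above, and the sketch assumes every set of sibling causes under an effect is joined by an implied “AND”.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One box on the map: an effect plus the causes that answer "why?"."""

    text: str
    # Assumption: all listed causes were needed together (an implied "AND").
    causes: list[Node] = field(default_factory=list)

    def because(self, *causes: Node) -> Node:
        """Attach the answers to "why did this happen?" and return self for chaining."""
        self.causes.extend(causes)
        return self


def render(node: Node, depth: int = 0) -> list[str]:
    """Indent each cause under its effect and separate sibling causes with "AND"."""
    pad = "  " * depth
    lines = [pad + node.text]
    for i, cause in enumerate(node.causes):
        if i > 0:
            lines.append(pad + "  AND")
        lines.extend(render(cause, depth + 1))
    return lines


def leaf_causes(node: Node) -> list[str]:
    """Causes with no recorded "why" yet, i.e. where the questioning currently stops."""
    if not node.causes:
        return [node.text]
    return [leaf for cause in node.causes for leaf in leaf_causes(cause)]


# Hypothetical encoding of the causes reported above for the Delta outage.
delta = Node("Flights canceled and delayed").because(
    Node("Airline depends on computer systems"),
    Node("System-wide computer outage").because(
        Node("Servers at one data center lost power").because(
            Node("Loss of primary power").because(
                Node("Transformer shut down").because(
                    Node("Small fire at data center").because(
                        Node("Electrical component failed"),
                    ),
                ),
            ),
            Node("Some servers not connected to backup power"),
        ),
        Node("System design let the failure cascade"),
    ),
)

print("\n".join(render(delta)))
print(leaf_causes(delta))
```

The leaf causes are where the “why” questions currently stop. As more details are released (for example, why the failure cascaded or why some servers lacked backup power), new causes can be attached under those boxes with `because`.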

Compounding the frustration for the impacted travelers is the fact that they were unable to get updated flight information. Flight status systems, including airport monitors, continued to show that all flights were on time during the period of the cancelations and delays.

Once a large number of flights are disrupted, it is difficult to return to a normal flight schedule. The rotation schedule for planes and pilots has to be redone, which can be time-consuming. Many commercial flights operate near capacity so it can be difficult to find seats for all the passengers impacted by canceled and delayed flights. Delta has tried to compensate travelers impacted by this incident by offering refunds and $200 in travel vouchers to people whose flights were canceled or delayed at least three hours, but an incident of this magnitude will naturally impact customer confidence in the company.

This incident is a good reminder of the importance of building robust systems with functional backups; otherwise a small problem can spread and quickly become a big problem.

911 Outage in Baltimore

July 22, 2016 Kim Smiley

Nobody ever wants to find themselves in the position of dialing 911. But imagine how quickly a bad situation could get even worse if nobody answered your call for emergency help. That is exactly what happened on July 16, 2016 to people in Baltimore, Maryland. For about two hours, people dialing 911 in Baltimore got a busy signal.

This incident can be investigated by building a Cause Map, a visual root cause analysis. A Cause Map intuitively lays out the many causes that contributed to an issue to show all the cause-and-effect relationships. By focusing on multiple causes, rather than a single root cause, the range of solutions considered is naturally widened.

The first step in the Cause Mapping process is to fill in an Outline with the basic background information for the incident. Additionally, the Outline is used to capture how the incident impacts the overall goals. This incident, like most incidents, impacted more than one goal. For example, the safety goal is impacted because of the delay in emergency help and the customer service goal is impacted because people were unable to reach 911 operators.

The bottom line on the Outline is used to note the frequency of similar incidents. This is important because an incident that has occurred 12 times before may warrant a different level of investigation than an isolated incident. For this example, newspapers reported a previous 911 outage in June in the Baltimore area. The outages appear to have been caused by different issues, but do raise questions about the overall stability of the 911 system in Baltimore. Investigators should determine if the multiple outages are related and indicative of bigger issues than just this one incident.

Once the Outline is completed, the Cause Map itself is built by asking “why” questions. So why was there a 911 outage for about 2 hours? Newspapers have reported that the outage occurred because of electrical power failures after both the main and back-up power systems shut down. The power systems shut down because of a malfunctioning air conditioning unit. No details have been released about exactly why the air conditioning unit malfunctioned, but additional information could quickly be added to the Cause Map as it becomes known.

The final step in the Cause Mapping process is to develop and implement solutions to reduce the risk of the problem recurring. The investigation into this incident is still ongoing and no information about potential long-term solutions has been announced. In the short term, callers were asked to dial 311 or call their closest fire station or police district station if they heard a busy signal or were otherwise unable to get through to 911. It is probably not a bad idea for all of us to have the numbers of our local fire and police stations on hand, just in case.

Train Derails on Track Just Inspected

July 15, 2016 ThinkReliability Staff

A train derailment in the Columbia River Gorge near Mosier, Oregon resulted in a fire that burned for 14 hours. The Federal Railroad Administration (FRA) preliminary investigation says the June 3rd derailment was caused by a broken lag bolt which allowed the track to spread, resulting in the 16-car derailment. Although there is only one other known instance of a broken lag bolt causing a train derailment, the FRA determined that the bolt had been damaged for some time, and the track had been inspected within days of the incident, raising questions about the effectiveness of these inspections.

Determining all the causes of a complex issue such as a train derailment can be difficult, but doing so will provide the widest selection of possible solutions. A Cause Map, or visual root cause analysis, addresses all aspects of the issue by developing cause-and-effect relationships for all the causes based on the impacts to an organization’s goals. We can create a Cause Map based on the preliminary investigation. Additional causes and evidence can be added to the map as more detail is known.

The first step in the Cause Mapping process is to determine the impacts to the organization’s goals. While there were no injuries in this case, the massive fire resulting from the derailment posed a significant risk to responders and nearby citizens, an impact to the safety goal. The release of 42,000 gallons of oil (although much of it was burned off in the fire) is an impact to the environmental goal. The customer service goal is impacted by the evacuation of at least 50 homes and the regulatory goal is impacted by the potential for penalties, although the National Transportation Safety Board (NTSB) has said it will not investigate the incident. The state of Oregon has requested a halt on oil traffic, which would be an impact to the schedule goal. The property goal is impacted by the damage to the train cars, and the labor/ time goal is impacted by the response and investigation.

The analysis, which is the second step in the Cause Mapping process, begins with one of the impacted goals and develops cause-and-effect relationships by asking ‘Why’ questions. In this case, the safety goal is impacted by the high potential for injuries. This is caused by the massive fire, which burned for 14 hours. There may be more than one cause resulting in an effect, such as a fire, which is caused by heat, fuel, and oxygen. The oxygen in this case is from the atmosphere. The heat source is unknown but could have been a spark caused by the train derailment. The fuel was the 42,000 gallons of crude oil from the Bakken oil fields that were released when the derailment damaged the train cars carrying it.

The derailment of 16 cars of the train was caused by the broken lag bolt. Any mechanical failure, such as a break, results from the stress on that object exceeding the strength of the object. In this case, the stress was caused by the weight of the 94-car train. The length of a train carrying crude oil is not limited by federal regulations. The strength of the bolts was reduced due to previous damage, which was not identified prior to the failure. While the track strength is evaluated every 18 months by the Gauge Restraint Measurement System (GRMS), it did not identify the damage, and it is unclear when it was last performed.

Additionally, although the track is visually inspected twice a week by the railroad, these inspections are done by vehicle, which would have made the damage harder to spot. The FRA does not require walking inspections. Nor does the FRA inspect or review the railroad’s inspections very often: there are fewer than 100 inspectors for the 140,000 miles of track across the country, and only 3 in Oregon.

As a result of the derailment, the railroad has committed to replacing the existing bolts with heavy-duty ones, performing GRMS four times a year, enhanced hyrail inspections and visual track inspections three times a week, and performing walking inspections on lag curves monthly.

The FRA is still evaluating actions against the railroad and is again calling for the installation of advanced electronic brakes, or positive train control (PTC). It has also recommended PTC after other incidents, such as the deaths of two railroad workers on April 3 (see our previous blog) and the derailment in Philadelphia last year that killed 8 (see our previous blog).

To view a one-page PDF of the Cause Mapping investigation, click on “Download PDF” above. Or, click here to read the FRA’s preliminary investigation.

FAA Proposes Amazon Fine for Hazardous Shipment

July 8, 2016 Kim Smiley

The Federal Aviation Administration (FAA) recently proposed fining Amazon $350,000 for shipping a product that allegedly violated hazardous materials regulations. The package in question was shipped by Amazon from Louisville, Kentucky, to Boulder, Colorado and contained a one-gallon container of corrosive drain cleaner with the colorful name Amazing! LIQUID FIRE. During transit, the package leaked and 9 UPS workers were exposed to the drain cleaner and reported a burning sensation. The workers were treated with a chemical wash and experienced no further issues, but this incident highlights issues with improper shipment of hazardous materials.

A Cause Map, a visual root cause analysis, can be built to analyze this issue by visually laying out the cause-and-effect relationships that contributed to the issue. The first step in the Cause Mapping method is to fill in an Outline. The top part of the Outline lists the basic background information for the issue, such as the date and time. The bottom portion of the Outline has a section to list how the problem impacts the overall goals of the organization. Most problems have more than one impact and this incident is no exception. For example, the safety goal is impacted because workers were exposed to hazardous chemicals and the regulatory goal is impacted because of the FAA investigation and the proposed fine.

The frequency of the issue is listed on the last line of the Outline. Identifying the frequency is important because an issue that has occurred a dozen times is likely to warrant a more detailed investigation than an issue that has been reported only once. For this example, Amazon had at least 24 hazardous materials violations between February 2013 and September 2015, so the concerns about improper handling of hazardous materials go beyond the issues with this one package.

Once the Outline is completed, the Cause Map is built by starting at one of the impacted goals and asking “why” questions. Starting at the safety goal for this example, the first question would be “why were workers exposed to hazardous chemicals?”. This happened because the workers were handling a package containing hazardous chemicals, a package containing hazardous chemicals leaked, and inadequate precautions were taken to prevent the workers from being exposed to the chemicals. When there is more than one cause that contributes to an effect, the causes are listed vertically and separated by an “and”.

To continue building the Cause Map, ask “why” questions for each of the causes already listed. The workers were handling the package because it was shipped by air via UPS. Inadequate precautions were taken to prevent exposure to the chemical because the workers were unaware that the package contained hazardous chemicals. Chemicals leaked because they were not properly packaged. Why questions should continue to be asked until no more information is known or no useful detail can be added to the Cause Map. To view an intermediate level Cause Map of this issue with more information, click on “Download PDF” above.

The final step in the Cause Mapping process is to use the Cause Map to develop and implement solutions to reduce the risk of the problem recurring. More information about what exactly led to improperly packaged and labeled hazardous materials being shipped would be needed to develop useful solutions in this example, but hopefully a fine of this size and the negative publicity it generated will help spark efforts to make improvements.

Kansas City Interstate Overpass Closed Due to 20′ Crack

June 3, 2016 ThinkReliability Staff

A bridge engineer watching a crack (previously described as “tight”) under the Grand Boulevard bridge noticed it had extended to 20′ on May 6, 2016. He immediately ordered the bridge closed, requiring the rerouting of the more than 9,000 vehicles that use the bridge every day. Replacing the bridge is estimated to cost $5 million.

Luckily, due to the quick action of the engineer, there were no injuries or fatalities, which could have occurred if the bridge had catastrophically collapsed while in use, or if large chunks of concrete had fallen from the overpass onto motorists on the Interstate below.

The overpass failure can be addressed in a Cause Map, or visual root cause analysis. The process begins by capturing the what, when and where of the incident (a bridge failure May 6 in Kansas City) and the impacts to the goals. Because there was the potential for injuries, the safety goal is impacted. The re-routing of over 9,000 vehicles a day is an impact to the customer service goal. The closing of the bridge’s overpass/ sidewalks is an impact to the production goal, and the cost of replacing the bridge is an impact to the property/ labor goal.

By beginning with an impacted goal and asking ‘Why’ questions, cause-and-effect relationships that lay out the causes of an incident can be developed. In this case, the impacted goals are caused by the significant damage to the bridge, due to a rapidly spreading crack.

The failure of any material or object, including all or part of a bridge, results from the stress on that object from all sources overcoming the strength of the object. In this case the stress on the bridge was greater than the strength of the bridge. Stress on the bridge results from each pass of a vehicle over the life of the bridge. In this case, 9,300 vehicles a day transit the bridge, which has been in service since 1963.

Stress also results from large trucks traveling over the bridge. The engineers suspect this is what happened, possibly due to an apartment construction project near the bridge. Says Brian Kidwell, an assistant engineer for the Missouri Department of Transportation, “My hunch is a very heavy load went over it. It could have been a totally legal load.” A “hunch” by an experienced professional is included in the Cause Map as a potential cause. This is indicated with a “?” and requires more evidence.

Legal loads on bridges are based on the allowable stress for a bridge’s strength. However, the strength of a bridge can change over the years, and it is likely that this happened in this case. Previous damage has been noted on the bridge, which also required bracing last month to fix a sagging section. However, the bridge was deemed “adequate” in an inspection eight months ago. Any needed repairs may not have occurred; there’s never enough money for needed infrastructure improvements. It’s also possible that water entered the hollow cylinders that form part of the bridge span (this is called a “sonovoid” design) and later froze, causing damage that can’t be easily seen externally.

For now, more information will be required to determine what led to the bridge failure. At that point, bridges of similar design may face additional inspections, or be placed on the long waiting list for repairs. For Kansas City, some are taking a broader (and bolder) view and are recommending the older section of the Interstate “loop” be removed altogether.

To view the Cause Map of the bridge failure, click on “Download PDF” above.

Airplane Emergency Instructions: How do you make a work process clear?

May 12, 2016 ThinkReliability Staff

What’s wrong with the process above?

This process provides instructions on how to remove the over-wing exit door on an airplane during an emergency. However, imagine performing this process in an actual emergency. During the time you spend opening the door, there will probably be people crowded behind you, frantic to get off the plane. Step 4 indicates that after the door is detached from the plane wall, you should turn around and set the door (which is about 4’ by 2’ and can weigh more than 50 pounds) on the seats behind you. In most cases, this will be impossible. This is why emergency exit doors open towards the outside; in an emergency, a crush against the door will make opening the door IN impossible.

Even if it would be possible to place the door on the seat in the emergency exit row, it would likely reduce the safety of passengers attempting to exit. As discussed, the exit door is fairly large and heavy. It is likely to be displaced while passengers are exiting the airplane and may end up falling on a passenger, or blocking the exit path.

However, when this process was tested in training, it probably worked fine. Why? Because it wasn’t an actual emergency, and there probably wasn’t a plane full of passengers that really wanted to get out. This is just another reason that procedures need to be tested in conditions as close to actual situations as possible. At the very least, any scenario under which the process is to be performed should be replicated as nearly as possible.

Now take a look at this procedure:

It’s slightly better: it doesn’t tell us to put the removed door on the seat behind us, but it also doesn’t tell us what to do with the door instead. Keep in mind that the “training” of the person performing this procedure likely consisted of a 30-second conversation with a flight attendant, and that in all probability the first time he or she performs the task will be during an emergency. When testing a procedure, it’s also helpful to have someone perform the procedure who is not familiar with it, with instructions to do only what the procedure says. In this case, that person would end up removing the door . . . and then potentially attempting to climb out of the exit with the door in their hands. This is also not a safe or efficient method of emergency escape.

This procedure provides a much better description of what should be done with the door. The picture clearly indicates that the door should be thrown out of the plane, where it is far less likely to block the exit or cause passenger injury.

The first two procedures were presumably clear to the person who created them. But had they been tested by people with a variety of experience levels (particularly important in this case, because people of various experience levels may be required to open the doors in an emergency), the steps that really weren’t so clear may have been brought to light.

Reviewing procedures with a fresh eye (or asking someone to perform the procedure under safe conditions based only upon the written procedure) may help to identify steps that aren’t clear to everyone, even if they were to the writer. This can improve both the safety, and the effectiveness, of any procedure used in your organization.

8 Injured by Arresting Cable Failure on Aircraft Carrier

May 5, 2016 ThinkReliability Staff

An aircraft carrier is a pretty amazing thing. Essentially, it can launch planes from anywhere. But even though aircraft carriers are huge, they aren’t big enough for planes to take off or land in the normal way. The USS Dwight D. Eisenhower (CVN 69) has about 500′ for landing planes. In order for planes to land successfully in that distance, the carrier is equipped with an arresting wire system, which can stop a 54,000 lb. aircraft travelling 150 miles per hour in only two seconds, within a 315′ landing area. This system consists of 4 arresting cables, which are made of wire rope wound around a hemp core. These ropes are very thick and heavy and pose a significant risk to personnel safety if they are parted or detached.
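The arresting-gear figures above can be checked with basic kinematics. The worked numbers below are a rough sketch, not figures from the Navy or from the investigation: they assume the aircraft decelerates uniformly from 150 mph to rest in exactly two seconds (a real arresting engine does not decelerate perfectly uniformly) and treat the 54,000 lb figure as the aircraft’s weight W.

```latex
\begin{align*}
v_0 &= 150~\text{mph} = 220~\text{ft/s}, \qquad t = 2~\text{s} \\
a &= \frac{v_0}{t} = \frac{220~\text{ft/s}}{2~\text{s}} = 110~\text{ft/s}^2
   \approx \frac{110}{32.2}\,g \approx 3.4\,g \\
d &= \tfrac{1}{2}\,v_0\,t = \tfrac{1}{2}\,(220~\text{ft/s})(2~\text{s}) = 220~\text{ft} \\
F &= \frac{W}{g}\,a \approx 54{,}000~\text{lb} \times 3.4 \approx 1.8 \times 10^{5}~\text{lbf}
\end{align*}
```

Under those assumptions the aircraft stops in roughly 220′, inside the 315′ landing area, while the cable system carries an average force on the order of 180,000 pounds, which gives a sense of the loads acting on the cable and its connections.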

This is what happened on March 18, 2016 while attempting to land an E-2C Hawkeye. An arresting cable came unhooked from the port side of the ship and struck a group of sailors on deck. At least 8 were injured, several of whom had to be airlifted off the ship for treatment. We will examine the details of this incident within a Cause Map, a visual form of root cause analysis.

The first step in any problem investigation is to define the problem. We capture the what, when, and where within a problem outline. Additionally, we capture the impacts to the goals. The injuries as well as the potential for death or even more serious injuries are impacts to the safety goal. Flight operations were shut down for two days, impacting both the mission and production/ schedule goals. The potential for the loss of (or serious damage to) the plane is an impact to the property goal. (In a testament to the skill of Navy pilots, the plane returned to Naval Station Norfolk without any injuries to the flight crew or significant damage to the plane.) The response and investigation are an impact to the labor goal. It’s also useful to capture the frequency of these types of incidents. The Virginian-Pilot reports that there have been three arresting-gear related deaths and 12 major injuries since 1980.

The next step in the problem-solving process is to determine the cause-and-effect relationships that led to the impacted goals. Beginning with the safety goal, the injuries to the sailors resulted from being struck by an arresting cable. When a workplace injury results, it’s also important to capture the personal protective equipment (PPE) that may have impacted the magnitude of the injuries. In this case, all affected sailors were wearing appropriate PPE, including heavy-duty helmets, eye and ear protection. This belongs on the Cause Map as a cause affecting the severity of the injuries because, had the sailors NOT been wearing PPE, the injuries would certainly have been much more severe, or even fatal.

The arresting cable struck the sailors because it came unhooked from the port side of the ship. The causes of the detachment have not been conclusively determined; however, a material failure results from a force on the material that is greater than the strength of the material. Here, the force on the arresting cable came from the landing plane, and the pilot reported that the plane “hit the cable all at once”, which could have applied more force than is typical. The strength of the cable and connection may have been reduced by age or use. However, arresting cables are designed to “catch” and slow planes at full power and are used for only a specified number of landings before being replaced.

Other impacted goals can be added to the Cause Map where appropriate (additional relationships may result). In this case, the potential damage to the plane resulted from the landing failure, which was caused by the detachment of the arresting cable AND because the arresting cable is needed to safely land a plane on an aircraft carrier.

The last step of the Cause Mapping process is to determine solutions to reduce the risk of the incident recurring. More investigation is needed to ensure that the cable and connection were correctly installed and maintained. If it is determined that there were issues with the connection and cable, the processes that led to the errors will be improved. However, if it is determined that the cable and connection met design criteria and the detachment resulted from the plane landing at an unusual angle, there may be no changes as a result of this investigation.

It seems unusual that an investigation of an incident that caused 8 injuries would result in no action items. However, solutions are based on achieving an appropriate level of risk. The acceptable level of risk in the military is necessarily higher than it is in most civilian workplaces in order to achieve desired missions. Returning to the frequency from the outline, these types of incidents are extremely rare. The US Navy currently has ten operational aircraft carriers (and an eleventh is on the way). These carriers launch thousands of planes each year, yet over the last 36 years there have been only 3 deaths and 12 major injuries associated with arresting-gear failures, despite sailors performing a dangerous task in a dangerous environment. Additionally, in this case, PPE was successful in ensuring that all sailors survived and in limiting their injuries.

To view the outline and Cause Map of this event, click on “Download PDF” above.


The Force Was NOT With Them!

April 29, 2016 Jon Bernardi


A long time ago, in a galaxy far, far away, the Empire tried to use their fancy Death Star to keep the member systems in line. This plan did not work out very well, as Death Star One (DS-1) was not able to fulfill its mission of empowering galactic domination! DS-1 had travelled across the galaxy to quell the rebellion at the rebel base on Yavin 4, but did not count on the über-Force of the Rebel Alliance. The Empire did not realize the power of the good side of the Force as the rebels overcame all odds and were able to destroy DS-1. We can do an analysis of the incident to determine the system of causes for the destruction and show those causes visually in a Cause Map.

As much as the Emperor and his minions would not like to see this published, we begin by looking at how the Empire’s goals were impacted. We start by developing an outline of the incident. You might suspect that different factions within the Empire see this problem differently! Some don’t believe there is such a thing as “The Force” and place their faith in the power of the machine. Others use the Dark Side to exploit the mortal weaknesses of the players. The goals of the Empire are impacted in a number of ways: DS-1 is ultimately destroyed, with loss of life and the loss of a dominant weapon. The Rebel Alliance has gained a toe-hold against the Empire! We use the impacts to the goals as the first effects in our cause-and-effect relationships and use the differing views of “the problem” to help us build the branches of the Cause Map.

We already know that DS-1 had planet-busting capabilities, as demonstrated convincingly at Alderaan, Princess Leia’s adopted home world. This may have led the Empire’s power structure to doubt the “Power of the Force” and put their trust in a technological titan, “The ultimate power in the universe!” Even after the Rebellion had obtained the plans for the station, the commander of DS-1 still dismissed any concern about vulnerability in his unsinkable marvel. In a remarkable display of hubris, the Empire allowed the small band of rebels aboard the Millennium Falcon to escape with the stolen plans for DS-1, intending to follow them, find the rebel base, and wipe out the rebellion once and for all!

Another branch of the Cause Map follows the path of the stolen plans and the re-awakening of the Force on the planet Tatooine. As we analyze this section of the map, we can see the convergence of causes that led to the technical experts of the Rebel Alliance finally obtaining the plans for DS-1, analyzing them and discovering the dreaded “thermal exhaust port” – (guess even a DS has to have a tailpipe!).

Even a long time ago, we see causes in multiple areas coming together to form the overall picture of the incident. The plucky Rebellion had THE FORCE with them!


Worker dies while manually measuring tank

April 15, 2016 Kim Smiley


The potential danger of confined spaces is well documented, but nine fatalities have shown that people working near open hydrocarbon storage tank hatches can also be exposed to dangerous levels of hydrocarbon gases and oxygen-deficient atmospheres. NPR recently highlighted this issue in an article entitled “Mysterious Death Reveals Risk In Federal Oil Field Rules” that discussed the death of Dustin Bergsing. His job duties included opening the hatch on a crude oil storage tank to measure the oil level, and he was found dead next to an open hatch. He was healthy and only 21 years old.

A Cause Map, a visual format for performing a root cause analysis, can be used to help explain what happened to cause his death. A Cause Map intuitively lays out the cause-and-effect relationships that contributed to an issue and is built by asking “why” questions. Click on “Download PDF” to view a high-level Cause Map of this accident.

So why did his death occur? An autopsy showed that his death occurred because he had hydrocarbons in his blood. This occurred because he was exposed to hydrocarbon vapor and he remained in the dangerous environment. (When two causes both contribute to an effect, they are listed vertically on the Cause Map and separated by an “and”.)

When a person is exposed to hydrocarbon vapor, they become disoriented before passing out, so it is very difficult for them to get to safety on their own. Bergsing was working alone at the time of his death, and no one was aware that he was in trouble until it was too late.

He was exposed to hydrocarbon gases because he opened a hatch on a crude oil storage tank and the gas had collected at the top of the tank. He opened the hatch because he planned to manually measure the tank level by dropping a rope inside. Manual tank measurement is a common method to determine level in crude oil storage tanks. Crude oil contains volatile hydrocarbons that can bubble out of the crude oil and collect at the top; the gas will rush out of the tank if a hatch is opened.

Additionally, he wasn’t wearing adequate PPE because it wasn’t required by any regulation and there was limited awareness of this danger.
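For readers who like to see structure spelled out, the chain of “why” answers above can be written as a simple tree. The Python sketch below is a hypothetical illustration only, not ThinkReliability’s Cause Mapping software: the `Cause` class and `ask_why` function are invented for this example, and the box wording is paraphrased from the paragraphs above. Each box lists the causes that together produced it, so sibling causes are joined by “AND”, and walking from the effect toward its causes mirrors asking “why” at each level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on a hypothetical text Cause Map: an effect plus the causes behind it."""
    text: str
    # Every cause listed here contributed to this effect, so they are joined by "AND".
    causes: list[Cause] = field(default_factory=list)


def ask_why(node: Cause, depth: int = 0) -> list[str]:
    """Return an indented outline, answering 'why?' one level deeper per indent."""
    lines = ["  " * depth + node.text]
    for i, cause in enumerate(node.causes):
        if i > 0:
            # Separate sibling causes with "AND", as on the Cause Map.
            lines.append("  " * (depth + 1) + "AND")
        lines.extend(ask_why(cause, depth + 1))
    return lines


# Paraphrased from the post; wording of each box is illustrative.
death = Cause("Worker death", [
    Cause("Hydrocarbons in blood", [
        Cause("Exposed to hydrocarbon vapor", [
            Cause("Opened hatch on crude oil storage tank"),
            Cause("Gas collected at top of tank"),
        ]),
        Cause("Remained in dangerous environment", [
            Cause("Disoriented before passing out"),
            Cause("Working alone"),
        ]),
        Cause("Not wearing adequate PPE"),
    ]),
])

if __name__ == "__main__":
    print("\n".join(ask_why(death)))
```

A tree is used here only because each effect in this example has a single chain of causes leading into it; a real Cause Map can have causes that feed more than one effect, which a graph would represent more faithfully.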

After his death and the others, the industry is starting to become more aware of this issue. The National Institute for Occupational Safety and Health (NIOSH) and the Occupational Safety and Health Administration (OSHA) issued a hazard alert bulletin that identified health and safety risks to workers who manually gauge or sample fluids on production and flowback tanks, including exposure to hydrocarbon gases and vapors and to oxygen-deficient atmospheres. In addition to working to raise awareness of the issue, OSHA and NIOSH made recommendations to improve worker safety that include the following:

– Implementing alternate procedures that allow workers to monitor tank levels and sample without opening hatches

– Installing hatch pressure indicators

– Conducting worker exposure assessments

– Providing training on the hazard and posting hazard signage

– Not permitting employees to work alone

Please read the OSHA and NIOSH hazard alert bulletin for more information and a full list of the recommendations. Many of the recommendations would be expensive and time-consuming to implement, but some may be relatively simple ways to reduce risk. Continuing to provide information to workers about the potential hazards might be a good first step to improve their safety.
