How an Unchecked Assumption Brought Down a Bridge

By ThinkReliability Staff

On August 1, 2007, the I-35 bridge over the Mississippi River in Minneapolis, Minnesota collapsed during evening rush hour, killing 13 and injuring at least 145. During the National Transportation Safety Board’s investigation, it was discovered that the gusset plates (riveted metal plates that each join several structural members) were designed with inadequate load capacity. At the time of the collapse, the load on the gusset plate that failed was higher than usual, due to construction materials and equipment concentrated on the deck over the gusset plate and to rush hour traffic slowed by the construction. In addition to these weights, the dead load (weight of the bridge structure) had increased by more than four million pounds due to improvements made to the bridge since it opened in 1967.

Bridges are inspected regularly and go through a design review process, so how did the gusset plate design error get missed? The gusset plate design was apparently intended as a preliminary design, which neglected shear stress. Although the firm that designed the bridge required a review of all calculations before the final design, the procedure did not ensure that all calculations were rechecked, so the gusset plate calculations that ignored shear stress were overlooked.

The design was reviewed by the government, but that review did not apply to gusset plates. The gusset plate capacity was not calculated as part of the load rating calculations. Gusset plates were not listed as a separate element to be inspected during a bridge inspection. And the training for bridge inspectors contained very little information about gusset plates. Why? Because it was widely assumed that gusset plates are stronger than the members they join and so can be neglected in calculations in order to simplify the analysis. In most cases, this assumption is true. However, because these gusset plates were designed incorrectly and so were much weaker than typical, allowing the assumption to go unchecked on several different occasions proved disastrous.

As a result of this tragedy, it’s unlikely the same problem will happen again. Structural design and bridge inspection training materials are being rewritten to include the lessons learned from this bridge collapse, and inspections now consider the strength of gusset plates as part of their evaluation. Assumptions are made all the time, but these assumptions need to be verified.

Click “Download PDF” to see the NTSB’s root cause analysis investigation results visually displayed in a Cause Map. A Cause Map can capture all of the causes from an investigation in a simple, intuitive format that fits on one page.


Confined Space Asphyxiation

By ThinkReliability Staff

During the overnight shift on November 5, 2005, two workers at a refinery in Delaware City, Delaware died from asphyxiation.  Both workers had entered a confined space that was filled with nitrogen.  We will use information from the Chemical Safety Board’s root cause analysis investigation to create a Cause Map.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

The first step in our analysis is to define the problem by filling out the outline.  The outline contains the what, when, where and impact to the goals.  The “what” is the problem; in this case two workers were asphyxiated.  The when is the overnight shift of November 5, 2005, and the where is the hydrocracker reactor of a Delaware City refinery.  The workers were apparently attempting to retrieve dropped tape.

Because two workers were killed, there was an impact to the safety goal.  There may have been impacts to other goals as well, but the loss of life makes other impacts less significant.

Once the outline is completed, we use the impacted goal to begin the Cause Map. We begin with the impacted goal and ask “why” questions. A good way to begin is using the “5-why” technique. Begin with the impacted goal and ask “why” 5 times. This will start the Cause Map. For this incident: the safety goal was impacted. Why? Because two workers died. Why? Because they were asphyxiated. Why? Because they entered a confined space. Why? Because they were attempting to retrieve lost tape. Why? Because the tape was left in the reactor.
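The 5-why chain above can also be written down as an ordered list, where each entry answers “why?” for the entry before it. The short Python sketch below is only an illustration; the names FIVE_WHYS and format_five_whys are assumptions made for this example and are not part of the Cause Mapping method or any ThinkReliability tool.

```python
# Illustrative only: the 5-why chain from the text as an ordered list.
# Each entry is the answer to "Why?" asked of the entry before it.
FIVE_WHYS = [
    "Safety goal impacted",
    "Two workers died",
    "Workers were asphyxiated",
    "Workers entered a confined space",
    "Workers were attempting to retrieve lost tape",
    "Tape was left in the reactor",
]


def format_five_whys(chain):
    """Pair each effect with its cause, giving one line per 'why' question."""
    return [f"{effect}. Why? {cause}." for effect, cause in zip(chain, chain[1:])]


for line in format_five_whys(FIVE_WHYS):
    print(line)
```

Six statements produce five “why” questions, which matches the chain in the paragraph above.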

From the “5-why” Cause Map we can add more detail to the root cause analysis.  Additional causes can be added before, after and between the causes on the 5-why map.  For example, the workers were asphyxiated because they entered the confined space AND the space was filled with nitrogen.  The space being filled with nitrogen is added as an additional cause of asphyxiation, and is joined with “AND” because both causes had to be present for the asphyxiation to occur.
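The “AND” relationship can be sketched in a few lines of Python: an effect occurs only when every cause joined to it is present. This is a minimal illustration under assumed names (CAUSES, effect_occurs); it is not how a Cause Map is actually drawn or stored.

```python
# Illustrative only: an effect mapped to the causes that must ALL be present.
CAUSES = {
    "Workers were asphyxiated": [
        "Workers entered a confined space",
        "Space was filled with nitrogen",
    ],
}


def effect_occurs(effect, present_causes):
    """An AND-joined effect occurs only if every listed cause is present."""
    return all(cause in present_causes for cause in CAUSES[effect])


both = {"Workers entered a confined space", "Space was filled with nitrogen"}
only_entry = {"Workers entered a confined space"}

print(effect_occurs("Workers were asphyxiated", both))        # True
print(effect_occurs("Workers were asphyxiated", only_entry))  # False
```

With only one of the two causes present, the check returns False, which mirrors the statement that both causes had to be present for the asphyxiation to occur.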

Even more detail can be added to this Cause Map as the root cause analysis continues. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals. The outline, “5-why” Cause Map and detailed Cause Map can be seen by clicking “Download PDF” above.

Yellow Fever Epidemic

By ThinkReliability Staff

With swine flu in the news lately, ‘epidemic’ has been on many minds. However, there is still much that isn’t understood about swine flu. There are other epidemics that we understand much better, such as yellow fever.  Yellow fever has been causing epidemics for a long, long time.

But how does it happen?  We can do a root cause analysis of a yellow fever epidemic to find out.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

Since we are not looking at a specific event, but rather a general situation we will start with just one impacted goal. A yellow fever epidemic can result in the deaths of thousands of people, which we will consider an impact to the safety goal.

We begin the root cause analysis with this impacted goal and ask “why” questions. Several thousand people may die because there is no cure for yellow fever, it has a high mortality rate, and several thousand people get infected. The people get infected because they’re not vaccinated, and they are bitten by an infected mosquito in the epidemic zone. (The endemic zone consists of areas of Africa and South America where a low level of yellow fever is always present. The epidemic zone is an area outside the endemic zone to which yellow fever spreads and where an epidemic occurs.)

People are not vaccinated because they don’t have access to the vaccine: either it costs too much, or the area is too isolated to receive vaccine. In order for someone to be bitten by an infected mosquito in the epidemic zone, the mosquito must be infected, and the person must have been exposed to a mosquito in the epidemic zone. In order for a person to be exposed to a mosquito, the mosquito must have access to a person, and mosquitoes must exist, meaning they are able to breed, meaning breeding pools exist.
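The infection causes described so far nest inside one another, which can be easier to follow as an indented tree. The Python sketch below is only an illustration: the INFECTION structure and the outline function are assumptions made for this example, and the “cost or isolation” alternatives are folded into a single entry for brevity.

```python
# Illustrative only: causes from the text as a nested tree.
# Each node is (description, [child causes]); children are joined with AND.
INFECTION = (
    "People get infected",
    [
        ("People are not vaccinated", [
            ("No access to the vaccine (cost or isolation)", []),
        ]),
        ("Bitten by an infected mosquito in the epidemic zone", [
            ("Mosquito is infected", []),
            ("Person is exposed to a mosquito", [
                ("Mosquito has access to the person", []),
                ("Mosquitoes exist (breeding pools exist)", []),
            ]),
        ]),
    ],
)


def outline(node, depth=0):
    """Return the tree as indented lines, one cause per line."""
    description, children = node
    lines = ["  " * depth + description]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(INFECTION)))
```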

A mosquito gets infected by biting a person infected with yellow fever. For yellow fever to spread from the endemic zone to the epidemic zone, a person must have been infected with yellow fever in the endemic zone and then traveled to the epidemic zone. The person gets infected with yellow fever by being bitten by a mosquito infected with yellow fever (in the endemic zone) without being vaccinated. The person gets bitten by an infected mosquito because they are exposed to mosquitoes (for the same reasons listed above) that are infected, usually by biting monkeys that have been infected with yellow fever.

If you had trouble following all of that, you can see why a process map would be helpful.  On the downloadable PDF, both the Cause Map and process map are shown.

Pedestrian Bridge Collapse on July 4th

By ThinkReliability Staff

On the evening of July 4th, after watching fireworks, revelers at a park in Merrillville, Indiana headed back to their cars over a pedestrian bridge. The bridge became overloaded and collapsed when two suspension cables snapped. Somewhere between 50 and 120 people fell into the lake. Although 25 were treated for injuries, nobody was killed, thanks to quick action by nearby lifeguards, police officers, firefighters and other rescuers who formed a human chain to help get everyone safely out of the water. We’ll use this as a root cause analysis example. A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

First we complete the outline. The problem is a bridge collapse. It happened at 10:00 p.m. on the 4th of July, while there were large numbers of people on the bridge. It was a pedestrian bridge in Merrillville, IN, and people were crossing it to return home after a party.

Once we have defined the problem we list the impacts to the goals.  People being injured is an impact to the safety goal, as is the potential for drowning.  People fell into the lake, which was an impact to the customer service goal.  Additionally, the loss of the bridge is an impact to the material and labor goal.

We begin our Cause Map by listing the impacted goals and asking “why” questions to fill out the Cause Map to the right.  Begin with 5 “why” questions to start the Cause Map.  This is known as the “5-whys” technique.  For example, the safety goal was impacted.  Why? The safety goal was impacted because people were injured.  Why? People were injured because they fell into the lake.  Why?  They fell into the lake because the bridge collapsed. Why?  The bridge collapsed because the suspension cables broke.  Why? The cables broke because the weight on the bridge exceeded the bridge capacity.

Even more detail can be added to this Cause Map as the analysis continues. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.  For this investigation, we can add some more detail to the “5-why” Cause Map to help our investigation.  For example, pedestrians fell into the lake because the bridge collapsed AND because pedestrians were on the bridge, returning to their cars after the 4th of July party.

There may have been additional stress on the bridge due to pedestrians jumping up and down, as reported by witnesses.  Additionally, we can add more detail after the “weight exceeded capacity” on the bridge.  The bridge was built to hold 40 people, but “at least twice that” were on the bridge when it collapsed.  So many people were on the bridge because they were returning to their cars (as discussed above), and because of ineffective crowd control.  There were too many people on the bridge despite officers stationed on either side.  Why was the crowd control ineffective?  It’s not known at this point, but we’ll put a question mark here.  The next step of the investigation will be to replace that question mark with reasons for the ineffective crowd control.  Once we’ve done that, we can come up with solutions that will keep an event like this one from occurring in the future.

Italian Train Explosion

By ThinkReliability Staff

On the evening of June 29, a train carrying liquefied natural gas derailed and exploded in the town of Viareggio, in western Italy.  Search and rescue operations are still ongoing, and the cause for the derailment is not yet known.  Although that means we are lacking some information, we can still begin our root cause analysis investigation, in the form of a Cause Map.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

The benefit to beginning a root cause analysis investigation before all the information is known is to provide a framework for the investigation to build on.  People find it much easier to comment on a partially finished Cause Map than to piece together the investigation from scratch.

A root cause analysis template is available for download from the Think Reliability web page to assist with the investigation. The first step is to fill out the outline. Don’t leave any blanks in the outline; if you don’t know something, put a question mark. The first line is the ‘what’ or the problem. Rather than spending time debating what ‘the problem’ is, we can put a number of things. For example, the problem here could be defined as a gas leak, an explosion, and a train derailment. We put all these things on the problem line. The rest of the information is known, though we may add more detail later, except for differences. Differences can be key to an investigation. For example, if you have a process that works for 30 straight sunny days, then fails the day it rains, it is worth looking into the impact of the rain on the process. Here, no differences are immediately coming to mind, so we’ll put a question mark in this blank.
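The “no blanks” rule for the outline can be sketched as a small Python helper that fills any unknown field with a question mark. The field names in OUTLINE_FIELDS and the function fill_outline are assumptions made for this illustration; the actual template may list different or additional fields.

```python
# Illustrative only: field names are assumed, not taken from the actual template.
OUTLINE_FIELDS = ["what", "when", "where", "differences"]


def fill_outline(known):
    """Return an outline with every field present; unknowns become '?'."""
    return {field: known.get(field, "?") for field in OUTLINE_FIELDS}


outline = fill_outline({
    # Several problems can share the "what" line instead of debating one.
    "what": ["gas leak", "explosion", "train derailment"],
    "when": "evening of June 29",
    "where": "Viareggio, Italy",
})

for field, value in outline.items():
    print(f"{field}: {value}")
```

Because no differences were supplied, that field comes back as a question mark rather than being left out.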

Once we’ve defined the problem, we can define the problem with respect to the impact to the goals.  We don’t know how many people, overall, were killed or injured, but we can just put “at least” to show that the numbers aren’t exact.  We know that the environmental goal was impacted, because of the gas leak, the community goal was impacted because of the required evacuation, and the material/labor goal was impacted because of the collapsed houses, and the damage to the train.

Now we begin the analysis.  We begin with the impacted goals and ask “why” questions, moving to the right.  When we can’t answer the “why” question, we can use a question mark, or put some possibilities (theories) that have been presented.  For example, we’re not yet sure why the train derailed.  Some of the possibilities that have been presented are damage to the tracks, a problem with the braking system, or malfunctioning wagon locks.  The Cause Map (so far) is shown in the downloadable PDF (to download, click “Download PDF” above.)  As you can see, there is a lot of information present, even though we don’t know all of what happened yet.
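One way to picture an unanswered “why” is as an entry whose cause is a question mark, with the proposed theories kept alongside it until evidence confirms one. The Python sketch below is only an illustration; the CAUSE_MAP layout and the open_questions function are assumptions made for this example.

```python
# Illustrative only: a cause of "?" marks a "why" that cannot be answered yet.
CAUSE_MAP = [
    {
        "effect": "Community goal impacted",
        "cause": "Evacuation was required",
        "theories": [],
    },
    {
        "effect": "Train derailed",
        "cause": "?",
        "theories": [
            "Damage to the tracks",
            "Problem with the braking system",
            "Malfunctioning wagon locks",
        ],
    },
]


def open_questions(cause_map):
    """Return (effect, theories) for every entry whose cause is still unknown."""
    return [(n["effect"], n["theories"]) for n in cause_map if n["cause"] == "?"]


for effect, theories in open_questions(CAUSE_MAP):
    print(f"{effect}. Why? Unknown; theories: {', '.join(theories)}")
```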

As more information becomes available, we can update the Cause Map. As with any root cause analysis, the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals.