Root Cause Analysis - Incident Investigation

Tacoma Narrows Part 2: Failure of a Design

March 20, 2008 Kim Smiley

The mechanics behind the failure of the Tacoma Narrows Bridge were discussed in a previous blog entry. There were many design issues with the bridge, and the civil engineering community has done an excellent job of studying them and incorporating the lessons learned from the failure. But a question that may be more pertinent across all engineering disciplines is, “Why did the design process fail?”

How did a bridge get built that would fail in a little over four months? A root cause analysis of the bridge shows that the factors that shaped the doomed bridge design are present in almost every engineering project. There is as much to learn from the failed process that led to the design as there is from the failed design itself.

The primary factor that led to the bridge design was cost reduction. The first design proposed for the Tacoma Narrows Bridge was a conventional suspension bridge that was estimated to cost $11 million. Funding was an issue for the bridge from the beginning, and the design that was finally approved was an elegant bridge with a narrow roadbed and shallow girders. In addition to being more aesthetically pleasing, its estimated price tag of $8 million was nicer to look at as well. Another contributing factor was the engineer behind the second design, the very well-known civil engineer Leon Moisseiff. His credentials were impeccable: he had previously consulted on the famed Golden Gate Bridge, the Bronx-Whitestone Bridge and others. Additionally, he helped develop some of the methods used throughout the world to calculate forces in suspension bridges.

In a tale that is probably repeating somewhere right now, a cheaper, flashier design was recommended by a well-respected engineer. Nobody wanted to listen to the voices of dissent from lesser-known engineers (and there were engineers who spoke out against the new bridge design, saying it was unsafe). The project then failed dramatically.

As engineers, there is a lot we can learn from studying how past projects have balanced cost and safety. There are stories where remarkable profits and success have been achieved by finding a cheaper way to do something. But sometimes, as in the case of the Tacoma Narrows Bridge, the cheap way costs more in the end.

Learn more about the failure of the Tacoma Narrows Bridge.

Deadly NYC Crane Accident

March 19, 2008 Angela Griffith

By ThinkReliability Staff

Unfortunately, an investigation into a deadly construction accident is currently underway in New York City. On Saturday, March 15, a 19-story crane collapsed. Four construction workers were killed and 18 others were injured. Emergency workers are still sorting through the rubble in an attempt to find any remaining survivors. The crane was being used at a high-rise construction site and was attached to the side of a skyscraper. Details as to why the crane fell are still vague, but eyewitnesses report that a piece of steel fell and severed at least one tie that held the crane to the building. Once the connection between the crane and the building was weakened, the crane toppled and split into two pieces. As it fell, the crane smashed a four-story townhouse and damaged parts of three other buildings.

What made the crane fall? Part of doing a root cause analysis is sorting the pertinent facts from all the information that is available. Is it relevant that neighbors had complained that the construction crews were working illegal hours and that the building seemed to be going up too quickly? City officials had issued 13 violations to the construction project, which at first glance seems like a red flag indicating a lack of attention to safety. But Mayor Bloomberg has said that this is a normal number of violations for a project of this size. Additionally, the crane had been inspected the day before the accident and no violations were issued. Did something change in 24 hours, or was the inspection inadequate? At the time the crane fell, it was being raised to enable work to begin on the next floor of the building. Did this contribute to the accident? Where did the piece of steel that supposedly fell come from? At this point in the investigation there are more questions than answers.

High Level Cause Map

There are many facts and theories that surface in the wake of any accident, and part of doing a root cause analysis is determining which are actually relevant. This is much easier said than done. The push to provide answers quickly adds to the pressure to produce a “cause” for the accident. But as anyone familiar with root cause analysis knows, there isn’t a single “cause”; there are many causes that contributed to the accident. The best approach is to record all possible causes and continue to gather evidence until you can eliminate the noise and are left with the true causes. Then the work of creating solutions that address those causes can begin.

Tacoma Narrows: Failure of a Bridge

March 16, 2008 Kim Smiley

By Kim Smiley

The power of performing a root cause analysis of a problem can be demonstrated by working through well-known engineering disasters. For example, creating a cause map for the failure of the Tacoma Narrows Bridge helps explain why the bridge collapsed and illustrates some of the lessons that can be learned.

The original Tacoma Narrows Bridge was opened for traffic on July 1, 1940. A little more than four months later, the bridge violently failed and a 600-foot span of roadbed fell into the river below. Why did the bridge tear itself apart? What made the bridge collapse on November 7th and not on some previous day? One of the first questions asked when performing a root cause analysis is, “What is different about this issue?” The first difference to consider was that November 7th was a windy fall day. Construction of the bridge ended in the summer, so this was the first fall the new bridge had experienced. On the day the bridge failed, the wind was blowing across the roadbed at 42 mph. This was the strongest the wind had blown since the bridge was constructed. The second difference was the design of the bridge itself. The Tacoma Narrows Bridge was particularly narrow relative to its length, making the roadbed more flexible than other suspension bridges. Additionally, the bridge had shallow girders and was relatively weak in torsion compared to other suspension bridges built around the same time. The combination of fall winds and the slender bridge design resulted in the collapse of the bridge.

High Level Cause Map

As the wind struck the bridge, the force twisted the roadbed until it hit a point where it was constrained by the suspender cables, and then it twisted back in the other direction. Other suspension bridges of the time experienced similar twisting motions, but what made this bridge different was that the amplitude of the motion increased with each cycle rather than dying out. The bridge was unable to dissipate the wind energy, and the twisting motion continued to grow until the suspender cables snapped and the roadbed dropped into the river below. The mathematical explanation of why the bridge collapsed is fairly complex, but simply put: the wind fed energy into the twisting motion faster than the structure could dissipate it, so the bridge behaved as if it had negative damping and the twisting oscillations grew rather than shrank with each cycle.
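
To make the difference between damped and growing oscillations concrete, here is a minimal sketch of a single-degree-of-freedom oscillator, x'' + 2ζωx' + ω²x = 0, where ζ is the damping ratio. A negative ζ is used here as a crude stand-in for wind feeding energy into each cycle; that is an assumption for illustration, not a model of the actual bridge. The 1 Hz frequency, the ζ values and the time step are illustrative choices, not measurements from the Tacoma Narrows Bridge, and the real mechanism (aeroelastic flutter) couples wind and structural motion in ways this toy model does not capture.

```python
import math


def peak_amplitude(zeta, omega=2 * math.pi, x0=1.0, duration=10.0, dt=1e-3):
    """Integrate x'' + 2*zeta*omega*x' + omega**2 * x = 0 from rest at x0
    and return the largest |x| seen during the final second of the run.

    Illustrative parameters only: omega = 2*pi rad/s is a 1 Hz oscillation.
    """
    x, v = x0, 0.0
    steps = int(duration / dt)
    last_second = int(1.0 / dt)
    peak = 0.0
    for i in range(steps):
        # Semi-implicit (symplectic) Euler: update velocity, then position.
        a = -2.0 * zeta * omega * v - omega ** 2 * x
        v += a * dt
        x += v * dt
        if i >= steps - last_second:
            peak = max(peak, abs(x))
    return peak


# Positive damping: motion dies out. Zero: motion persists.
# Negative damping: each cycle is larger than the last.
for zeta in (0.05, 0.0, -0.05):
    print(f"zeta={zeta:+.2f}: peak amplitude in final second = {peak_amplitude(zeta):.3f}")
```

With ζ = 0.05 the motion shrinks to a small fraction of its starting size after ten seconds, while with ζ = -0.05 it grows by more than an order of magnitude: the same qualitative difference between the bridges that "damped out" and the one that tore itself apart.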

Learn more in Part 2 of the blog.

Problem Solvers are Specific

March 4, 2008 Mark Galley

By Mark Galley

Have you ever heard anyone say, “The procedure is a piece of junk”? If you ask the person whether every step of the 40-step procedure is wrong, they will usually say, “No, not every step.” You can ask them to show you which step is wrong. When they point out that step 14 is wrong, you can ask, “Is every word in step 14 wrong?” They will usually say, “Well, no, not every word, but that 5 is supposed to be a 7.” You can then say, “I understand. That is an issue. Thanks for catching that. I’ll get it updated. These things have got to be clear and accurate.”

The original statement, “the procedure is a piece of junk,” is too general. It refers to the procedure as one thing, not 40 things. People who blame and complain speak in very general terms; they group things together and generalize. People who are very good at troubleshooting and solving problems naturally think and speak in very specific terms. Analyzing a problem means breaking it down into parts. It is always about getting more specific so that very specific actions (the solutions) can be taken.

Terms like “human error”, “procedure not followed” and “training less than adequate” are used regularly by companies to explain why a particular problem occurred. These terms are too general. They inadvertently give the impression that the root cause analysis has found the cause. Knowing that someone didn’t follow a procedure is important, but it is not the end of an investigation. We’re just getting to the good stuff: the specific information about what created the incident in the first place.

Our interest is not limited to correcting the person who didn’t follow that procedure. We want to address how we developed, approved, utilized and updated this particular procedure so that the procedure process itself can be improved. It’s about improving how we capture and communicate the best work practices in our organization as a whole. This is where the leverage within the organization lies. To solve problems effectively, be specific. Ask those who blame and complain to help us understand the issue by being more specific.

For more information about improving the problem solving skills within your organization, visit ThinkReliability – specializing in Cause Mapping – Effective Root Cause Analysis training.

UPDATE: US Beef Recall

February 26, 2008 Kim Smiley

By Kim Smiley

I wanted to add a few more interesting facts on the recent beef recall as the ramifications continue to surface. As a quick recap, on February 17, 143 million pounds of beef were recalled. For perspective, that’s enough beef to make about two hamburgers for every person in the US. The scope of the recall is rapidly expanding, and it may become the largest food recall in US history. The full magnitude of the recall is only now becoming apparent because it takes weeks to track down all the products containing the recalled beef.
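
As a rough check on the “two hamburgers” comparison, the arithmetic can be sketched as below. Two inputs are assumptions not stated in the post: a 2008 US population of roughly 304 million and a quarter-pound (0.25 lb) hamburger patty. Change either and the per-person figure shifts accordingly.

```python
RECALLED_POUNDS = 143_000_000     # from the recall announcement
US_POPULATION_2008 = 304_000_000  # assumption: approximate 2008 US population
PATTY_POUNDS = 0.25               # assumption: one quarter-pound patty per hamburger

# Total patties the recalled beef could make, then patties per person.
patties = RECALLED_POUNDS / PATTY_POUNDS
per_person = patties / US_POPULATION_2008

print(f"{patties:,.0f} patties, about {per_person:.1f} per person")
```

Under these assumptions the recall works out to a little under two quarter-pound hamburgers per person, consistent with the comparison above.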

Take a second to think of all the products in a grocery store that contain beef and you can imagine how large this recall is likely to become. The amount of food that is going to be destroyed is mind-boggling, and the cost is likely to be in the hundreds of millions of dollars. Keep in mind that no cases of illness have been reported, a large amount of the beef has already been consumed, and the U.S. Department of Agriculture classifies the risk to consumers as remote. Does it make sense to destroy all this food? As you consider the scope of the recall, I ask you also to consider a root cause analysis of the problem.

The previous blog asked the question: what is the best approach to prevent this type of problem from happening again? I still don’t know the answer, but I do know that a recall alone does not solve the initial problems that caused the issue. What really led to sick cows being mistreated and then slaughtered for human consumption? A recall deals with the problem after the fact; a good solution would change something in the process before the meat enters the food chain. The USDA has stated that it will not be increasing inspections at food processing plants, and I haven’t found any evidence that other changes are being made in the work process at the slaughterhouses. I’ll be continuing to cook my meat well done.

Largest Beef Recall in US History

February 22, 2008 Kim Smiley

By Kim Smiley

One of the most interesting things about root cause analysis is its widespread application. As an engineer, I tend to think about root cause analysis applying to mechanical failures, safety incidents or manufacturing issues, but it can be applied to any system.

Take, for instance, the recent beef recall. The largest beef recall in US history was initiated on February 17 when Westland/Hallmark Meat Company recalled 143 million pounds of beef. What started the whole thing was an undercover video distributed by the Humane Society of the United States, which showed workers kicking, shocking and even fork-lifting sick cows to force them onto their feet so they could be slaughtered. Beyond the animal cruelty issues (two workers involved have since been charged), the problem is that meat from sick cows was processed and sold. Government regulations ban cows that cannot walk from entering the food supply because consumption of their meat may lead to illness, including mad cow disease.

So how did sick cows end up being slaughtered and sold to millions of people? What is the best approach to prevent this type of problem from happening again? Is the answer that we need more government regulations, more frequent inspections or stricter penalties for companies that violate the current regulations? Whose fault is it? Is it the farmers for selling the cows, the health inspectors for missing sick cows or the slaughterhouses for processing sick cows? Performing a root cause analysis would show you that there isn’t a single right answer. All you have to do is look at the recent increase in beef recalls to realize that a simple, single-cause solution won’t work. There were five recalls in 2005, eight in 2006 and 21 in 2007. These recalls were not limited to one plant or even one company. Clearly, fining one company or firing a few workers isn’t going to fix the beef supply issues. You need to attack the root of the problem to keep it from growing back, and to do that you need to find the root causes (plural). The information needed to do a detailed analysis isn’t available to the public yet, but a very basic root cause analysis follows.

High Level Cause Map

Goals Define the Problems in your Organization

February 19, 2008 Mark Galley

By Mark Galley

For a particular failure, loss or incident, people will naturally disagree about what the problem is. Some people will say the problem is this and others will say the problem is that and still others will let everyone know what the real problem is. People see problems differently. This is a given for any root cause analysis facilitator.

Is it possible for everyone to agree on the problem? Yes. It may seem unrealistic until we look specifically at what a problem is. A problem is anything that negatively affects the ideal state. People may see many different issues as a problem, but within an organization the ideal state is already defined: it is the organization’s overall goals. Defining a problem as any negative deviation from the organization’s overall goals is an accurate, complete and consistent approach. For example, let’s consider your local power plant.

What is the ideal state of that power plant? Let’s say the power plant is supposed to generate 1,000 megawatts. Any negative deviation from 1,000 megawatts is a problem. If the plant generated only 900 megawatts, the deviation is 100 megawatts (a production loss). We could even put an economic value on this production loss. But producing power is not the only goal of the power plant. Organizations don’t have a goal; they always have goals (plural).

The safety goal for the power plant is zero injuries. Any injury is a deviation from the ideal state. Some safety incidents are more critical than others: a paper cut is an injury, but it’s not as serious as someone receiving 15 stitches. The magnitude of the impact on the goals dictates both the importance of a problem and how thorough the investigation should be. Minor incidents have relatively basic investigations, while major issues require much more comprehensive analyses.

The ideal state of the power plant also includes no environmental issues, no customer service interruptions, no property or material losses, and no excess reactive or rework labor costs. The overall goals of the power plant are safety, environmental, compliance, customer, production, and materials and labor (which are usually captured within maintenance). Any negative deviation from any one of these overall goals is what the power plant should focus its problem-solving and root cause analysis efforts on, every day.

The overall goals change for each type of organization. A hospital has different overall goals than a food processor, an oil company or a bank. Regardless of the organization or industry, the impact to the overall goals dictates where the root cause analysis efforts should be.

The Cause Mapping method of root cause analysis has a specific way of defining every problem by the organization’s overall goals. People naturally disagree about what the problem is. In the Cause Mapping method, it’s much simpler for the facilitator to accommodate disagreements about the problem because they are expected. The differences provide great insight into people’s views of the problem. To get agreement, ask the participants, as a group, how each of the overall goals was impacted (if at all). Amazingly, people will not disagree about the impact to the goals. They will disagree about the responses to the question “What’s the problem?”, but they will give the same answers to each of the goal questions. Managers and front-line people will give the same answers. It’s powerful because it’s so basic. Goals dictate what the problems are.

During an injury investigation in the power plant where someone sprained their ankle, when the facilitator asks, “Was anyone hurt?” everyone will answer, “Yes, John sprained his ankle.” It’s obvious. If you ask what the problem is, people’s responses will be all over the place: he just tripped, the barrier is bad, maybe the floor was slick, inattention to detail, procedure not followed, etc. In your problem-solving and root cause analysis investigations, experiment with this idea of defining every incident by the impact it has on the overall goals.
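
One way to picture "defining every incident by its impact on the overall goals" is as a checklist that asks about every goal, in order, for every incident. The sketch below is a hypothetical illustration, not a ThinkReliability tool or an official Cause Mapping format: `POWER_PLANT_GOALS`, `ProblemDefinition`, `record` and `summary` are invented names, and the goal list is simply the one given above for the power plant example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical: the overall goals listed for the power plant example.
# A hospital, food processor or bank would substitute its own list.
POWER_PLANT_GOALS = ("safety", "environmental", "compliance", "customer",
                     "production", "materials and labor")


@dataclass
class ProblemDefinition:
    """Hypothetical record of how one incident impacted each overall goal."""
    incident: str
    impacts: dict[str, str] = field(default_factory=dict)  # goal -> impact

    def record(self, goal: str, impact: str) -> None:
        # Only the organization's defined goals can be impacted.
        if goal not in POWER_PLANT_GOALS:
            raise ValueError(f"unknown goal: {goal}")
        self.impacts[goal] = impact

    def summary(self) -> list[str]:
        # Ask about every goal; goals with no recorded impact are reported
        # as "not impacted" rather than silently skipped.
        return [f"{goal}: {self.impacts.get(goal, 'not impacted')}"
                for goal in POWER_PLANT_GOALS]


ankle = ProblemDefinition("Sprained ankle in the power plant")
ankle.record("safety", "John sprained his ankle")
for line in ankle.summary():
    print(line)
```

Walking through every goal, rather than asking "What's the problem?", is what keeps the answers consistent: everyone agrees that the safety goal was impacted and that, in this example, the others were not.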

To learn more about quickly, clearly and accurately defining problems in your business, attend one of our Public Cause Mapping Workshops listed on our web site or bring our workshop to your facility. The Cause Mapping method is an extremely effective systems-based approach to root cause analysis. Visit us at www.ThinkReliability.com to learn more about improving the way your organization analyzes, documents, communicates and solves problems.

Root Cause Analysis

February 9, 2008 Charles Baldo

ThinkReliability investigates problems, including historical incidents. Some examples of these incidents include, but are not limited to, the sinking of the Titanic, the Tacoma Narrows Bridge, the Exxon Valdez oil spill and the BP Refinery Explosion in Texas City. The Cause Mapping method of root cause analysis was used to create a visual picture of the cause and effect relationships of the incidents.