Category Archives: Root Cause Analysis – Incident Investigation

300,000 Unable to Use Water after Chemical Spill in West Virginia

By Kim Smiley

Hundreds of thousands of West Virginians were unable to use their water for days after it was contaminated by a chemical spill on January 9, 2014. About 7,500 gallons of 4-methyl-cyclohexane-methanol, known as MCHM, leaked out of a storage tank and into the Elk River. At the time of the spill, little was known about MCHM, but officials ordered residents not to use the water because the chemical can cause vomiting, nausea, and skin, eye and throat irritation. The ban on water usage obviously meant that residents should not drink the water, but they were also told not to cook, bathe, wash clothes or brush their teeth with it.

The investigation into this incident is still ongoing, but some information is available.  An initial Cause Map, or visual root cause analysis, can be built now and it can easily be expanded in the future.  A Cause Map is used to illustrate the cause-and-effect relationships between the many causes that contribute to any incident.  In this example, it is known that the MCHM leaked into the river because it was being stored in a tank near the river and the tank failed.  MCHM was being stored in a tank because it is used in coal processing and it was profitable for the company to sell it.

The cause of the tank failure hasn’t been officially determined, but the company that owned the facility has stated that an object punctured the tank after the ground under the tank froze. (Suspected causes can be included on the Cause Map with a question mark to indicate that more evidence is needed to confirm their validity.)

The tank in question was older, built about 70 years ago.  There were no regulations that required the tank to be inspected while it was being used to store MCHM because the chemical is not currently legally considered a hazardous material.  The tank is also an atmospheric tank so it is exempt from current federal safety inspections because it is not under pressure, cooled or heated.

Many are asking why a tank full of a chemical that can make people sick, sitting so close to the water supply, was subject to so little regulation and no required inspections. The debate that has been sparked by this accident will force a close review of current regulations governing these types of facilities.

It’s also alarming how little was known about this chemical prior to this accident. It’s still not well understood exactly how dangerous MCHM is. Experts have stated that the long-term impacts should be minimal, but it would be awfully reassuring to the people living in the area if there were more information about the chemical available.

Companies need to have a clear understanding of the risks involved in their operations if they hope to reduce the risk to the lowest reasonable level and develop effective emergency response plans to deal with any issues that do arise. As the old saying goes – failure to plan is planning to fail. Just ask the company involved. Freedom Industries filed bankruptcy papers on January 17, 2014 as a direct result of this accident.

Improper Fireplace Installation Results in Firefighter’s Death

By Mark Galley

While battling a fire in a mansion in Hollywood Hills, California on February 16, 2011, a firefighter was killed (and 5 others seriously injured) when the roof collapsed. As a result of the firefighter’s death, the owner/architect of the home was convicted of involuntary manslaughter. He is scheduled to serve 6 months and then will be deported.

The fire wasn’t arson, but the owner/architect was considered responsible due to the installation of an outdoor-only fireplace on the top floor of his home. Because of the legal issues surrounding this case, it’s important to carefully determine and clearly present all of the causes that led to the fire and the firefighter’s death.

We can capture information related to this issue within a Cause Map, or visual root cause analysis.  A Cause Map begins with the impacted goals, allowing a clear accounting of the effects from the issue.  The firefighter’s death is an impact to the safety goal, as are the injuries to the other firefighters.  Impacts to the safety goal are the primary focus of any investigation, but we will capture the other impacted goals as well.  In this case, the regulatory goal was impacted due to the non-compliant fireplace, the non-compliance being missed during inspection, and the prison sentence for the architect/owner.  Additionally the loss of the home and the time and effort put into firefighting and the subsequent trial impact the property and labor/time goals.

Once the impacts to the goals are determined, asking “why” questions begins to develop the cause-and-effect relationships that resulted in those impacts.  A Cause Map can start simple – in this example, the safety goal was impacted due to the death of a firefighter.  Why? Because the ceiling collapsed.  Why? Because the house was on fire.  Why? Because heat ignited flammable building materials.

Though this analysis is accurate, it’s certainly not complete.  More detail can be added to the Cause Map until the issue is adequately understood and all causes are included in the analysis.  Detail can be added by asking more “why” questions – the heat ignited flammable building materials because an outdoor-only fireplace was improperly used inside the house.  Causes can also be added by considering causes that both had to occur in order for the effect to happen.  The firefighter was killed when the ceiling collapsed AND the firefighter was beneath the ceiling, fighting the fire.  Had the ceiling collapsed but the firefighters not been inside, the firefighter would not have been killed by the ceiling collapse.
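The "why" chain and the AND relationship described above can be sketched as a small data structure. This is only an illustration in Python, not part of the Cause Mapping method itself: the `Cause` class and the helper functions are hypothetical names made up for this example, and the cause descriptions are taken from this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on the map: an effect plus the causes that produced it."""
    description: str
    # Every cause listed here had to occur for the effect to happen (AND).
    required_causes: list[Cause] = field(default_factory=list)


def print_why_chain(cause: Cause, depth: int = 0) -> None:
    """Walk from an effect back through its causes, one 'why' per indent level."""
    print("  " * depth + cause.description)
    for earlier in cause.required_causes:
        print_why_chain(earlier, depth + 1)


def count_causes(cause: Cause) -> int:
    """Count this box and every cause behind it."""
    return 1 + sum(count_causes(c) for c in cause.required_causes)


fireplace = Cause("Outdoor-only fireplace improperly used inside the house")
ignition = Cause("Heat ignited flammable building materials", [fireplace])
fire = Cause("House was on fire", [ignition])
collapse = Cause("Ceiling collapsed", [fire])
position = Cause("Firefighter was beneath the ceiling, fighting the fire")
# Two causes joined by AND: remove either one and the death does not occur.
death = Cause("Firefighter killed (safety goal impacted)", [collapse, position])

print_why_chain(death)
```

Adding detail to the analysis then amounts to adding more `Cause` entries, either further back in a chain or alongside an existing cause.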

Detail can also be added between causes to provide more clarity. In this case, the ceiling collapse was not directly caused by high heat. Instead, the high heat activated and melted the sprinkler system, resulting in a buildup of water that caused the ceiling collapse. The other goals that were impacted should also be added to the Cause Map, which may result in more causes. In this case, the improperly installed fireplace was missed by the building inspector, which is an impact to the regulatory goal. The reason it was missed was debated during the trial, but changes to the inspection process may result that would make this type of incident less likely, ideally reducing the risk to firefighters and home owners.

An incident analysis should have enough detail to lead to solutions that will reduce the risk of the impacts to the goals recurring. As I mentioned previously, solutions from the perspective of the building inspectors may be to look specifically for issues on fireplaces that could lead to these types of fires. Ideally, a way to determine if a sprinkler system was malfunctioning and leading to water collection could be developed that could reduce the risk to firefighters. For homeowners, this incident should stand as a reminder that outdoor-only heat sources such as fireplaces are outdoor-only for a reason.

Freight Train Carrying Crude Oil Explodes After Colliding With Another

By Kim Smiley

On Monday, December 30, 2013, a 106-car freight train carrying crude oil derailed in North Dakota and violently exploded after colliding with another derailed train that was on the tracks.  No injuries were reported, but the accident did cause an impressive plume of hazardous smoke and major damage to two freight trains.

The investigation into the accident is ongoing and it’s still unknown what caused the first train to derail. Investigators have stated that it appears that there was nothing wrong with the railroad track or with the signals. It is known that a westbound freight train carrying grain derailed about 2:20 pm. A portion of this train jumped onto the track in front of the eastbound train. There wasn’t enough time for the mile-long train loaded with crude oil to stop and it smashed into the grain train, causing the eastbound oil train to derail. (To see a Cause Map of this accident, click on “Download PDF” above.)

Train cars carrying crude oil were damaged and oil leaked out during the accident.  The train accident created near ideal conditions for an explosion: sparks and a large quantity of flammable fluid.   The fire burned for more than 24 hours, resulting in a voluntary evacuation of nearby Casselton, North Dakota due to concerns over air quality.  The track was closed for several days while the initial investigation was performed and the track was cleaned up.

The accident has raised several important issues. The safety of the train cars used to transport oil has been questioned. Starting in 2009, tank train cars have been built to tougher safety standards, but most tank cars in use are older designs that haven’t been retrofitted to meet the more stringent standards. This accident, and others that have involved the older design tank cars in recent years, have experts asking hard questions about their safety and whether they should still be in use.

The age of the train cars is particularly concerning since the amount of oil being transported by rail has significantly expanded in recent years. Around 9,500 carloads of oil were reportedly transported in 2008 and nearly 300,000 carloads were moved during the first three quarters of 2013. The oil industry in North Dakota has rapidly expanded in recent years as new technology makes oil extraction in the area profitable. North Dakota is now second only to Texas in oil production since the development of the Bakken shale formation. Pretty much the only way to transport the crude oil extracted in North Dakota is via rail. There isn’t a pipeline infrastructure or other alternative available.

And most of the time, transporting oil via freight train is a safe evolution. The Association of American Railroads has reported that 99.99 percent of all hazardous materials shipped by rail reach the destination safely. But it’s that 0.01 percent that can get you in trouble. As a nation, we have to decide whether the current level of safety is good enough or whether it’s worth the money to require all tank cars used to transport oil to be retrofitted to meet the newest safety standards, a proposition that isn’t cheap.
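To put the carload figures and that 0.01 percent side by side, here is a rough back-of-the-envelope calculation in Python. It rests on an assumption the numbers above don't support on their own: that the industry-wide 99.99 percent figure for all hazardous materials applies evenly to crude oil carloads. Treat the result as an illustration of scale, not as an accident estimate.

```python
# Figures quoted in this post.
carloads_2008 = 9_500
carloads_2013_first_three_quarters = 300_000  # "nearly 300,000"

# 0.01 percent is 1 shipment in 10,000.
# ASSUMPTION: the rate for all hazardous materials also holds for crude oil.
unsafe_per_10_000 = 1

growth = carloads_2013_first_three_quarters / carloads_2008
carloads_not_arriving_safely = (
    carloads_2013_first_three_quarters * unsafe_per_10_000 // 10_000
)

print(f"Growth in carloads since 2008: about {growth:.1f}x")
print(f"0.01 percent of 300,000 carloads: {carloads_not_arriving_safely}")
```

Even a very small failure rate translates into a noticeable number of carloads once the volume grows by this much.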

Ceiling Collapse in London’s 112-year-old Apollo Theatre Injures Dozens

By Kim Smiley

On December 19, 2013, 76 people were injured when a large section of plaster fell from the ceiling of London’s historic Apollo Theatre.  Luckily there were no fatalities as a result, but six people were seriously injured in the accident.

The investigation is still underway, but an initial Cause Map can be built to begin analyzing the incident. The first step in the Cause Mapping process is to fill in an Outline with the basic background information as well as formally list how the incident impacts the goals so that no part of a multifaceted problem is neglected. It’s important to understand how an issue impacts all goals, such as safety issues, financial considerations, schedule delays, etc. There are times when different solutions can help mitigate risks to separate goals so it is useful to list all impacted goals for clarity. Listing the impacted goals will also help focus the investigation on the most important elements.

Another very important part of the Outline is a space where any relevant differences are listed.  Anything that was different at the time an incident occurred is usually a good place to start digging during an investigation.  For this example, there was heavy rain during the hour preceding the ceiling collapse.  It’s also worth noting that the Apollo Theatre is 112 years old.

Investigators have not announced what led to the ceiling collapse, but early speculation is that rainwater leaked through the roof and settled onto the plaster. The theory is that the additional weight from the water was more than the ceiling could handle and it fell, taking a lighting rig and part of a balcony with it. If this was the case, there will need to be hard questions asked about the adequacy of current building codes and inspection requirements. Currently, the roof on the Apollo Theatre is only required to be inspected every 3 years. It appears that the Theatre was up to date on and had passed all required inspections so the required periodicity may need to be re-evaluated in light of the recent failure.

Any suspected causes that haven’t been proven yet can be included on the Cause Map, but are marked with a “?” to indicate that they need additional evidence. This helps document what has been considered during an investigation and questions that still need to be answered.

To view an Outline and the initial Cause Map of the Apollo Theatre ceiling collapse, click on “Download PDF” above.

 

Department of Energy Cyber Breach Affects Thousands, Costs Millions

By ThinkReliability Staff

Personally identifiable information (PII), including social security numbers (SSNs) and banking information, for more than 104,000 individuals currently or formerly employed by the Department of Energy (DOE) was accessed by hackers from the Department’s Employee Data Repository database (DOEInfo) through the Department’s Management Information System (MIS). A review by the DOE’s Inspector General in a recently released special report analyzes the causes of the breach and provides recommendations for preventing or mitigating future breaches.

The report notes that, “While we did not identify a single point of failure that led to the MIS/DOEInfo breach, the combination of the technical and managerial problems we observed set the stage for individuals with malicious intent to access the system with what appeared to be relative ease.” Because of the complex interactions between the systems, personnel interactions and safety precautions (or lack thereof) that led to system access by hackers, a diagram showing the cause-and-effect relationships can be helpful. Here those relationships – and the impacts the breach had on the DOE and DOE personnel – are captured within a Cause Map, a form of visual root cause analysis.

In this case, the report uncovered concerns that other systems were at risk for compromise – and that a breach of those systems could impact public health and safety. The loss of PII for more than 104,000 personnel can be considered an impact to the customer service goal. The event (combined with two other cyber breaches since May 2011) has resulted in a loss of confidence in cyber security at the Department, an impact to the mission goal. Affected employees were given 4 hours of authorized leave to deal with potential impacts from the breach, impacting both the production and labor goals. (Labor costs for recovery and lost productivity are estimated at $2.1 million.) The Department has paid for credit monitoring and established a call center for the affected individuals, at an additional cost of $1.6 million, bringing the total cost of this event to $3.7 million. With an average of one cyber breach a year for the past 3 years, the Department could be looking at multi-million dollar annual costs related to cyber breaches.

These impacts to the goals resulted from hackers gaining access to unencrypted PII. Hackers were able to gain access to the system, which was unencrypted and contained significant amounts of PII, as this database was the central repository for current and former employees. The PII within the database included SSNs, which were used as identifiers, though this is contrary to Federal guidance. For reasons that are unknown, there appears to have been no effort to remove SSNs as identifiers per a 5-year-old requirement. Reasons for the system remaining unencrypted appear to have been based on performance concerns, though these were not well documented or understood.

Hackers were able to “access the system with what appeared to be relative ease” because the system had inadequate security controls (only a user name and password were required for access), and could be directly accessed from the internet, presumably in order to accomplish necessary tasks.   In the report, ability to access the system was directly related to “continued operation with known vulnerabilities.”  This concept may be familiar to many at a time when most organizations are trying to do more with less.   Along with a perceived lack of authority to restrict operation, inability to address these vulnerabilities based on unclear responsibility for applying patches, and vulnerabilities that were unknown because of the limited development, testing, troubleshooting and ongoing scanning of the system, cost was also brought up as a potential issue for delay in addressing the vulnerabilities that contributed to the system breach.

According to the report, “The Department should have considered costs associated with mitigating a system breach … We noted the Department procured the updated version in March 2013 for approximately $4,200. That amount coupled with labor costs associated with testing and installing the upgrade were significantly less than the cost to mitigate the affected system, notify affected individuals of the compromise of PII and rebuild the Department’s reputation.”

The updated system referred to was purchased in March 2013, though the system had not been updated since early 2011 and core support for the application upon which the system was built ended in July 2012. Additionally, “the vulnerability exploited by the attacker was specifically identified by the vendor in January 2013.” The update, though purchased in March, was not installed until after the breach occurred. Officials stated that a decision to upgrade the system had not been made until December 2012 because it had not reached the end of its useful life. The Inspector General’s note about considering the costs of mitigating a system breach is pointed, comparing the several-thousand-dollar cost of an on-time upgrade to the several-million-dollar cost of mitigating a breach. However, like the DOE, many companies find themselves in the same situation, cutting costs on prevention and paying exponentially higher costs to deal with the inevitable problem that will arise.
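The cost comparison can be worked through with the figures quoted in this post. The short Python sketch below is only arithmetic on those numbers. Note that the $4,200 purchase price does not include the labor to test and install the upgrade, which the report mentions but does not quantify, so the ratio overstates the gap by some unknown amount.

```python
# Breach costs quoted from the Inspector General's report.
labor_and_lost_productivity = 2_100_000
credit_monitoring_and_call_center = 1_600_000

# Purchase price of the updated version.
# ASSUMPTION: testing and installation labor is left out (not quantified).
upgrade_purchase_price = 4_200

total_breach_cost = labor_and_lost_productivity + credit_monitoring_and_call_center
ratio = total_breach_cost / upgrade_purchase_price

print(f"Total breach cost: ${total_breach_cost:,}")
print(f"Breach cost is roughly {ratio:.0f} times the upgrade purchase price")
```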

To view the Outline, Cause Map and recommended solutions based on the DOE Inspector General’s report, please click “Download PDF” above.

Poisoned Dead Mice Parachuted onto Guam to Kill Snakes

By Kim Smiley

On December 1, 2013, 2,000 dead, poisoned neonatal mice were parachuted onto Guam on a unique mission to fight an invasive species, the brown tree snake. The parachutes are designed to catch in the trees and tempt the snakes, who live in the trees, into eating the mice. The mice are pumped full of acetaminophen, a chemical that the snakes are particularly sensitive to because it affects their blood’s ability to carry oxygen.

There are an estimated 2 million brown tree snakes on Guam so the 2,000 poisoned mice will only impact a very small percentage of the population, but scientists hope that the information they learn from this drop will help them plan larger mice drops in the future.  This is the fourth and largest dead mice drop so far and cost 8 million dollars.  Some of the mice were embedded with data-transmitting radios for this drop which will allow scientists to better gauge the effectiveness of this technique.

While the 8 million dollar price tag sounds high, it’s important to realize that the damage done by the brown tree snakes each year is significant.  Since their accidental introduction to the island, brown tree snakes have destroyed the native ecosystem, decimating the native bird population.  Brown tree snakes are also fantastic climbers and they routinely get into electrical equipment.  They cause an average of 80 power outages a year, resulting in costs as high as $4 million for repairs and lost productivity annually. (See our previous blog for more information.)

Even though the problem of the brown tree snakes is fairly well understood, an effective solution has been difficult to find. A number of different things have been tried over the years: snake traps, snake-sniffing dogs and snake-hunting inspectors have all been used, but the snakes have completely overrun the island. As farfetched as it sounds, parachuting dead mice seems to be the most promising solution at present. It works because the snakes are very sensitive to acetaminophen; they only need to ingest about one-sixth of a standard pill for it to be effective. This means that non-target animals are unlikely to be heavily impacted by the mice drops. A pig or dog would need to eat around 500 of the baited mice for the dose to be lethal. One of the concerns is that snakes tend to avoid prey that is already dead, but information from the radio transmitters used in the recent drop should confirm if the mice are an effective bait.

One thing I know for sure, I would have loved to be in the brainstorm meeting the first time someone suggested parachuting dead mice.  This example is a good reminder to all of us to keep an open mind.  Every now and then, the most bizarre solution suggested turns out to be the best.

Metro Train Derails in the Bronx, Killing 4 and Injuring More Than 60

By Kim Smiley

Four passengers were killed and dozens more sent to the hospital after a metro train derailed in the Bronx early Sunday, December 1, 2013.  At the time of the accident, the train was carrying about 150 passengers and was traveling to Grand Central Terminal in New York City. The aftermath of the accident was horrific with all seven cars of the commuter train derailing. Metro-North has been operating for more than 30 years and this was the first accident that resulted in passenger deaths.

A Cause Map, or visual root cause analysis, can be built to help analyze this accident.  There is still a lot of investigative work that needs to be done to understand what caused the derailment, but the information that is available can be used to create an initial Cause Map.  The Cause Map can easily be expanded later to incorporate more information as it becomes available.  The first step when building a Cause Map is to fill in an Outline with the basic background information.  The impacts to the goals are also documented on the bottom of the Outline.  The impacted goals are then used to begin building the Cause Map.

In this example, the safety goal is clearly impacted because there were four fatalities and over 60 people injured. The schedule goal is also significantly impacted because this portion of rail will be closed during most of the investigation. The National Transportation Safety Board has estimated that the investigation will take 7 to 10 days. The track closure is particularly disruptive because this is a major artery into New York City with a ridership of 15.9 million in 2012. Once the impacted goals are documented, the Cause Map itself is built by asking “why” questions.

So why did the train derail? The details aren’t known yet, but there is still some information that should be documented on the Cause Map. A question mark is included after a cause that may have contributed to an issue, but requires more evidence or investigation. It’s useful to document these open questions during an investigation to ensure that all the pertinent questions are asked and nothing is overlooked. (If it is determined that a cause didn’t play a role, it can be crossed out on the Cause Map to show that the cause was considered, but ruled out.) Two factors that likely played a role in the derailment are the speed of the train and the track design where the accident occurred. There is a sharp curve in the track where the derailment happened. Trains are required to reduce their speed before traveling through it. The latest reports from the investigation are that the train was traveling 82 mph in a 30 mph zone. The train operator has stated that the brakes malfunctioned and didn’t respond when he tried to reduce speed and that the train was traveling too fast over the curved track.
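The question-mark and cross-out conventions described above can be sketched as a status attached to each cause. This is a hypothetical Python illustration: the `Evidence` names and the `label` helper are invented for the example, and the status given to each cause is an assumption based on what this post reports, not a finding of the investigation.

```python
from enum import Enum


class Evidence(Enum):
    """How well supported a cause is; the value is the marker shown on the map."""
    CONFIRMED = ""
    SUSPECTED = " ?"              # needs more evidence or investigation
    RULED_OUT = " [crossed out]"  # considered, then ruled out


def label(description: str, status: Evidence) -> str:
    """Return the text of a cause box with its evidence marker appended."""
    return description + status.value


# ASSUMPTION: statuses reflect early reports summarized in this post.
causes = [
    ("Sharp curve in the track", Evidence.CONFIRMED),
    ("Train traveling too fast for the curve", Evidence.SUSPECTED),
    ("Brakes malfunctioned", Evidence.SUSPECTED),
]

# The open questions are the causes still carrying a question mark.
open_questions = [text for text, status in causes if status is Evidence.SUSPECTED]

for text, status in causes:
    print(label(text, status))
```

As evidence comes in, a cause moves from `SUSPECTED` to `CONFIRMED` or `RULED_OUT`, and the list of open questions shrinks accordingly.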

Investigators have recovered the data recorder from the train, which will provide more information, including whether there was a problem with the brakes. Investigators will also interview all the relevant personnel and determine what happened to cause this deadly crash. Once the investigation is completed, any necessary solutions can be implemented to reduce the risk that a similar accident occurs in the future.

To view a completed Outline and initial Cause Map of this incident, click on “Download PDF” above.

Boeing 747 “Dreamlifter” Cargo Jet Lands At Wrong Airport

By Kim Smiley

On November 21, 2013, a massive Boeing 747 Dreamlifter cargo jet made national headlines after it landed at the wrong airport near Wichita, Kansas.  For a time, the Dreamlifter looked to be stuck at the small airport with a relatively short runway, but it was able to take off safely the next day after some quick calculations and a little help turning around.

At the time of the airport mix-up, the Dreamlifter was on its way to the McConnell Air Force base to retrieve Dreamliner nose sections made by nearby Spirit Aerosystems.   Dreamlifters are notably large because they are modified jumbo jets designed to haul pieces of Dreamliners between the different facilities that manufacture parts for aircraft.

So how does an airplane land at the wrong airport? It’s not entirely clear yet how a mistake of this magnitude was made. The Federal Aviation Administration is planning to investigate the incident to determine what happened and to see whether any regulations were violated. What is known is that the airports have some similarities in layout that can be confusing from the air. First off, there are three airports in fairly close proximity in the region. The intended destination was the McConnell Air Force base, which has a runway configuration similar to Jabara airfield where the Dreamlifter landed by mistake. Both runways run north-south and are nearly parallel. It can also be difficult to determine how long a runway is from the air so the shorter length isn’t necessarily easy to see. Beyond the airport similarities, the details of how the plane landed at the wrong airport haven’t been released yet.

What is known can be captured by building an initial Cause Map, a visual format for performing a root cause analysis. One of the advantages of Cause Maps is they can be easily expanded to incorporate more information as it becomes available. The first step in Cause Mapping is to fill in an Outline with the basic background information and to list how the issue impacts the overall goals. There are a number of goals impacted in this example. The potential for a plane crash means that there was an impact to both the safety and property goals because of the possibility of fatalities and damage to the jet. The effort needed to ensure that the jet could safely take off on a shorter runway is an impact to the labor goal and the delay was an impact to the schedule goal. The negative publicity surrounding this incident can also be considered an impact to the customer service goal.

Once the Outline is completed, the Cause Map is built by asking “why” questions and intuitively laying out the answers until all the causes that contributed to the issue are documented.  Click on “Download PDF” above to see an Outline and initial Cause Map of this issue.

Good luck with any air travel planned for this busy holiday week.  And if your plane makes it to the right airport (even if it’s a little late), take a moment to be thankful because it’s apparently not the given I’ve generally assumed.

Can the Epidemic of Smartphone Thefts be Stopped?

By Kim Smiley

About 1.6 million handheld devices were stolen in the United States in 2012, the majority of which were smartphones.  In fact, the frequency at which the popular Apple devices are taken has given rise to a whole new term, “apple picking”.  Stolen smartphones cost consumers nearly $30 billion a year.  These thefts affect a significant number of smartphone owners with approximately 10 percent reporting that they have had a device stolen.

The problem of smartphone theft can be analyzed by building a Cause Map, a method for performing a visual root cause analysis.  A Cause Map is built by completing an Outline by both filling in the basic background information and listing how the issue impacts the overall goals.  The impacts to the goals from the Outline are then used as the first step in building the Cause Map.  Causes are then added by asking “why” questions to determine what other causes contributed to an issue.  (To view a high level Cause Map of this issue, click on “Download PDF” above.)

So why do so many smartphones get taken? Smartphones are a popular target because it is lucrative to resell them, they are relatively easy to steal, and many of the crimes go unpunished. Smartphones are fairly easy to steal because they are readily available since so many people carry them, and they are both small and lightweight. Many criminals who steal smartphones go unpunished because there are so many of them taken and it is difficult to locate the thieves. Many stolen smartphones are shipped overseas, which further complicates the situation.

The black market for smartphones is lucrative because the items are popular and relatively expensive to buy new.  People buy stolen smartphones because they are cheaper and they are able to be used by the “new owner”, especially overseas where the networks are different and phones deactivated in the US may be able to be used.

One of the possible solutions suggested to reduce the number of smartphone thefts is to include a kill switch in smartphone software.  This kill switch would essentially make the phone worthless because it would no longer function no matter where it was in the world.  If smartphones no longer have resale value, then there would be little incentive to steal them and the number of thefts should dramatically decrease.  While this idea is elegant in its simplicity, like most things there is more that needs to be considered.

The addition of a kill switch was recently rejected by cellphone carriers because of concerns about hacking and problems with reactivation.  If hackers found a way to flip the kill switches they would have the ability to destroy a huge number of smartphones from anywhere in the world.  Depending on how many users were targeted this could have a huge impact, which could be especially problematic for people who use their phones in an official capacity like law enforcement. It doesn’t take much imagination to see how this scenario could go horribly wrong. The proposed kill switch is also permanent so users won’t be able to reactivate their phones and any stolen phones that were recovered would be useless.  Companies continue to work on a number of ideas to make it more difficult to resell smartphones, but there isn’t general agreement on the best approach yet.  Only time will tell if the tide of smartphone thefts has peaked.

Pilot Response to Turbulence Leads to Crash

By ThinkReliability Staff

All 260 people onboard Flight 587, plus 5 on the ground, were killed when the plane crashed into a residential area on November 12, 2001. Flight 587 took off shortly after another large aircraft. The plane experienced turbulence. According to the NTSB, the pilot’s overuse of the rudder mechanism, which had been redesigned and as a result was unusually sensitive, resulted in such high stress that the vertical stabilizer separated from the body of the plane.

This event is an example of an Aircraft Pilot Coupling (APC) event.  According to the National Research Council, “APC events are collaborations between the pilot and the aircraft in that they occur only when the pilot attempts to control what the aircraft does.  For this reason, pilot error is often listed as the cause of accidents and incidents that include an APC event.  However, the [NRC] committee believes that the most severe APC events attributed to pilot error are the result of the adverse APC that misleads the pilot into taking actions that contribute to the severity of the event.  In these situations, it is often possible, after the fact, to analyze the event carefully and identify a sequence of actions the pilot could have taken to overcome the aircraft design deficiencies and avoid the event.  However, it is typically not feasible for the pilot to identify and execute the required actions in real time.”

This crash is a case where it is tempting to chalk up the accident to pilot error and move on.  However, a more thorough investigation of causes identifies multiple issues that contributed to the accident and, most importantly, multiple opportunities to increase safety for future pilots and passengers.  The impacts to the goals, causes of these impacts, and possible solutions can be organized visually in cause-and-effect relationships by using a Cause Map.  To view the Outline and Cause Map, please click “Download PDF” above.

The wake turbulence that initially affected the flight was due to the small separation distance between the flight and a large plane that took off 2 minutes prior (the required separation distance by the FAA).  This led to a recommendation to re-evaluate the separation standards, especially for extremely large planes.  In the investigation, the NTSB found that the training provided to pilots on this particular type of aircraft was inadequate, especially because changes to the aircraft’s flight control system rendered the rudder control system extremely sensitive.  This combination is believed to be what led to the overuse of the rudder system, leading to stress on the vertical stabilizer that resulted in its detachment from the plane.  Specific formal training for pilots based on the flight control system for this particular plane was incorporated, as was evaluation of changes to the flight control system and requirements of handling evaluations when design changes are made to flight control systems for   previously certified aircraft. A caution box related to rudder sensitivity was incorporated on these planes, as was a detailed inspection to verify stabilizer to fuselage and rudder to stabilizer attachments.  An additional inspection was required for planes that experience extreme in-flight lateral loading events.  Lastly, the airplane upset recovery training aid was revised to assist pilots in recovering from upsets such as from this event.

Had this investigation been limited to a discussion of pilot error, revised training might have been developed, but the causes behind the other solutions that were recommended or implemented after this accident would likely never have been discussed. It’s important to ensure that incident investigations address all the causes, so that as many solutions as possible can be considered.