Experts warn that vehicles are vulnerable to cyberattacks

By Kim Smiley 

By now, you have probably heard of the “internet of things” and the growing concern about the number of things potentially vulnerable to cyberattacks as more and more everyday objects are designed to connect to the internet. According to a new report by the Government Accountability Office (GAO), cyberattacks on vehicles should be added to the list of potential cybersecurity concerns. It’s easy to see how bad a situation could quickly become if a hacker were able to gain control of a vehicle, especially while it was being driven.

A Cause Map, a visual root cause analysis, can be built to analyze the issue of the potential for cyberattacks on vehicles.  The first step in the Cause Mapping process is to define the problem by filling out an Outline with basic background information as well as how the problem impacts the overall goals.  The Cause Map is then built by starting at one of the goals and asking “why” questions to visually lay out the cause-and-effect relationships. 

In this example, the safety goal would be impacted because of the potential for injuries and fatalities. Why is there this potential? There is the possibility of car crashes caused by cyberattacks on cars. Continuing down this path, cyberattacks on cars could happen because most modern car designs include advanced electronics that connect to outside networks and these electronics could be hacked. Additionally, most of the computer systems in a car are connected to one another, so gaining access to one electronic system can give hackers a doorway to other systems in the car.

Hackers can gain access to systems in the car via direct access to the vehicle (by plugging into the on-board diagnostic port or the CD player) or, a scenario that may be even more frightening, they may be able to gain access remotely through a wireless network.  Researchers have shown that it is possible to gain remote access to cars because many modern car designs connect to outside networks and cars in general have limited cybersecurity built into them. Why cars don’t have better cybersecurity built into them is a more difficult question to answer, but it appears that the potential need for better security hadn’t been identified.

As of right now, the concern over potential cyberattacks on cars is mostly a theoretical one. There have been no reports of injuries caused by a car being attacked. There have been cases of cars being hacked, such as at Texas Auto Center in 2010 when a disgruntled ex-employee caused cars to honk their horns at odd hours and disabled starters, but there are few (if any) reports of cyberattacks on moving vehicles. However, the threat is concerning enough that government agencies are determining the best way to respond to it. The National Highway Traffic Safety Administration established a new division in 2012 to focus on vehicle electronics, which includes cybersecurity. Ideally, possible cyberattacks should be considered and appropriate cybersecurity should be included in designs as more and more complexity is added to the electronics in vehicles, and as objects ranging from pacemakers to refrigerators are designed to connect to wireless networks.

Florida under attack by another invasive species

By Kim Smiley

Florida’s warm climate has made it an appealing home to many invasive species, such as Burmese pythons (see our previous blog) and giant African land snails.  Researchers fear another species, the Nile monitor lizard,  is also threatening native wildlife.  Nile monitor lizards are intimidating reptiles, growing up to 5 feet long, and they are not fussy about what they eat, consuming almost anything smaller than they are.  They will feed on mammals, birds, reptiles, amphibians, fish and eggs. There have even been reports of Nile monitor lizards making a meal out of pet cats.

This issue can be analyzed by building a Cause Map, a visual format for performing a root cause analysis.  A Cause Map visually lays out the cause-and-effect relationships that contribute to an issue so that they are easily understood.  The first step in building a Cause Map is to fill in an Outline to help define the problem.  Basic background information is recorded in the Outline in addition to how the problem impacts the overall goals.  To build a Cause Map, start at one of the impacted goals, start asking “why” questions and add the answers to the Cause Map. For this example, we will focus on the environmental goal.

Invasive Nile monitor lizards impact the environmental goal because they can have a negative impact on native wildlife.  Why? Monitor lizards eat a varied diet and there are permanent breeding populations of these lizards in Florida.  Why are there populations of Nile monitor lizards in Florida? They were introduced into the environment and the number of Nile monitor lizards in the wild quickly increased. (It’s a bit awkward to write out the “why” questions in this way, but click on “Download PDF” above to see how the Cause Map would visually lay out for this example.)

Nile monitor lizards are basically a perfect (or perfectly bad, depending on your point of view) invasive species.  They grow quickly and breed at an early age.  They lay many eggs at once, as many as 60 eggs in a single clutch. Their natural habitat is very similar to southern Florida and they have a tendency to wander over long distances so it isn’t surprising that they would quickly spread from where they were originally introduced into the wild.

Researchers don’t know exactly how Nile monitor lizards were first introduced into the wild, but it typically occurs when pets escape or are released. Nile monitor lizards are sold as pets. Often they are small when sold, but they quickly grow large and can be aggressive. Owners may release their pets into the wild if they become tired of them or are unable to continue caring for them. It’s easy to see how a small pet lizard may seem like a good idea, but turn out to be a less than ideal roommate once it has grown into a large, active, predatory adult, complete with sharp claws and teeth. Not to mention, the cost of feeding such a pet might be more than anticipated.

Researchers are still working on developing the best methods to control Nile monitor lizard populations in Florida. (It is unlikely that Nile monitor lizards will ever be eradicated from Florida, but officials hope to control the numbers.) Three permanent breeding populations of Nile monitor lizards have been identified, one of which is estimated to hold over 1,000 lizards.

DNA testing has shown that there are actually two distinct species of Nile monitor lizards and all lizards tested in Florida have been determined to be the newly-named West African Nile monitor lizards. West African Nile monitor lizards aren’t likely to spread too far north in Florida and beyond because they aren’t adapted to cold weather.  The other species of Nile monitor lizards is native to a cooler part of Africa and could potentially spread to a wider area if ever introduced into the wild in the United States.

Bottom line: please don’t release any nonnative species anywhere (even goldfish – see our previous blog).  You may think you are doing the right thing for your pet, but invasive species can do massive damage to native wildlife.  Call a pet store or your local fish and wildlife service if you can no longer care for a pet.  You can also help by reporting sightings of nonnative species to your local fish and wildlife services.

Airplane Emergency Instructions: How do you make a work process clear?

By ThinkReliability Staff

What’s wrong with the process above?

This process provides instructions on how to remove the over-wing exit door on an airplane during an emergency.  However, imagine performing this process in an actual emergency.  During the time you spend opening the door, there will probably be people crowded behind you, frantic to get off the plane.  Step 4 indicates that after the door is detached from the plane wall, you should turn around and set the door (which is about 4’ by 2’ and can weigh more than 50 pounds) on the seats behind you.  In most cases, this will be impossible.  This is why emergency exit doors open towards the outside; in an emergency, a crush against the door will make opening the door IN impossible.

Even if it would be possible to place the door on the seat in the emergency exit row, it would likely reduce the safety of passengers attempting to exit.  As discussed, the exit door is fairly large and heavy.  It is likely to be displaced while passengers are exiting the airplane and may end up falling on a passenger, or blocking the exit path.

However, when this process was tested in training, it probably worked fine. Why? Because it wasn’t an actual emergency, and there probably wasn’t a plane full of passengers who really wanted to get out. This is just another reason that procedures need to be tested in as close to actual situations as possible. At the very least, any scenario under which the process is to be performed should be replicated as nearly as possible.

Now take a look at this procedure:

It’s slightly better in that it doesn’t tell us to put the removed door on the seat behind us, but it doesn’t tell us what to do with the door at all. Keep in mind that the “training” of the person performing this procedure likely consisted of a 30-second conversation with a flight attendant and that, in all probability, the first time he or she will perform the task is during an emergency situation. When testing a procedure, it’s also helpful to have someone perform the procedure who is not familiar with it, with instructions to do only what the procedure says. In this case, that person would end up removing the door . . . and then potentially attempting to climb out of the exit with the door in their hands. This is also not a safe or efficient method of emergency escape.

A third procedure provides a much better description of what should be done with the door. The picture clearly indicates that the door should be thrown out of the plane, where it is far less likely to block the exit or cause passenger injury.

The first two procedures were presumably clear to the person who created them.  But had they been tested by people with a variety of experience levels (particularly important in this case, because people of various experience levels may be required to open the doors in an emergency), the steps that really weren’t so clear may have been brought to light.

Reviewing procedures with a fresh eye (or asking someone to perform the procedure under safe conditions based only upon the written procedure) may help to identify steps that aren’t clear to everyone, even if they were to the writer.  This can improve both the safety, and the effectiveness, of any procedure used in your organization.

8 Injured by Arresting Cable Failure on Aircraft Carrier

By ThinkReliability Staff

An aircraft carrier is a pretty amazing thing. Essentially, it can launch planes from anywhere. But even though aircraft carriers are huge, they aren’t big enough for planes to take off or land in the normal way. The USS Dwight D. Eisenhower (CVN 69) has about 500′ for landing planes. In order for planes to be able to successfully land in that distance, it is equipped with an arresting wire system, which can stop a 54,000 lb. aircraft travelling 150 miles per hour in only two seconds and a 315′ landing area. This system consists of 4 arresting cables, which are made of wire rope coiled around hemp. These ropes are very thick and heavy and pose a significant risk to personnel safety if they part or detach.
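As a rough back-of-the-envelope check on those figures, the average deceleration can be estimated two ways: from the two-second stopping time and from the 315′ stopping distance. The sketch below assumes constant deceleration in each case (a simplification, since real arresting gear does not decelerate a plane uniformly), and the function names are illustrative only.

```python
MPH_TO_MPS = 0.44704   # 1 mile per hour in metres per second
FT_TO_M = 0.3048       # 1 foot in metres
G = 9.80665            # standard gravity in m/s^2

def decel_from_time(speed_mph, seconds):
    """Average deceleration, in g, to stop from speed_mph in the given time."""
    return speed_mph * MPH_TO_MPS / seconds / G

def decel_from_distance(speed_mph, feet):
    """Constant deceleration, in g, to stop from speed_mph in the given distance."""
    v = speed_mph * MPH_TO_MPS
    return v ** 2 / (2 * feet * FT_TO_M) / G

print(round(decel_from_time(150, 2), 1))        # about 3.4 g
print(round(decel_from_distance(150, 315), 1))  # about 2.4 g
```

The two estimates differ because the deceleration is not actually constant, but either way the cable is absorbing a load of a few g on a 54,000 lb. aircraft, which is why a parted or detached cable is so dangerous.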

This is what happened on March 18, 2016 while attempting to land an E-2C Hawkeye. An arresting cable came unhooked from the port side of the ship and struck a group of sailors on deck. At least 8 were injured, several of whom had to be airlifted off the ship for treatment. We will examine the details of this incident within a Cause Map, a visual form of root cause analysis.

The first step in any problem investigation is to define the problem. We capture the what, when, and where within a problem outline. Additionally, we capture the impacts to the goals. The injuries, as well as the potential for death or even more serious injuries, are impacts to the safety goal. Flight operations were shut down for two days, impacting both the mission and production/schedule goals. The potential for the loss of (or serious damage to) the plane is an impact to the property goal. (In a testament to the skill of Navy pilots, the plane returned to Naval Station Norfolk without any injuries to the flight crew or significant damage to the plane.) The response and investigation are an impact to the labor goal. It’s also useful to capture the frequency of these types of incidents. The Virginian-Pilot reports that there have been three arresting-gear related deaths and 12 major injuries since 1980.

The next step in the problem-solving process is to determine the cause-and-effect relationships that led to the impacted goals. Beginning with the safety goal, the injuries to the sailors resulted from being struck by an arresting cable. When a workplace injury results, it’s also important to capture the personal protective equipment (PPE) that may have impacted the magnitude of the injuries. In this case, all affected sailors were wearing appropriate PPE, including heavy-duty helmets, eye and ear protection. This is a cause of the injuries because had they NOT been wearing PPE, the injuries would have certainly been much more severe, or resulted in death.

The arresting cable struck the sailors because it came unhooked from the port side of the ship. The causes for the detachment of the cable have not been conclusively determined; however, a material failure results from a force on the material that is greater than the strength of the material. Here, the force on the arresting cable comes from the landing plane. The pilot reported that the plane “hit the cable all at once”, which could have provided more force than is typical. The strength of the cable and connection may have been impacted by age or use. However, arresting cables are designed to “catch” and slow planes at full power and are only used for a specific number of landings before being replaced.

Other impacted goals can be added to the Cause Map where appropriate (additional relationships may result). In this case, the potential damage to the plane resulted from the landing failure, which was caused by the detachment of the arresting cable AND because the arresting cable is needed to safely land a plane on an aircraft carrier.

The last step of the Cause Mapping process is to determine solutions to reduce the risk of the incident recurring. More investigation is needed to ensure that the cable and connection were correctly installed and maintained. If it is determined that there were issues with the connection and cable, the processes that led to the errors will be improved. However, if it is determined that the cable and connection met design criteria and the detachment resulted from the plane landing at an unusual angle, there may be no changes as a result of this investigation.

It seems unusual that an investigation of an incident that resulted in 8 injuries would result in no action items. However, solutions are based on achieving an appropriate level of risk. The acceptable level of risk in the military is necessarily higher than it is in most civilian workplaces in order to achieve desired missions. Returning to the frequency from the outline, these types of incidents are extremely rare. The US Navy currently has ten operational aircraft carriers (and an eleventh is on the way). These carriers launch thousands of planes each year, yet over the last 36 years there have been only three deaths and 12 major injuries associated with arresting gear failures, all while performing a dangerous task in a dangerous environment. Additionally, in this case, PPE was successful in ensuring that all sailors survived and in limiting their injuries.
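To put that frequency in perspective, the reported totals can be turned into a simple fleet-wide rate. This is only a sketch using the figures quoted above (three deaths and 12 major injuries over 36 years); it does not attempt a per-landing rate, because the exact number of landings is not given.

```python
deaths = 3
major_injuries = 12
years = 36

casualties = deaths + major_injuries
per_year = casualties / years            # serious casualties per year, fleet-wide
years_between = years / casualties       # average years between serious casualties

print(round(per_year, 2))       # 0.42
print(round(years_between, 1))  # 2.4
```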

To view the outline and Cause Map of this event, click on “Download PDF” above.

 

The Force Was NOT With Them!

By Jon Bernardi

A long time ago, in a galaxy far, far away, the Empire tried to use their fancy Death Star to keep the member systems in line. This plan did not work out very well, as Death Star One (DS-1) was not able to fulfill its mission of empowering galactic domination! DS-1 had travelled across the galaxy to quell the rebellion at the rebel base on Yavin 4, but did not count on the über-Force of the Rebel Alliance. The Empire did not realize the power of the good side of the Force as the rebels overcame all odds and were able to destroy DS-1. We can do an analysis of the incident to determine the system of causes for the destruction and show those causes visually in a Cause Map.

As much as the Emperor and his minions would not like to see this published, we begin by looking at how the Empire’s goals were impacted. We start by developing an outline of the incident. You might suspect that different factions within the Empire see this problem differently! Some don’t believe there is such a thing as “The Force” and place their faith in the power of the machine. Others use the Dark Side to exploit the mortal weaknesses of the players. The goals of the Empire are impacted in a number of ways: DS-1 is ultimately destroyed, with loss of life, and loss of a dominant-style weapon. The Rebel Alliance has gained a toe-hold against the Empire! We use the impact to the goals as the first effects of our cause-and-effect relationships and will use the disparate view of “the problem” to help us with the branches of the Cause Map.

We already know that DS-1 had planet-busting capabilities, as demonstrated convincingly at Alderaan, Princess Leia’s adopted planet. This may have led the Empire’s power structure to doubt the “Power of the Force” and put their trust in a technological titan, “The ultimate power in the universe!” Even after the plans for the station had been obtained by the Rebellion, the commander of DS-1 still disregarded any concern of vulnerability in his unsinkable marvel. In a remarkable display of hubris, the Empire allowed the small band of rebels aboard the Millennium Falcon to escape with the stolen plans for DS-1. The Empire intended to follow them, find the rebel base, and wipe out the rebellion once and for all!

Another branch of the Cause Map follows the path of the stolen plans and the re-awakening of the Force on the planet Tatooine. As we analyze this section of the map, we can see the convergence of causes that led to the technical experts of the Rebel Alliance finally obtaining the plans for DS-1, analyzing them and discovering the dreaded “thermal exhaust port” – (guess even a DS has to have a tailpipe!).

Even a long time ago, we see causes in multiple areas coming together to form the overall picture of the incident. The plucky Rebellion had THE FORCE with them!

Oil Leaked from shipwreck near Newfoundland

By Kim Smiley

On March 31, 2013, oil was reported in Notre Dame Bay, Newfoundland.  Officials traced the source of the oil back to a ship, the Manolis L, that sank in 1985 after running aground.  The Manolis L is estimated to have contained up to 462 tons of fuel and 60 tons of diesel when it sank and much of that oil is believed to still be contained within the vessel.  Officials are working to ensure the oil remains contained, but residents of nearby communities who rely on tourism and fishing are concerned about the potential for more oil to be released into the environment.

A Cause Map, a visual format for performing root cause analysis, can be built to better understand this issue.  There are three steps in the Cause Mapping process. The first step is to fill out an Outline with the basic background information along with listing how the problem impacts the goals.  There is also space on the Outline to note the frequency of the issue.  For this example, 2013 was the first time oil was reported to be leaking from this particular sunken ship, but there have been 700 at-risk sunken vessels identified in Canadian waters alone.  It’s worth noting this fact because the amount of resources a group is willing to use to address a problem may well depend on how often it is expected to occur.  One leaking sunken ship is a different problem than potentially having hundreds that may require action.

The second step is to perform the analysis by building the Cause Map.  A Cause Map is built by asking “why” questions and laying out the answers to visually show the cause-and-effect relationships.  Once the causes have been identified, the final step is to develop and implement solutions to reduce the risk of similar problems occurring in the future.  Click on “Download PDF” to view an Outline and intermediate level Cause Map for this problem.

In this case, the environmental goal is clearly impacted because oil was released into the environment.  Why? Oil leaked out of a sunken ship because a ship had sunk that contained a large quantity of oil and there were cracks in the hull.  The hull of this particular ship is thin by modern standards (only a half-inch) and it has been sitting in sea water for the last 30 years.  A large storm hit the region right before oil was first reported and it is believed that the hull (already potentially weakened by corrosion) was damaged during the storm.  The Coast Guard identified two large cracks in the ship that were leaking oil during their investigation.

Once the causes of the issue have been identified, the final step is to implement solutions to reduce the risk of future problems. This is where a lot of investigations get tricky. It is often easier to identify the problem than to actually solve it. It can be difficult to determine what level of risk is acceptable and how many resources should be allotted to an issue. The cracks in the hull of the Manolis L have been patched using weighted neoprene sealants and a cofferdam has been installed to catch any oil that leaks out. The vessel is being monitored by the Canadian Coast Guard via regular site visits and aerial surveillance flights. But the oil remains in the vessel, so there is the potential that it could be released into the environment.

Many local residents are fighting for the oil to be removed from the sunken ship, rather than just contained, to further reduce the risk of oil being released into the environment. But removing oil from a sunken ship is very expensive.  In 2013, it cost the Canadian Coast Guard about $50 million to remove oil from a sunken ship off the coast of British Columbia. So far, officials feel that the measures in place are adequate and that the risk doesn’t justify the cost of removing the oil from the vessel. If they are right, the oil will stay safely contained at a fraction of the cost of removing it, but if they are wrong there could be lasting damage to local communities and wildlife.

In situations like this, there are no easy answers.  Anybody who works to reduce risk faces similar tradeoffs and generally the best you can do is to understand a problem as thoroughly as possible to make an informed decision about the best use of resources.

Worker dies while manually measuring tank

By Kim Smiley

The potential danger of confined spaces is well documented, but nine fatalities have shown that people working near open hydrocarbon storage hatches can also be exposed to dangerous levels of hydrocarbon gases and oxygen-deficient atmospheres. NPR recently highlighted this issue in an article entitled “Mysterious Death Reveals Risk In Federal Oil Field Rules” that discussed the death of Dustin Bergsing. His job duties included opening the hatch on a crude oil storage tank to measure the level of the oil, and he was found dead next to an open hatch. He was healthy and only 21 years old.

A Cause Map, a visual format for performing a root cause analysis, can be used to help explain what happened to cause his death.  A Cause Map intuitively lays out the cause-and-effect relationships that contributed to an issue and is built by asking “why” questions.  Click on “Download PDF” to view a high level Cause Map of this accident.

So why did his death occur?  An autopsy showed that his death occurred because he had hydrocarbons in his blood.  This occurred because he was exposed to hydrocarbon vapor and he remained in the dangerous environment. (When two causes both contribute to an effect, they are listed vertically on the Cause Map and separated by an “and”.)
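The “and” relationship described above can be sketched as a small tree, where every cause listed under an effect is required for that effect to occur. This is only an illustration of the idea, not the Cause Mapping software or file format; the `Cause` class and `lay_out` function are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Cause:
    text: str
    # Every entry in `causes` is required for this effect to occur ("and").
    causes: list = field(default_factory=list)

def lay_out(node, depth=0):
    """Return indented lines: an effect, then its causes separated by AND."""
    lines = ["  " * depth + node.text]
    for i, cause in enumerate(node.causes):
        if i > 0:
            lines.append("  " * (depth + 1) + "AND")
        lines.extend(lay_out(cause, depth + 1))
    return lines

death = Cause("Worker fatality", [
    Cause("Hydrocarbons in blood", [
        Cause("Exposed to hydrocarbon vapor"),
        Cause("Remained in dangerous environment"),
    ]),
])

print("\n".join(lay_out(death)))
```

Each level of indentation answers one “why” question, and removing either of the two causes joined by AND would have prevented the effect above them.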

When a person is exposed to hydrocarbon vapor, they get disoriented before passing out so it is very difficult for them to get to safety on their own.  Bergsing was working alone at the time of his death and no one was aware that he was in trouble before it was too late.

He was exposed to hydrocarbon gases because he opened a hatch on a crude oil storage tank and the gas had collected at the top of the tank.  He opened the hatch because he planned to manually measure the tank level by dropping a rope inside. Manual tank measurement is a common method to determine level in crude oil storage tanks. Crude oil contains volatile hydrocarbons that can bubble out of the crude oil and collect at the top; the gas will rush out of the tank if a hatch is opened.

Additionally, he wasn’t wearing adequate PPE because it wasn’t required by any regulations and there was limited awareness of this danger.

After his and the other deaths, the industry is starting to become more aware of this issue.  The National Institute for Occupational Safety and Health (NIOSH) and the Occupational Safety and Health Administration (OSHA) issued a hazard alert bulletin that identified health and safety risks to workers who manually gauge or sample fluids on production and flowback tanks from exposure to hydrocarbon gases and vapors and exposure to oxygen-deficient atmospheres. In addition to working to raise awareness of the issue, OSHA and NIOSH made recommendations to improve working safety that include the following:

– Implementing alternate procedures that allow workers to monitor tank levels and sample without opening hatches

– Installing hatch pressure indicators

– Conducting worker exposure assessments

– Providing training on the hazard and posting hazard signage

– Not permitting employees to work alone

Please read the OSHA and NIOSH hazard alert bulletin for more information and a full list of the recommendations. Many of the recommendations would be expensive and time-consuming to implement, but some may be relatively simple ways to reduce risk. Continuing to provide information to workers about the potential hazards might be a good first step to improve their safety.

Track Workers Killed by Train

By ThinkReliability Staff

A derailment and the deaths of two railroad workers on April 3, 2016 have led to an investigation by the National Transportation Safety Board (NTSB). In this investigation, the NTSB will address the impacts of the accident, determine what caused the accident and provide recommendations to prevent similar accidents from recurring. While the investigation is still underway, a wealth of information related to the accident is already available to begin the analysis. We will look at what is currently known regarding the accident in a Cause Map, a visual form of root cause analysis.

The first step of the analysis is to define the problem. This includes the what, when, and where of the incident, as well as the impacts to the organizational goals. Capturing the impacts to the goals is particularly important because the recommendations that will result from the analysis aim to reduce these impacts. If we define the problem as simply a “derailment”, recommendations may be limited to those that prevent future derailments. Not only are we looking for recommendations to prevent future derailments, we are looking for recommendations that reduce all of the impacts to the goals. In this case, those impacts include worker safety: 2 workers died; public safety: 37 passengers were injured; customer service: the train derailed; property: the train and some construction equipment were damaged; and labor: response and investigation are required.

The analysis is performed by beginning with the impacted goals and developing the cause-and-effect relationships that led to those impacts. Asking “why” questions can help to identify some of the cause-and-effect relationships, but there may be more than one cause that results in an effect. In this case, the worker fatalities occurred because the train struck heavy equipment and the workers were in/on/near the equipment. Both of these causes had to occur for the effect to result. The workers were on the equipment performing routine maintenance. In addition, their watch was ineffective. When capturing causes, it’s important to also include evidence, which validates the cause.

We know the watch was ineffective, because federal regulation requires a watch for incoming trains that gives at least a fifteen second warning. Fifteen seconds should have been sufficient time for the workers to exit the equipment. Because this did not happen, it follows that the watch was ineffective.

The train struck the heavy equipment because the equipment was on track 3, the train was on track 3, and the train was unable to brake in time. It’s unclear why the heavy equipment was on the track; rail safety experts say heavy equipment should never be directly on the track. The train was on track 3 because it was allowed on the track. Work crews are permitted to shut off the current to preclude passage of trains into the work zone, but they did not in this case, for reasons that are still being investigated. Additionally, the dispatcher allowed the train onto the track. Per federal regulations, when workers are on the track, train dispatchers may not allow trains on the track until the roadway worker gives permission. It appears that in this case the workers either failed to secure permission to work on the track (thus notifying the dispatcher of their presence) or the work notification was improperly cancelled, allowing trains to return to the track, possibly due to a miscommunication between the night and day crews. This is also still under investigation.

While inspection of the cars and maintenance records found no anomalies, the braking system is under investigation to determine whether or not it affected the train’s ability to brake. Also under investigation is the Positive Train Control (PTC), which should have emitted warnings and slowed the train automatically. However, the supplemental shunting device, which alerts the signaling system that the track is occupied and is required by Amtrak rules, was not in place. Whether this was sufficient to prevent the PTC from stopping the train in time is also under investigation. The conductor placed the train in emergency mode 5 seconds before the collision. As the train was traveling at 106 mph (the speed limit was 110 mph in the area), this did not give adequate time to brake. There should have been a flagman to notify the train that a crew was on the track, but there was not. The flagman also carries an air horn, which provides another notification to the track crew that a train is coming.
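To get a feel for those numbers, the distance a train covers at a constant 106 mph can be worked out for the 5 seconds before the collision and for the fifteen-second warning the watch is supposed to provide. The sketch below only converts speed and time into distance; it does not model braking, and the function name is illustrative.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def distance_covered_ft(speed_mph, seconds):
    """Distance in feet covered at a constant speed over the given time."""
    return speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR * seconds

print(round(distance_covered_ft(106, 5)))   # 777 ft in the final 5 seconds
print(round(distance_covered_ft(106, 15)))  # 2332 ft during a 15-second warning
```

In other words, when emergency mode was engaged the train was already within roughly 800 feet of the equipment, while a fifteen-second warning corresponds to spotting the train when it is still more than 2,300 feet away.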

Says Ashley Halsey III, reporting in The Washington Post, “Basic rules of railroading and federal regulations should have prevented the Amtrak derailment near Philadelphia on Sunday that killed two maintenance workers.” It appears that multiple procedural requirements were not followed, but more thorough investigation is required to determine why and what can be done in the future to improve safety by preventing derailments and worker fatalities.

To view the available information in a Cause Map, please click “Download PDF” above.

Don’t Just Google It . . . Maps Error Leads to Wrong House Being Demolished

By ThinkReliability Staff

Imagine coming “home” and finding an empty lot. That’s what happened in Rowlett, Texas on March 22, 2016. A tornado had previously damaged many of the homes in the area; some were slated for repairs, and some for demolition. The demolition company had plans to level the duplex at 7601 Cousteau Drive, but instead demolished the duplex at 7601 Calypso Drive.

An error on Google Maps has been blamed for the mistake but, as is typical with these types of incidents, there’s more to it than that. To ensure that all the causes leading to an incident are identified and addressed, it’s important to methodically analyze the issue. Creating a Cause Map, a form of root cause analysis that creates a map of cause-and-effect relationships, is one way a problem can be analyzed.

The first step in the Cause Mapping process is to capture the what, when and where of an incident. Along with the geographic (where the incident occurred) and process location (what was being done at the time), it can be helpful to capture any differences about the situation surrounding the incident. In this case, “differences” would be anything out of the ordinary during the demolition of the house at 7601 Cousteau/Calypso. The error on Google Maps (which pointed to the house which was mistakenly demolished) is one difference. Another difference is that the name of the street was not checked during the location confirmation. Other potential differences between this demolition job and other demolition jobs were that the same house number was present on both streets, in close proximity, and both houses experienced tornado damage. These differences may or may not be causally related – at this point, potential differences are just captured.

The next step is to capture the impacts to the organization’s goals as a result of the incident. These impacts to the goals become the first effects in the cause-and-effect relationships. In this case, there’s a potential for injuries (an impact to the safety goal) as a result of an unexpected demolition. The demolition of a house planned to be repaired is an impact to the environmental, customer service, and property goals. The demolition of the wrong house is an impact to the production/schedule and labor/time goals.

The analysis begins with one of the impacted goals. Asking “why” questions develops cause-and-effect relationships. For example, the demolition of the wrong house was caused by the duplex at 7601 Calypso Drive being demolished while the duplex at 7601 Cousteau was planned for demolition. Because both of these facts (which can be verified with evidence) resulted in the wrong house being demolished, they are both connected to the effect of “demolition of wrong house” and joined with an “AND”.
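For readers who think in data structures, the “AND” relationship described above can be sketched as a small tree. This is just an illustration, not part of the Cause Mapping method itself; the `Cause` class and its `outline` method are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on a Cause Map: an effect plus the causes that produced it."""
    description: str
    causes: list = field(default_factory=list)  # all required together ("AND")

    def outline(self, depth=0):
        """Return the map as indented lines, one 'why' level per indent."""
        lines = ["  " * depth + self.description]
        for i, cause in enumerate(self.causes):
            if i > 0:
                lines.append("  " * (depth + 1) + "AND")
            lines.extend(cause.outline(depth + 1))
        return lines


wrong_house = Cause(
    "Demolition of wrong house",
    causes=[
        Cause("Duplex at 7601 Calypso Drive demolished"),
        Cause("Duplex at 7601 Cousteau Drive planned for demolition"),
    ],
)
print("\n".join(wrong_house.outline()))
```

Each further “why” question would add entries to the `causes` list of one of the boxes, deepening the tree by one level.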

Each cause on the map is also an effect. More detail can be added to the Cause Map by continuing to ask “why” questions. However, one cause may not be sufficient to result in an effect, so questions such as “what else was required?” are also necessary to ensure all causes are present on the map. In this case, the crew went to the wrong house because of an error on Google Maps, which was used to find the house. Per a Google spokeswoman, 7601 Cousteau was shown at the location of 7601 Calypso. This error has been identified as “the cause” of the incident. However, there were other opportunities to catch the error. Opportunities that were missed are also causes in the cause-and-effect relationship. While there was a site confirmation prior to demolition, only the street number (7601), lot location (corner lot), and tornado damage were confirmed. All three of these data points used to confirm the location were the same for 7601 Cousteau and 7601 Calypso.
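The weakness of the site confirmation can be shown with a short sketch. The field names and the `mismatches` helper are assumptions made for illustration; the values come from the description above. Checking only the three confirmed data points finds no difference between the two duplexes, while also checking the street name does.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Site:
    street_number: str
    street_name: str
    lot_location: str
    tornado_damage: bool


planned = Site("7601", "Cousteau Drive", "corner lot", True)
visited = Site("7601", "Calypso Drive", "corner lot", True)


def mismatches(planned, visited, fields):
    """Return the checked fields on which the two sites differ."""
    p, v = asdict(planned), asdict(visited)
    return [name for name in fields if p[name] != v[name]]


# The three data points that were actually confirmed before demolition.
original_check = ["street_number", "lot_location", "tornado_damage"]
print(mismatches(planned, visited, original_check))                    # []
print(mismatches(planned, visited, original_check + ["street_name"]))  # ['street_name']
```

A confirmation step is only useful if at least one of the checked identifiers can actually tell the candidate sites apart.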

What hasn’t been mentioned in the news but is apparent from looking at a (corrected) Google Map is that the house-numbering scheme of the neighborhood was set up for failure. 7601 Calypso is on the corner of Calypso Drive and Cousteau Drive, meaning a person could easily believe it was 7601 Cousteau. 7601 Cousteau is just a block away, on the corner of Cousteau Drive and an apparently unnamed alley. I can’t imagine it is the first time that someone has confused the two.

While it’s too late for 7601 Calypso Drive, Google Maps has fixed the error. In the future, this demolition company will likely use another identifier (or will mark the house while talking to the homeowners prior to the demolition) to ensure that the wrong house is not destroyed.

To view the Cause Map, as well as the updated Google Map, click on “download PDF” above.

DC Metro shut down for entire day for inspections after fire

By Kim Smiley 

A fire in a DC Metro tunnel early on March 14, 2016 caused delays on three subway lines and significant disruption to both the morning and evening commutes.  There were no injuries, but the similarities between this incident and the deadly smoke incident on January 12, 2015 (see our previous blog on this incident) led officials to order a 24-hour shutdown of the entire Metro system for inspections and repairs.

The investigation into the Metro fire is still ongoing, but the information that is known can be used to build an initial Cause Map.  A Cause Map is built by asking “why” questions and visually laying out all the causes that contributed to an incident.  Cause Mapping an issue can identify areas where it may be useful to dig into more detail to fully understand a problem and can help develop effective solutions.

So why was there a fire in the Metro tunnel?  Investigators have not released details about the exact cause, but have stated that the fire was caused by issues with a jumper cable.  Jumper cables are used in the Metro system to bridge gaps in the third rail, essentially functioning as extension cords.  The Metro system uses gaps in the third rail to create safer entry and exit spaces for both workers and passengers because of the potential danger of contact with the electrified third rail.  The third rail carries 750 volts of electricity used to power Metro trains and could cause serious injury or even death if accidentally touched.

The jumper cables also carry high voltage, and fires and/or smoke can occur if one malfunctions.  Investigators have not confirmed the exact issue that led to this fire, but insulation failures have been identified in other locations and are a possible cause of the fire. (Possible causes can be added to the Cause Map with a “?” to indicate that more evidence is needed.)

One of the things that is always important to consider when investigating an incident is the frequency of occurrence of similar issues.  The scope of the investigation and possible solutions considered will likely be different if it is the 20th time an incident has occurred rather than the first. In this case, the fire was similar to another incident in January 2015 that caused a passenger death.  Having a second incident occur so soon after the first naturally raised questions about whether there were more unidentified issues with jumper cables.  The Metro system uses approximately 600 jumper cables and all were inspected during the day-long shutdown. Twenty-six issues were identified and repaired. Three locations had damage severe enough that Metro would have immediately stopped running trains through them if the extent of the damage had been known.

The General Manager of the DC Metro system, Paul J. Wiedefeld, is relatively new to his position and has been both praised and criticized for the shutdown.  Trying to implement solutions and reduce risk is always a balancing act between costs and benefits.  Was the cost of a full-day shutdown and inspections of all jumper cables worth the benefit of knowing that the jumper cables have all been inspected and repaired?  At the end of the day, it’s a judgement call, but I personally would be more comfortable riding the Metro with my children now.