Category Archives: Root Cause Analysis – Incident Investigation

Risks of Future Landslides – and Actual Past Landslides – Ignored

By ThinkReliability Staff

Risk is determined by both the probability of a given issue occurring, and the consequence (impact) if it does. In the case of the mudslide that struck Oso, Washington on March 22, 2014, both the probability and consequence were unacceptably high.
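As a rough, back-of-the-envelope illustration of that definition (not part of the Oso investigation itself), the sketch below scores risk as the product of an ordinal probability rating and an ordinal consequence rating; the 1-5 scales and the "unacceptable" threshold are hypothetical.

```python
# Hypothetical sketch only: risk as the product of probability and consequence.
# The 1-5 ordinal scales and the threshold are illustrative, not values from
# the Oso investigation.

def risk_score(probability: int, consequence: int) -> int:
    """Return a simple risk score (1-25) from 1-5 ratings."""
    if not (1 <= probability <= 5 and 1 <= consequence <= 5):
        raise ValueError("ratings must be between 1 and 5")
    return probability * consequence

UNACCEPTABLE = 15  # hypothetical threshold for "unacceptably high"

# Well-documented hazard, catastrophic impact.
score = risk_score(probability=4, consequence=5)
print(score, "unacceptable" if score >= UNACCEPTABLE else "tolerable")
```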

Not only had the probability of a landslide in the area been well documented in reports dating back to 1951, but the same area where dozens were killed on March 22 had experienced five prior landslides since 1949. The consequences of those earlier slides were smaller than in 2014 both because the 2014 slide was more severe and because increased residential development meant more people were in harm's way.

While the search for victims is still ongoing, the causes and impacts of the landslide are mostly known. This incident can be analyzed using a Cause Map, or visual root cause analysis, to show the cause-and-effect relationships that led to the tragic landslide.

First, we capture the background information and the impacts to the goals in the problem outline, thereby defining the problem. The landslide (actually a reactivation of an existing landslide, according to Professor Dave Petley in his blog) occurred around 10:40 a.m. on March 22, 2014 in an Oso, Washington residential area. As previously noted, there had been prior landslides in the area, and outdated boundaries were used for logging permissions (which we'll talk more about later). The safety goal was impacted due to the 30 known deaths and 15 people missing. (Not all of the dead have been identified, so the counts of known dead and missing may overlap. However, at this point, there is little hope that any additional rescues will take place.) The environmental goal was impacted due to the landslide, and the customer service goal (insofar as the residents can be considered customers of their local area) was impacted due to the displacement of 30 families. Logging in an area that should have been protected impacts the regulatory goal. The estimated losses (of residences and belongings) are approximately $10 million, impacting the property goal, and the massive search and recovery effort impacts the labor goal.

Beginning with these impacted goals, asking "why" questions allows us to develop cause-and-effect relationships showing how the incident occurred. The safety goal was impacted because of the deaths and missing persons, which resulted from people being overcome by a landslide. In order for this to occur, the landslide had to occur, and the people had to be in the vicinity of the landslide.

As is known from history (see the timeline on the downloadable PDF), this area is prone to landslides. Previous reports identified erosion due to the proximity of the river as a cause of these landslides. An additional cause is water seepage in the area. Water seepage increases when the water table rises from overly wet weather (as is typical at the end of winter). Trees can help reduce water seepage by absorbing the water, so when trees are removed, seepage in an area can increase significantly. Because of this, removal of trees (for logging or other purposes) is generally restricted near areas prone to landslides. However, for reasons yet unknown, logging was permitted in what should have been a restricted area because the maps used to approve it were outdated. Says the geologist who developed the new maps, "I suspect it just got lost in the shuffle somewhere." Additionally, according to analysis by the Seattle Times, the logging extended into the "old" restricted area as well. The State Forester is investigating the allegations and whether the logging played a role in the landslide.

Regardless of the magnitude of the impact of the logging and weather, the area was prone to landslides. Yet it was allowed to be developed, despite multiple reports warning of danger and five previous landslides. In fact, construction in the area resumed just three days after the last landslide in 2006. The 2006 landslide also interrupted a plan to divert the river farther from the landslide area. Despite all of this, the area was built up (with houses built as recently as 2009) and the residents were allowed to stay. (While buying out the residents was under consideration, it was apparently dismissed because the residents did not want to move.) While officials in the area maintain that they thought it was safe, a long history of reports and landslides suggests otherwise.

If a lack of knowledge of the risk in the area continues to be a concern, aerial scanning with advanced technology (lidar) could help. In nearby Seattle, lidar identified four times as many landslide zones as were spotted with the aerial surveying that is more typically used.

To view a summary of the investigation, including a timeline, problem outline and Cause Map, please click “Download PDF” above.

When You Call Yourself ThinkReliability…

By ThinkReliability Staff

While I was bombasting about the Valdez oil spill in 1989, one of those ubiquitous internet fairies decided that I did not really need the network connection at my remote office.  Sadly this meant that the attendees on my Webinar had to listen only to me speaking without seeing the pretty diagrams I made for the occasion (after a short delay to switch audio mode).

Though I have all sorts of redundancies built into Webinar presentations (seriously, I use a checklist every time), I had not prepared for the complete loss of network access, which is what happened during my March 20, 2014 Webinar.  I'm not going to use the term "root cause", because I still had another plan . . . (yep, that failed, too).

For our mutual amusement (and because I get asked for this all the time), here is a Cause Map, or visual root cause analysis – the very method I was demonstrating during the failure – of what happened.

First we start with the what, when and where.  No who because blame isn’t the point, though in this case I will provide full disclosure and clarify that I am, in fact, writing about myself.  The Webinar in question was presented on March 20, 2014 at 2:00 PM EST (although to my great relief the issues didn’t start until around 2:30 pm).  That little thorn in my side? It was the loss of a network connection at the Wisconsin remote office (where I typically present from).  I was using Citrix Online’s GoToWebinar© program to present a root cause analysis case study of the Valdez oil spill online.

Next we capture the impact to the organization's (in this case, ThinkReliability's) goals.  Luckily, in the grand scheme of things, the impacted goals were pretty minor.  I annoyed a bunch of customers who didn't get to see my slides, and I spent some time following up with those who were affected, scheduling an additional Webinar, and writing this blog.

Next we start with the impacted goals and ask “Why” questions.  The customer service goal was impacted because of the interruption in the Webinar.  GoToWebinar© (as well as other online meeting programs) has two parts: audio and visual.  I temporarily lost audio as I was using the online option (VOIP), which I use as a default because I like my USB headset better than my wireless headset.  The other option is to dial in using the phone.  As soon as I figured out I had lost audio, I switched to phone and was able to maintain the audio connection until the end of the Webinar (and after, for those lucky enough to hear me venting my frustration at my office assistant).

In addition to losing audio, I lost the visual screen-sharing portion of the Webinar.   Unlike audio, there’s only one option for this.  Screen sharing occurs through an online connection to GoToWebinar©.  Loss of that connection means there’s a problem with the GoToWebinar© program, or my network connection.  (I’ve had really good luck with GoToWebinar; over the last 5 years I have used the program at least weekly with only two connection problems attributed to Citrix.)  At this point I started running through my troubleshooting checklist.  I was able to reconnect to audio, so it seemed the problem was not with GoToWebinar©.  I immediately changed from my wired router connection to wireless, which didn’t help.  Meanwhile my office assistant checked the router and determined that the router was not connected to the network.

You will quickly see that at this point I reached the end of my expertise.  I had my assistant restart the router, which didn’t work, at least not immediately.  At this point, my short-term connection attempts (“immediate solutions”) were over.  Router troubleshooting (beyond the restart) or a call to my internet provider were going to take far longer than I had on the Webinar.

Normally there would have been one other possibility to save the day.  For online presentations, I typically have other staff members online to assist with questions and connection issues, who have access to the slides I’m presenting.  That presenter (and we have done this before) could take over the screen sharing while I continued the audio presentation.  However, the main office in Houston was unusually short-staffed last week (which is to say most everyone was out visiting cool companies in exciting places).  And (yes, this was the wound that this issue rubbed salt in), I had been out sick until just prior to the Webinar.  I didn’t do my usual coordination of ensuring I had someone online as my backup.

Because my careful plans failed me so completely, I scheduled another Webinar on the same topic.  I'll have another staff member (at another location) ready online to take over the presentation should I experience another catastrophic failure (or a power outage, which did not occur last week but would also result in complete network loss at my location).   Also, as was suggested by an affected attendee, I'll send out the slides ahead of time.  That way, even if this exact series of unfortunate events should recur, at least everyone can look at the slides while I keep talking.

To view my comprehensive analysis of a presentation that didn't quite go as planned, please click "Download PDF" above.  To view one of our presentations that will be "protected" by my new redundancy plans, please see our upcoming Webinar schedule.

Microsoft Withdrawing Support for Windows XP, Still Used by 95% of World’s 2.2 Million ATMs

By ThinkReliability Staff

On April 8, 2014, Microsoft will withdraw support for its XP operating system.  While this isn’t new news (Microsoft made the announcement in 2007), it’s quickly becoming an issue for the world’s automated teller machines (ATMs).  Of the 2.2 million ATMs in the world, 95% run Windows XP.  Of these, only about a third will be upgraded by the April 8th deadline.

The banks that operate these machines then face a choice: upgrade to a newer operating system (which will have to be done eventually anyway), pay for extended support, or go it alone.  We can look at the potential consequences of each decision – and the reasons behind the choices – in a Cause Map, a visual form of root cause analysis.

First we look at the consequences, or the impacts to the goals.  The customer service goal is impacted by the potential exposure to security threats.  (According to Microsoft, it's more than just potential.  Says Timothy Rains, Microsoft's Director of Trustworthy Computing, "The probability of attackers using security updates for Windows 7, Windows 8, Windows Vista to attack Windows XP is about 100 per cent.")  Required upgrades, estimated by security experts to cost each bank in the United Kingdom $100M (US), impact the production/schedule and property/equipment goals.   Lastly, if implemented, extended service/support contracts will impact the labor/time goal.  Though many banks have announced they will extend their contracts, the costs of such an extension are unclear and likely vary with particular circumstances.

As mentioned above, banks have a choice.  They can upgrade immediately, as will be required at some point anyway.  However, it's estimated that most banks worldwide (about two-thirds) won't make the deadline.  They will then continue to operate on XP, with or without an extended service/support contract.

Operating without an extended contract will create a high vulnerability to security risks – hackers and viruses.  It has been surmised that hackers will take security upgrades developed for other operating systems and reverse engineer them to find weaknesses in XP.  The downside of the extended contracts is the cost.

Given the risk of security issues with keeping XP as an operating system, why haven't more banks upgraded in the 7 years since Microsoft announced it would be withdrawing support?  There are multiple reasons.  First, because of the huge number of banks that still need to upgrade, experts available to assist with the upgrade are in short supply.  Also, many banks use proprietary software based on the operating system, so it's not just the operating system that would need to be upgraded – so would many additional programs.

The many changes that banks have been dealing with as a result of the financial crisis may have also contributed to the delay.  (For more on the financial crisis, see our example page.)  Banks are having trouble implementing the many changes within the time periods specified.  Another potential cause is that banks may be trying to perform many upgrades together.  For example, some ATMs will move to a new operating system and begin accepting chip cards as part of the same upgrade.  (For more about the move towards chip cards, see our previous blog.)

Some banks are just concerned about such a substantial change.  “I ask these companies why they are using old software, they say ‘Come on, it works and we don’t want to touch that,'” says Jaime Blasco, a malware researcher for AlienVault.  The problem is, soon it won’t be working.

To view the Outline and Cause Map, please click “Download PDF” above.  Or click here to read more.

Cleaning up Fukushima Daiichi

By ThinkReliability Staff

The nuclear power plants at Fukushima Daiichi were damaged beyond repair during the earthquake and subsequent tsunami on March 11, 2011.  (Read more about the issues that resulted in the damage in our previous blog.)  Release of radioactivity as a result of these issues is ongoing and will end only after the plants have been decommissioned.  Decommissioning the nuclear power plants at Fukushima Daiichi will be a difficult and time-consuming process.  Not only the process but also the equipment being used is essentially being developed on the fly for this particular purpose.

Past nuclear incidents offer no help.  The reactor at Chernobyl which exploded was entombed in concrete, not dismantled as is the plan for the reactors at Fukushima Daiichi.  The reactor at Three Mile Island which overheated was defueled, but the pressure vessel and buildings in that case were not damaged, meaning the cleanup was of an entirely different magnitude.  Lake Barrett, the site director during the decommissioning process at Three Mile Island and a consultant on the Fukushima Daiichi cleanup, says that nothing like Fukushima has ever happened before.

An additional challenge?  Though the reactors have been shut down since March 2011, the radiation levels remain too high for human access (and will be for some time).  All access, including for inspection, has to be done by robot.

The decommissioning process involves 5 basic steps (though completing them will take decades).

First, an inspection of the site must be completed using robots.  These inspection robots aren't your run-of-the-mill Roombas.  Because of the steel and concrete structures involved with nuclear power, wireless communication is difficult.  One type of robot used for surveying got stuck in reactor 2 after its cable was entangled and damaged.   The next generation of survey robots unspools cable, takes up slack when it changes direction, and plugs itself in for a recharge.  This last feature is particularly important: not only can humans not access the reactor building, they can't handle the robots after they've been in there.  The new robots should be able to perform about 100 missions before component failure, pretty impressive for a site where the hourly radiation dose can be the same as a cleanup worker's annual limit (54 millisieverts an hour).

Second, internal surfaces will be decontaminated.  This requires even more robots, with different specialties.  One type of robot will clear a path for another type, which will be outfitted with water and dry ice to be blasted at surfaces, removing the outer layer, and the radiation with it.  The robots will then vacuum up and remove the radioactive sludge from the building.  The resulting sludge will have to be stored, though the plan for that storage is not yet clear.

Third, spent fuel rods will be removed, further reducing the radiation within the buildings.  A shielded cask is lowered with a crane-like machine, which then packs the fuel assemblies into the cask.  The cask is then removed and transported to a common pool for storage.  (The fuel assemblies must remain in water due to the decay heat still being produced.)

Fourth, radioactive water must be contained.  An ongoing issue with the Fukushima Daiichi reactors is the flow of groundwater through contaminated buildings.  (Read more about the issues with water contamination in a previous blog.)  First, the flow of groundwater must be stopped.  The current plan is to freeze soil to create a wall of ice and put in a series of pumps to reroute the water.    Then, the leaks in the pressure vessels must be found and fixed.  If the leaks can’t be fixed, the entire system may be blocked off with concrete.

Another challenge is what to do with the radioactive water being collected.  So far, over 1,000 tanks have been installed.  But these tanks have had problems with leaks.    Public sentiment is against releasing the water into the ocean, though the contamination is low and of a form that poses a “negligible threat”.  The alternative would be using evaporation to dispose of the water over years, as was done after Three Mile Island.

Finally, the remaining damaged nuclear material must be removed.  More mapping is required to determine the location of the melted fuel.  This fuel must then be broken up using long drills capable of withstanding the radiation that will still be present.  The debris will then be moved, in more shielded casks, to a storage facility whose location has yet to be determined.  The operator of the plant estimates this process will take at least 20 years.

To view the Process Map laid out visually, please click “Download PDF” above.  Or click here to read more.

Dangerous Combination: Propane Shortages and a Bitterly Cold Winter

By Kim Smiley

Propane shortages and skyrocketing prices in parts of the United States have made it difficult for some homeowners to affordably and consistently heat their homes this winter.   The brutally cold winter many regions are experiencing is also worsening both the causes and effects of the shortages.

A Cause Map can be built to help understand this issue.  A Cause Map is a visual format for root cause analysis that intuitively lays out the causes that contributed to an issue and shows the cause-and-effect relationships between them.  To view a high level Cause Map of this issue, click on "Download PDF" above.

Why have there been recent propane shortages in regions of the United States?  This question is particularly interesting given the fact that propane production in the United States has increased 15 percent in the past year.   One of the reasons that propane prices have dramatically increased is because of a spike in demand.  There was a larger than normal grain crop this fall, which was also wetter than usual.  Wet grains must be dried prior to storing to prevent spoiling and propane is used in the process.  Local propane supplies were depleted in some areas because five times more propane was used to dry crops this year than last.   About 5 percent of homes in the United States depend on propane for heat and the unusually frigid temperatures this winter have resulted in additional increases in propane demand.

In addition to the increase in demand, there have been issues replenishing local supplies of propane quickly enough to support the increased demand.  There have been some logistical problems transporting propane this winter.  The Cochin pipeline was out of service for repairs, limiting how quickly propane could be transported to areas experiencing shortages.  There were rail rerouting issues that impacted shipments from Canada.

Additionally, many are asking what role propane exports have played in the domestic shortages.   Propane exports have quadrupled in the last 3 years.  New extraction techniques and improved infrastructure have made exporting propane to foreign markets more lucrative, and companies have begun to ship more propane overseas. As more propane is shipped to foreign markets, there is less available for use in the United States.

The propane shortages are an excellent example of supply and demand in action.  Increasing demand combined with decreasing supply results in higher prices.  Unfortunately, addressing the problem isn't simple. There are complex logistical and economic issues that need to be addressed, but if people don't have access to affordable heating, the situation can quickly become dangerous, or even deadly.  In the short term, lawmakers are taking a number of steps to get propane shipped to the affected areas, but how the US chooses to deal with this issue in the long term is still being debated.

1 Dead and 27 Hospitalized from Carbon Monoxide at Restaurant

By Holly Maher

On Saturday evening, February 22, 2014, 1 person died and 27 others were hospitalized due to carbon monoxide poisoning.  The individuals were exposed to high levels of carbon monoxide that had built up in the basement of a restaurant.  The restaurant was evacuated and subsequently closed until the location could be deemed safe and the water heater, located in the basement, was inspected and cleared for safe operation.

So what caused the fatality and 27 hospitalizations?  We start by asking "why" questions and documenting the answers to visually lay out all the causes that contributed to the incident.  The cause-and-effect relationships are laid out from left to right.

In this example, the 1 fatality and 27 hospitalizations occurred because of an exposure to high levels of carbon monoxide gas, which is poisonous.  The exposure to high levels of carbon monoxide gas was caused not only by the high levels of carbon monoxide gas being present, but also because the restaurant employees and emergency responders were unaware of the high levels of carbon monoxide gas.

Let's first ask why there were high levels of carbon monoxide present.  This was due to carbon monoxide gas being released into the basement of the restaurant. The carbon monoxide gas was released into the basement because there was carbon monoxide in the water heater flue gas and because the flue gas pipe, intended to direct the flue gas to the outside atmosphere, was damaged.  The carbon monoxide was present in the flue gas because of incomplete combustion in the water heater.  At this point in the investigation, we don't have any further information, so this can be indicated as a follow-up point on the Cause Map using a question mark.  We have also marked the reason for the flue gas pipe damage with a question mark, as we do not currently know the exact failure mechanism (physical damage, corrosion, etc.) for the flue gas pipe.  What we can identify as one of the causes of the flue gas pipe failure is an ineffective inspection process.  How do we know the inspection process was ineffective?  Because we didn't catch the failure before it happened, which is the whole point of requiring periodic inspections.  This water heater had passed its annual inspection in March of 2013 and was due again in March 2014.

If we now ask why the employees were unaware of the high levels of carbon monoxide present, we can identify that not only is carbon monoxide colorless and odorless, but also there was no carbon monoxide detector present in the restaurant.  There was no carbon monoxide detector installed because one is not legally required by state or local codes.  The regulations only require carbon monoxide detectors to be installed in residences or businesses where people sleep, such as hotels.
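As an aside, and purely as a hypothetical illustration (ThinkReliability's Cause Maps are visual diagrams, not code), the why-chain described in the last few paragraphs could be recorded as a simple effect-to-causes mapping, with "(?)" marking each open question in the investigation:

```python
# Hypothetical sketch: the cause-and-effect relationships described above,
# recorded as an effect -> list-of-causes mapping. Causes listed together are
# all required for the effect; "(?)" marks an open question in the investigation.

cause_map = {
    "1 fatality, 27 hospitalizations": ["exposure to high levels of CO"],
    "exposure to high levels of CO": [
        "high levels of CO present",
        "employees/responders unaware of CO",
    ],
    "high levels of CO present": ["CO released into basement"],
    "CO released into basement": [
        "CO in water heater flue gas",
        "flue gas pipe damaged",
    ],
    "CO in water heater flue gas": ["incomplete combustion (?)"],
    "flue gas pipe damaged": ["failure mechanism (?)", "ineffective inspection process"],
    "employees/responders unaware of CO": [
        "CO is colorless and odorless",
        "no CO detector installed",
    ],
    "no CO detector installed": ["not required by state or local code"],
}

def trace(effect: str, depth: int = 0) -> None:
    """Print the why-chain for an effect, one level of 'why' per indent."""
    print("  " * depth + effect)
    for cause in cause_map.get(effect, []):
        trace(cause, depth + 1)

trace("1 fatality, 27 hospitalizations")
```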

Once all the causes of the fatality and hospitalizations have been identified, possible solutions to prevent the incident from happening again can be brainstormed.  Although we still have open questions in this investigation, we can already see some possible ways to mitigate this risk going forward.  One possible solution would be to legally require carbon monoxide detectors in restaurants.  This would have alerted both employees and responders to the hazard present.  Another possible solution would be to require more frequent inspections of this type of combustion equipment.

To view the Outline and Cause Map, please click “Download PDF” above.

 

Olympic Track Worker Hit By Bobsled

By Kim Smiley

A worker at the bobsled track for the Sochi Winter Olympics was hit by a bobsled on February 13, 2014.  The worker suffered two broken legs and a possible concussion, but is reported to be stable after undergoing surgery.  There was also minor damage done to the track.  Part of a lighting system suspended from the ceiling was replaced and time was needed to clean small plastic shards off the ice.

Investigation into this accident is still underway, but the information that is available in the media can be used to build an initial Cause Map. One of the advantages of using Excel to build Cause Maps is that they can be easily modified to incorporate additional information once the investigation is complete.

When beginning the Cause Mapping process, the first step is to fill in an Outline with the basic background information for an issue.  How an incident impacted the overall organizational goals is also documented on the bottom half of the Outline.  Once the Outline is completed, the Cause Map is built by asking “why” questions. (Click on “Download PDF” above to view a high level Cause Map and Outline for this accident.)

So why was the worker hit by a bobsled?  This occurred because a forerunner sled was sent down the track while the worker was on it.  Forerunner sleds are used to test the track prior to training runs and competitions, and training was scheduled for later that day.  They ensure that ice conditions are good and that all systems, like the timing system, are functional.  People at the top of the track can't see the entire track, so there wasn't an easy way for them to identify the position of the worker prior to running the sled.  Initial reports are that the normal announcements were made to the workers prior to running the forerunner sled, so it doesn't appear that the people at the top of the track had any reason to suspect a problem.

The worker was on the track doing work to prepare it for the training runs and competition scheduled that day.  We can safely assume that he was unaware that the forerunner sled was running the track at the same time.  Investigators have determined that the worker was using a loud motorized air blower and believe he was unable to hear both the announcement and the approaching bobsled.  Two other workers were also working on the track, but they were able to scramble out of danger as the bobsled approached.  Until the investigation is complete, it won’t be clear if other factors were involved, but it seems the use of loud equipment played a role in the accident.

The final step in the Cause Mapping process is to find solutions to reduce the risk of a problem recurring.  It appears that the current method of letting workers know to clear the track isn't adequate in all situations.  Officials will need to modify the process, especially when loud equipment is in use, to ensure the safety of all workers.  Workers need to be on the track at times in order to do their jobs, and there needs to be a way to ensure they have moved to a safe location prior to any sled running the track.

It's worth noting this is not the first time someone has been hit by a bobsled. In 2005, skeleton racer Noelle Pikus-Pace, who recently won a silver medal, was hit by a bobsled.  She shattered a leg and ended up missing the 2006 Turin Olympics as a result.  That accident occurred on a different track, but it highlights the dangers of bobsled tracks and the importance of ensuring safety.

Concerns Raised About Safety of Olympic Slopestyle Course

By Kim Smiley 

One of the stories making headlines leading up to the start of the 2014 Winter Olympics was concern about the safety of the slopestyle course.  There were early rumblings about the slopestyle course, especially after a few falls during training runs, but the media interest intensified after well-known snowboarder Shaun White withdrew from the event.   There is also a heightened sensitivity to safety concerns after the death of a luger during the last Winter Olympics, which was the first death in Olympic training or competition since 1964.

Safety of the athletes involved in the Olympics is obviously paramount, but media coverage of slopestyle course safety concerns is also an issue because it created negative press for both the Olympics and the host country.  A Cause Map can be built to help analyze this issue and illustrate all the factors involved with the controversy surrounding the Olympic slopestyle course. (To see a high level Cause Map of this issue, click on “Download PDF”.)

Several athletes fell during training runs on the slopestyle course, which led to questions about course safety.   There were some injuries on the course, the most notable being Torstein Horgmo of Norway who broke his collarbone during a practice run.  Horgmo was a favorite to medal in the event and was unable to compete after his injury, which has to be heartbreaking.

The course is different from the typical slopestyle course, partly because this is the Olympics and the designer wanted an exciting course.   Athletes are getting more air time from the jumps on the course because they are large step-down jumps where the landing zones are below the ramps.  Designing the first Olympic slopestyle course was a unique challenge and there was no precedent.

The weather has been an added challenge for the course designer.  The jumps were intentionally built oversized, with plans to modify them as needed to accommodate melting concerns in the above-freezing weather.  It's much easier to make a jump smaller than to make it larger, so designers would rather err on the side of too big.  Rain and warm weather also played havoc with plans to test the course.  A test event scheduled for last February was canceled because of weather.  Testing was scheduled to allow for more time to groom the course prior to the Olympics, but six days of heavy rain pushed course completion past schedule.

It’s also worth noting that there is inherent danger in slopestyle.  Slopestyle is an extreme sport with snowboarders performing high intensity tricks in the air.  Factor in the pressure to bring the goods in an Olympic event and snowboarders are going to be pushing their limits.  The falls don’t all happen on the jumps, despite media focus on the large jumps on this course.  Torstein Horgmo’s Olympic-ending crash occurred on the stair set on top of the course.   While a course can be made too dangerous, there will never be a completely safe slopestyle course because of the nature of the sport.

Snowboarder Shaun White made headlines when he pulled out of slopestyle because of injury concerns, but it’s also important to remember that slopestyle isn’t White’s main event.  Although White failed to reach the podium this Olympics, he was the defending gold medalist on the halfpipe and wasn’t willing to risk his chance to compete in that event.  White suffered minor injuries from a crash on the slopestyle course and he didn’t want to impact his halfpipe chances by getting hurt worse.  Halfpipe came after slopestyle so the consequences of a potential injury were high for White.  I’m willing to bet he would have been much more likely to compete in slopestyle if it occurred after the halfpipe event.

The slopestyle course was modified after training runs, which is typical for an untested slopestyle course.  Forty to fifty centimeters were removed from the top deck of the jumps, and snow was added to the knuckles of each landing.  The course crew has been credited with listening to athletes' concerns and being responsive to issues. Lessons learned from this first Olympic slopestyle course will hopefully help things go more smoothly next time.  I hope the focus during the next Olympics is on the amazing athletes and not so much on the course.

Volunteer Killed in Helicopter Fall

By ThinkReliability Staff

On September 12, 2013, the California National Guard invited Shane Krogen, the executive director of the High Sierra Volunteer Trail Crew and the U.S. Forest Service’s Regional Forester’s Volunteer of the Year for 2012, to assist in the reclamation effort of a portion of the Sequoia National Forest where a marijuana crop had been removed three weeks earlier.  Because the terrain in the area was steep, the team was to be lowered from a helicopter into the area.

After Mr. Krogen left the helicopter to be lowered, an equipment failure caused the volunteer to fall 40 feet.  He later died from blunt force trauma injuries. The Air Force’s report on the incident, which was released in January, determined that Mr. Krogen had been improperly harnessed.  The report also found that he should have never been invited on the flight.

To show the combination of factors that resulted in the death of the volunteer, we can capture the information from the Air Force report in a Cause Map, or visual root cause analysis.  First it's important to determine the impacts to the goals.  In this case, Mr. Krogen's death is an impact to the safety goal, and of primary consideration.  Additionally, the improper harnessing can be considered an impact to the customer service goal, as Mr. Krogen was dependent on the expertise of National Guard personnel to ensure he was properly outfitted.  Because it was contrary to Air Force regulations, which say civilian volunteers cannot be passengers on counter-drug operations, the fact that Mr. Krogen was allowed on the flight can be considered an impact to the regulatory goal.  Lastly, the time spent performing the investigation impacts the labor goal.

Beginning with the impacted goal of primary concern – the safety goal – asking "Why" questions allows for the determination of causes that resulted in the impacted goal (the end effect).   In this case, Mr. Krogen died of blunt force trauma injuries from falling 40 feet.  He fell 40 feet because he was being lowered from a helicopter and his rigging failed.  He was being lowered from a helicopter because he was there to aid in reclamation efforts and because the terrain was too steep for the helicopter to land.

The rigging failure resulted from the failure of a D-ring which was used to connect the harness to the hoist.  Specifically, the D-ring was not strong enough to handle the weight of a person being lowered on it.  This is because the hoist was connected to Mr. Krogen’s personal, plastic D-ring instead of a government-issued, load-bearing metal D-ring.  After Mr. Krogen mistakenly connected the wrong D-ring, his rigging was checked by National Guard personnel.  The airman doing the checking didn’t notice the mistake, likely because of the proximity of the two D-rings and the fact that Mr. Krogen was wearing his own tactical vest, loaded with equipment, over the harness to which the metal D-ring was connected.

I think Mark Thompson sums up the incident best in his article for Time:   “The death of Shane Krogen, executive director of the High Sierra Volunteer Trail Crew, last summer in the Sequoia National Forest, just south of Yosemite National Park, was a tragedy. But it was an entirely preventable one.  It stands as a reminder of how dangerous military missions can be, and on the importance of a second set of eyes to make sure that potentially deadly errors, whenever possible, are reviewed and reversed before it is too late.”

To view the Outline and Cause Map, please click “Download PDF” above.  Or click here to read more.

Millions Impacted by Data Breach At Target

By Kim Smiley

Are you one of the millions of customers affected by the recent data breach at Target?  Because I am.  I for one am curious about how data for approximately 40 million credit and debit cards was compromised at one of the United States’ largest retailers.

The investigation is ongoing and many details about the data breach haven't been released, but an initial Cause Map can be built to begin analyzing this incident.  The latest news is that the Justice Department is investigating the incident.  An initial Cause Map can capture the information that is available now and can easily be expanded to include more detail in the future.  A box with a question mark can be used on the Cause Map to indicate that more information is needed. (Click on "Download PDF" to view an Outline and high level Cause Map.)

One of the causes that I think is worth discussing is that retailers in the United States have been specifically targeted for this type of attack in recent years.  The vast majority of credit and debit cards in use in the United States are magnetic strip cards, while Europe has been transitioning to newer credit card technology that uses chips.   Magnetic strip credit cards are a more desirable target for criminals because the technology to create fake magnetic strip cards is readily available.  The data on magnetic strip cards also stays the same, while chips use unique codes for each transaction.  Cards with chips also require a PIN when used, adding an additional layer of protection.
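To make that difference concrete, here is a toy model (my own simplification, not actual EMV chip-card cryptography): a magnetic strip presents the same data on every swipe, so a skimmed copy can simply be replayed, while a chip that derives a fresh code for each transaction makes a copied snapshot useless.

```python
# Toy model only - not real EMV. It contrasts static magnetic-strip data,
# which a skimmer can replay, with a per-transaction code that changes every time.
import hmac, hashlib

CHIP_SECRET = b"hypothetical-key-known-only-to-chip-and-issuer"

def stripe_data(card_number: str) -> str:
    return card_number  # same bytes on every swipe -> trivially cloneable

def chip_code(card_number: str, transaction_counter: int) -> str:
    msg = f"{card_number}:{transaction_counter}".encode()
    return hmac.new(CHIP_SECRET, msg, hashlib.sha256).hexdigest()[:8]

print(stripe_data("4111-xxxx"), stripe_data("4111-xxxx"))    # identical -> replay succeeds
print(chip_code("4111-xxxx", 1), chip_code("4111-xxxx", 2))  # different -> replay fails
```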

So why does the United States still use magnetic strip cards?  One of the main complicating factors is money.  Transitioning to cards that use chips requires a significant investment by both banks and retailers.  The cost to transition to the higher-tech cards is estimated at $8 billion, so the money required is considerable. Both parties are nervous about being the first to commit to the process.

Rising credit card fraud rates in the United States have been increasing the pressure to move to newer credit card technology.  Credit card fraud rates in the U.S. have doubled in the 10 years since Europe began using chip cards.  As long as the United States remains the softest target, the rates are likely to increase.

On a positive note, the transition to the newer chip cards should gain traction in the next few years.  Credit card companies have typically footed the bill for credit card fraud, but many card companies have stated that, by the end of 2015, merchants or banks that have not transitioned to chip cards will be held accountable for fraudulent purchases that the higher-tech cards would have prevented.

The frustrating thing is that there are limited ways individual consumers can protect themselves short of switching to cash.  You can be smart about where you swipe your cards, for example avoiding unmanned ATM kiosks, but a major retailer like Target didn’t seem suspicious.  As somebody who has had multiple instances of credit card fraud in the last few years, I look forward to a safer credit card in the future.