All posts by Angela Griffith

I lead comprehensive investigations by collecting and organizing all related information into a coherent record of the issue. Let me solve a problem for you!

Commuter Ferry Crash in NYC Injures 85

By ThinkReliability Staff

A commuter ferry struck a pier in Lower Manhattan, NY during the morning commute on January 9, 2013, injuring at least 85 people – some critically.  According to US Coast Guard Captain Gordon Loebl, “We know that they hit the pier at a relatively high rate of speed.”

We can examine this issue in a Cause Map, a form of root cause analysis which provides a visual “map” of cause-and-effect relationships.  We begin by determining the impacts to the goals resulting from this incident.  The safety goal was impacted due to the large number of people who were injured.  (No fatalities have been reported as a result of the crash.)   The customer service goal was impacted because the ferry slammed into a pier (nobody expects that on their morning commute!).  The ferry was damaged, impacting the property goal.  Presumably the ferry will be out of service for some time, impacting the production goal, and will require repairs, impacting the labor goal.  Any time required for the response can also be considered an impact to the labor goal.

A Cause Map can start as simply as taking an impacted goal and asking a couple of “Why” questions.  In this case, the safety goal is impacted by the injuries, which were caused by the ferry striking the pier.  More detail can be added to the Cause Map by asking more “Why” questions.

In this case, it’s not clear what caused the crash, though drug or alcohol use by the captain has been ruled out.  There have been some recent complaints about maneuverability due to a recent overhaul replacing the engine and propulsion system but it’s not clear if this played a role in the crash.  It’s also unclear why the ship was traveling at 14 knots when it was about to dock.  Because the ship was about to dock, people had gotten up from their seats and were standing in hallways and on or near stairways, increasing the rate of injury.  It does not appear that there are any regulations requiring commuters to remain seated until the ferry has stopped moving.
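The chain of “Why” questions described above can be sketched as a small data structure.  This is only a minimal illustration in Python, not ThinkReliability's own tooling: the `cause_map` dictionary and the `ask_why` helper are hypothetical names, and the entries are limited to what the post states about the ferry incident.

```python
# Hypothetical sketch: each effect maps to the causes found by asking "Why?".
cause_map = {
    "Safety goal impacted": ["At least 85 people injured"],
    "At least 85 people injured": [
        "Ferry struck pier",
        "Passengers standing in hallways and on stairways",
    ],
    # The investigation is ongoing, so this branch is left open.
    "Ferry struck pier": ["Cause not yet determined"],
}


def ask_why(effect, depth=0, lines=None):
    """Walk from an effect back through its recorded causes, indenting each level."""
    if lines is None:
        lines = []
    lines.append("  " * depth + effect)
    for cause in cause_map.get(effect, []):
        ask_why(cause, depth + 1, lines)
    return lines


print("\n".join(ask_why("Safety goal impacted")))
```

As the investigation reports findings, new entries would be added under the open branch rather than replacing what is already recorded.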

The ferry company, as well as the appropriate transportation authorities, will continue their investigations to determine the causes of the ferry incident.  Once they do, they will provide recommendations or requirements to ensure a safer morning commute.

To view the Outline and Cause Map, please click “Download PDF” above.  Or click here to read more.

Rising Grain Prices 2003-2012

By ThinkReliability Staff

Grain prices have more than doubled since 2003, even though they are down from their record highs of 2008.  Grain is used for food, animal feed, and ethanol.  The demand for grain for all of these uses is increasing, but the supply is not keeping up.  This, along with other factors, has increased the price of grain to the point where it can be disastrous to the world’s poorest citizens.

We can examine the effect of the increased price of grain in a Cause Map.  A Cause Map allows us to lay out cause-and-effect relationships in an easy-to-understand, visual format.  To begin the Cause Map, we determine the impacts to the goals.  In this case, because we are looking at the grain price increases for the years 2003-2012 worldwide, our goals are broad.  The safety goal is impacted because there has been a high impact on the nutrition of the poor.  Grain prices have led to food riots in many locations, which is another impact to the safety goal.  The environmental goal has been impacted by the loss of usable cropland.  The increase in food prices can be considered an impact to the customer service goal.  Demand outpacing supply can be considered an impact to the production goal (considering the worldwide demand and supply).  Lastly, the increase in the price of grain itself can be considered an impact to the property goal.

We begin with the impacts to the safety goal: nutritional deficiency and food riots, both resulting from the increase in the price of food.  The increase in the price of food affects the poor in two ways – it reduces individual buying ability and reduces the amount of food aid that can be bought for the same amount of money.  In short, a country providing a consistent monetary amount of food aid will provide less aid when the food is more expensive.  This double whammy is made worse by the cost of fuel – as it increases, even less food can be bought per aid dollar.
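The arithmetic behind a fixed aid budget buying less food can be shown in a few lines.  The sketch below is illustrative only: the function name `food_aid_tonnes` and every figure in it (budget, grain price, delivery cost) are hypothetical values chosen to show the direction of the effect, not data from any aid program.

```python
def food_aid_tonnes(budget, grain_price_per_tonne, delivery_cost_per_tonne):
    """Tonnes of grain a fixed aid budget can buy and deliver."""
    return budget / (grain_price_per_tonne + delivery_cost_per_tonne)


# Hypothetical figures, chosen only to show the direction of the effect.
baseline = food_aid_tonnes(1_000_000, 200, 50)
pricier_grain = food_aid_tonnes(1_000_000, 400, 50)   # grain price doubles
pricier_fuel = food_aid_tonnes(1_000_000, 400, 100)   # fuel-driven delivery cost rises too

print(baseline, pricier_grain, pricier_fuel)
```

With the budget held constant, each increase in grain price or delivery cost lowers the tonnage delivered.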

The increase in the price of food is directly impacted by the price of grain.  Grain is used as a food itself, as well as feed for animals that are used for food, and is a component of many other produced foods.  The cost of all these foods goes up as the price of grain increases.

Why is the price of grain increasing?  There are many factors that result in the increase in the price of grain.  Firstly, the cost of grain goes up as the cost of the fuel needed to transport it and the cost of the fertilizer needed to grow it increase.  As the demand for fertilizer grows, its cost grows.  Fertilizer demand grows as the demand for all crops grows.

The supply vs. demand equation also contributes to the cost of grain.  When demand increases, and supply does not keep up, cost goes up.  The demand for grain has been increasing – for food to feed the growing population, and to produce input-intensive foods, which actually require more grain.  (For example, about 7 kg of grain are required to get 1 kg of beef.  As the demand for input-intensive foods increases, the demand for grain increases even more.)  The government mandates and subsidies that require the use of grain for bio-fuels – driven by the increasing cost of oil – also substantially increase the demand for grains.  Making matters worse, in an attempt to protect their populations and agricultural industries, countries have been restricting exports and/or hoarding, further decreasing the supply available for trade.

Supply is not keeping up with demand.  The growth in agricultural productivity – which allows for a higher crop yield – has not increased as quickly as demand.  Crops are lost to agricultural pests, droughts and floods, and a particularly virulent strain of stem rust fungus, which has affected many grain crops.  Lastly, usable cropland is being lost, due to urbanization to support that growing population, as well as erosion and water depletion, which can be impacted by poor land management.  In many cases, the investment and infrastructure to allow for agricultural advances just isn’t there.

The issues discussed above become a vicious cycle, making solutions that much more difficult and important.  Specifically, world organizations have asked countries to examine their agricultural policies, including ethanol mandates and subsidies, export restrictions and taxes, and hoarding.  Work on advanced bio-fuels or Brazilian sugar cane ethanol can reduce the amount of agricultural land devoted to producing crops for biofuels, rather than food.  Investment and development funds, as well as increased aid, are being sought to help remedy the current situation.  Import taxes into many countries that have food shortages have been reduced or removed to try to reduce the cost of food.  These are big solutions – for a big issue.  It is estimated that 16% of the world’s population is chronically under-nourished.  Unless some of the changes discussed here are made, further increases in the cost of food will only make the situation worse.

To view the Outline and Cause Map, please click “Download PDF” above.  Or click here to read more about the crisis and actions taken by the World Bank.

Knife Cuts in Restaurants

By ThinkReliability Staff

Knife cuts in restaurants pose a big risk, not only to the restaurant employees themselves, but also to customers, because blood or bandages from an employee’s laceration can contaminate food.  There are steps that can be taken to reduce the risk of a knife cut.  While some of these steps can be taken by restaurant employees themselves, many will involve the restaurant management as well.  Although these recommendations are based on knife cuts that occur within the restaurant and food preparation industry, they are also relevant for use at home to protect against lacerations from knives.

You can view some different causes that can result in lacerations from knives in a Cause Map, or visual root cause analysis, by clicking “Download PDF” above.   With any root cause analysis, the goal is to determine as many solutions as possible to reduce the risk of the issue – in this case, knife cuts – from happening in the future.  When we put together a proactive investigation – not based on one specific incident, but rather combining any possible causes we can brainstorm to best determine solutions – we can use some examples of actual lacerations that have occurred, and also our personal experiences to brainstorm causes.  As with any investigation, the wider net we cast, the more ideas we brainstorm and the more possible solutions we can discover.

The setup of the food prep area is key to reducing cuts.  Inadequate lighting and distraction can lead to increased injury, as can the storage location of the knives.  (You’re much more likely to cut yourself grabbing a knife out of a drawer than off a magnetic strip or out of a block.)  The condition of the knives themselves is also key.  Properly maintained knives – that is, knives that are sharpened and whose handles are properly attached – are less likely to cause cuts because dull knives, or those with loose handles, make it difficult to cut properly, increasing the risk of cuts.  Knives should be regularly sharpened and if a knife is damaged, it should be disposed of.  In addition, having the proper complement of knives is important.  Proper cutting technique can reduce knife cuts, but a key component of proper cutting technique is having the correct knife.

An additional component of proper cutting technique is training.  Training should include techniques for cutting as well as which knife to use for which type of cutting and what kind of food product.  Some of the key aspects of cutting technique that can decrease the incidence of knife cuts include: cut away from you, and use a cutting board with a mat to keep it from slipping.  Hold objects with your fingers pointing straight down, using your knuckles as a guide for the knife.  It’s very difficult to cut yourself while holding food this way.

Not all knife cuts occur while cutting food.  One frequent source of knife cuts is reaching into a sink full of soapy water and grabbing a knife blade.  When hand washing knives, put in one knife at a time and don’t let go of it.  Always set knives well onto the counter with the blade facing away from you.  And if a knife falls off a prep surface, step back and let it fall.  If you are particularly concerned about knife cuts, you may want to consider the use of Kevlar gloves.  Restaurants that use Kevlar gloves have seen a remarkable decrease in injuries due to knife cuts.

To view the Cause Map, please click “Download PDF” above.

Slips, Trips and Falls: A Root Cause Analysis Primer

By ThinkReliability Staff

Slips, trips and falls happen every day.  Falls are responsible for tens of thousands of deaths each year.  (Slips and trips are considered a subset of falls, and are included in these numbers.)  Falls on the job account for 12-15% of all workers’ comp costs.  The direct and indirect costs of workers injured and killed on the job are estimated to be billions of dollars each year, both in workers’ comp claims and in lost productivity.  In 1999, as an example, 5,100 workers were killed by falls and over 570,000 injuries were reported.  However, there are many things that can be done to prevent and lessen the impact of falls.  Building a Cause Map, a visual root cause analysis, will allow us to identify all the potential causes of falls.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.  Once we’ve done that, we can identify all the solutions.

A worker is injured during a fall because the worker strikes the floor, or another object, and the object contacted is hard, and the worker hits in a way that causes injury.  When we say that workers are injured because they hit an object in a way that causes injury, what we are really talking about is factors that worsen a fall, and make injury more likely. The worker could land on a part of his or her body that is more easily injured.  Another way that injuries can be worsened is if a worker falls farther than his or her height (i.e., not a same-level fall).

The worker strikes the floor or other object because he or she falls, and there is no other support for the body, such as a handrail, or a harness.   There are four different ways to fall: slips, trips, the “step and fall” (where a person gets off-balance while stepping), and becoming unbalanced on moving equipment.

A worker slips when there is inadequate traction, either because the force of stepping off is too high, or the coefficient of friction is too low.  The force of stepping off can be higher than average if the worker is walking quickly or running, making a sudden change in direction, or if he or she has an awkward gait, from injury or old age, for example.  The coefficient of friction is a function of the traction provided by the shoes the worker is wearing and the “slipperiness” of the walking surface.  The coefficient of friction is too low if the traction of the worker’s shoes is inadequate and if the floor is slippery, because the surface is wet, icy and/or oily and does not have a non-skid coating.  Of course, for this to be an issue at all, the worker has to step into the slippery area.
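The slip condition described above can be written as a simple comparison between the push-off force and the traction the floor and shoes can supply.  The sketch below assumes the standard friction model (maximum traction equals the coefficient of friction times the normal force); the function name `will_slip` and all numeric values are hypothetical and are not measurements.

```python
def will_slip(horizontal_force_n, normal_force_n, coefficient_of_friction):
    """Return True when the push-off force exceeds the available traction.

    Assumes the standard friction model: max traction = mu * normal force.
    """
    available_traction = coefficient_of_friction * normal_force_n
    return horizontal_force_n > available_traction


# Illustrative values only: a 700 N worker pushing off with 150 N.
print(will_slip(150, 700, 0.5))  # higher-friction surface: traction 350 N
print(will_slip(150, 700, 0.1))  # slippery surface: traction 70 N
```

Either raising the push-off force (running, sudden turns) or lowering the coefficient of friction (wet, icy or oily floors, worn shoes) moves a step toward the slipping case.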

A worker can become off-balance by encountering an unexpected height difference (known as the “step and fall”).  This occurs in one of two ways.  Either the front foot lands on a surface lower than expected, or the ankle turns due to one side of the foot ending up higher than the other side, with footwear that inadequately supports the ankle.  These are both due to an unexpected height difference.

When a worker trips, it is because his or her toe is stopped, but his or her upper body is not stopped.  The upper body is moving because the worker is moving, and the toe is stopped because it encounters an object in the walking path, a rise in the walking path, or a difference in height of subsequent stairs.

Last but not least, falls can be caused by workers who become unbalanced on moving equipment.  For this to occur, the worker must be inadequately secured to the equipment while the equipment changes motion, either by turning, decelerating or stopping, or accelerating or starting to move.

Once we have built our Cause Map and found all the potential causes, we can assign potential solutions to all appropriate causes.  The solutions are in green boxes, near the cause(s) they “solve”.   You can see that some of the solutions are the responsibility of the company, and some are the responsibility of the worker, and some are both.   Although many of the responsibilities lie with the worker, it is in a company’s best interest to provide training on how to prevent, manage and mitigate falls.  Falls may seem like everyday, ordinary minor occurrences, but the consequences can be anything but minor.

Planes Nearly Collide Over DC

By ThinkReliability Staff

Two planes came within seconds of a collision on July 31, 2012 when both were directed to the same airspace by controllers.  Although no incident occurred, such near misses should be investigated thoroughly to prevent incidents in the future.

We can perform a root cause analysis of this incident in visual Cause Mapping form.  We begin with the impacts to the goals.  In the case of a near-miss like this one, some of the impacts to the goals will be hypothetical, based on the potential of the incident actually occurring.  For example, the safety goal is impacted because of the potential of death or injury to the passengers and crew on the planes.  The property goal is also impacted due to the potential of damage to the planes.  Even though this incident was considered a near-miss, there were some actual impacts to the goals, such as the delay in landing of the inbound plane, which can be considered an impact to the customer service, schedule, and labor goals.

Once we have determined the impacts to the goals, we can begin the analysis by asking “why” questions.  In this case, the safety and property goals were impacted due to the potential collision of two planes.  These planes could have collided because they were on a collision course.  One plane was taking off directly towards another plane that was trying to land.  The landing plane was landing from the opposite direction to usual (from the South instead of from the North) in order to avoid high winds from an incoming storm.  The plane taking off was cleared to take off towards the incoming plane (towards the South) by a different controller who was unaware that incoming planes were coming in from a different direction.  Communication of the change in incoming flights was not made to all controllers in the area and, although no details are available, it appears that the procedure used by the controllers when changing the flow towards the airport was inadequate.

There are thousands of recorded errors by air traffic controllers every year, and Reagan National (where this incident occurred) has had some particularly high-profile incidents involving air traffic controllers, such as when a controller fell asleep (see previous blog).  On August 10, 2012, two aircraft clipped each other at another Washington, DC area airport, although it is unclear if controllers were involved.  (See the article here.)  A congressional and FAA investigation is underway, and will hopefully address some needed improvements in air safety.

To view the Outline and Cause Map, please click “Download PDF” above.

SL-1 Explosion-The Only Fatal Reactor Accident in the US

By ThinkReliability Staff

The only fatal reactor accident in the United States occurred on January 3, 1961, when an Army prototype known as SL-1 (for stationary, low power reactor, unit 1) exploded, killing the 3 operators who were present.  We’ll use the SL-1 tragedy as an example of how the Cause Mapping process can be applied to a specific incident.  A thorough root cause analysis built as a Cause Map can capture all of the causes in a simple, intuitive format that fits on one page.

The SL-1 tragedy killed the three operators present, which is an impact to the safety goal.  Another goal is that there be no damage to the vessel.  In the case of SL-1, the vessel sustained extensive damage.

The loss of life and vessel damage were both caused by the reactor exploding.  The reactor exploded because it went prompt critical (an uncontrollable, exponentially increasing fission reaction).  The reactor went prompt critical because withdrawal of the central rod can cause prompt criticality and because the rod was rapidly, manually lifted 26.4″ out of the core.

Withdrawal of the central rod can cause prompt criticality due to a lack of shutdown margin in the core, and inadequate safety criteria.

Because most of the evidence was so effectively destroyed, nobody really knows why the control rod was lifted out of the core.  There are two theories (disregarding the bizarre and improbable murder/suicide theory): 1) the control rod got stuck while being lifted to be attached to the drive mechanism and, as the operator was exerting greater force on it, suddenly came free, resulting in a lift far greater than intended, or 2) a rod drop test/exercise was performed improperly.

The control rod may have become stuck and come free while being attached because it was required to be lifted 4″ out of the core and because control rods had been sticking.  The control rods had been sticking for one or more of the following reasons: 1) reduced clearances due to radiation damage (which can cause structural material to swell), 2) the passage was blocked due to loss of poison strips in the channel, caused by poor design and inadequate testing, or 3) lifting equipment not working properly due to its inadequate lifting capacity.

It’s also possible that a rod exercise/test was performed improperly.  This could have occurred because the operators chose to exercise/test the rods, attempting to ensure that they would perform properly, and because they didn’t realize what would happen, owing to inadequate training and inadequate work instructions.  The test itself may also have been done improperly because of inadequate work instructions.

On a positive note, the SL-1 incident did initiate some positive changes in the nuclear industry.  Most notably, reactor design has improved and incorporated a “one-rod stuck” criterion, which specifies that a reactor can NOT go critical through the removal of any one control rod.  Additionally, procedures and training have gotten more intense and more formal, and planning for emergencies has increased.

Hindenburg Crash – May 6, 1937

By ThinkReliability Staff

On May 6th, 1937, the Hindenburg burst into flames over the Lakehurst, NJ Naval Base, after completing a successful trip across the Atlantic.  35 of the 97 people on board (and one of the ground crew) were killed.  The Hindenburg itself was a total loss, and the popularity of airships never recovered after the accident.

The loss of 36 lives and the loss of the Hindenburg were both caused by the fire aboard. The loss of popularity of airships was caused by both the loss of the Hindenburg, and by the loss of lives.  The next question to ask is “Why did the fire occur?”

For the Hindenburg, this is where things start to get interesting.  There are three separate theories about why the fire started.  There are people who believe very strongly in each.   Luckily for us, the beauty of the Cause Map form of a root cause analysis is that we can use it even if we haven’t determined which theory is correct.

The first theory is that the fire started from sabotage.  Because the Hindenburg was frequently used as a Nazi propaganda tool, some thought it was almost too easy a target for sabotage by anti-Nazi activists (who included in their number the designer of the Hindenburg, Dr. Hugo Eckener).  There was even a “suspicious” character who survived the crash, a German acrobat living in America.  However, eventually the FBI dismissed the idea of sabotage as a “red herring.”

Another theory is that the fire began when static electricity ignited the flammable cover of the airship.  The major proponent of this theory, Dr. Addison Bain, has run tests on pieces of the Hindenburg cover preserved from the wreck site.  (This was not until 1994.)  He has also found supporting evidence from historic records of the Zeppelin company.

The other theory is that static electricity ignited a flammable hydrogen-oxygen mixture.  This was the original cause attributed to the disaster by the U.S. Department of Commerce’s root cause analysis investigation after the crash.  There are also people who claim that Dr. Bain’s theory is physically impossible, and do not specifically champion a cause, but treat this one as the most likely.

Note that we’re not espousing a theory – we are just recording all of the possibilities.  Once we have done that, the Cause Map allows us to find solutions for any potential causes.  Once we have all the theories mapped out, we can use the Cause Map as a resource to determine the solutions that are most helpful, or continue our root cause analysis investigation to determine which causes are most likely.

Loss of Firefighting Plane Affects Firefighting Efforts

By ThinkReliability Staff

Wildfires in the Rocky Mountain region have been plaguing the nation for weeks.  The firefighting mission took a severe hit when a C-130 that was dropping flame retardant on the fire crashed on the evening of July 1, 2012, killing four of six crewmembers and injuring the other two.  As a result of the crash, the Air Force grounded other C-130s for two days, increasing the work for firefighters on the ground.

Although the Air Force has not released details of what exactly resulted in the plane crash, we can look at the information we do have available in a visual root cause analysis or Cause Map.  We begin by determining which of the organization’s goals were impacted in the Outline.  First, because of the deaths of the crewmembers, the safety goal was impacted.  The environmental and customer service goals were impacted because of the decreased ability to fight wildfires.  The schedule goal was impacted because other C-130s were grounded for two days.  The property goal was impacted because of the damage to the plane, and the labor goal was impacted due to the increased difficulty for remaining firefighters in fighting the fire.

Once we have determined these impacts to the goals, we can begin asking “Why” questions to draw out the cause-and-effect relationships that led to the impacted goals.  The safety goal, and the other goals, were impacted due to the plane crash.  Again, although the Air Force has not released details of its ongoing investigation, it is believed that a downdraft (caused by the same high winds in the area that are helping the wildfires spread) may have contributed to the crash.  An additional contributor is the fact that the plane was likely traveling at extremely low altitude, which allowed the plane to perform its task of helping to fight wildfires.  Lastly, it is possible that the heavy demands placed on the plane due to the extent of the fires may have contributed to the incident.  If, during the course of the investigation, it is determined that one of these causes was not related to the plane crash, that cause can be crossed out, but left on the map.  Evidence that shows that the cause did not result in the incident should be placed under the box.  This allows us to keep a complete record of which causes were considered.
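The practice of crossing out a cause while leaving it on the map, with its evidence attached, can be sketched as a small record type.  This is a hypothetical illustration only: the `Cause` class, its fields, and the placeholder evidence note are assumptions made for the example, and no cause has actually been ruled out in the source.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    ruled_out: bool = False
    evidence: list = field(default_factory=list)

    def cross_out(self, evidence_note):
        """Mark the cause as ruled out, but keep it along with its evidence."""
        self.ruled_out = True
        self.evidence.append(evidence_note)


causes = [
    Cause("Downdraft from high winds"),
    Cause("Extremely low altitude"),
    Cause("Heavy demands placed on the plane"),
]

# Hypothetical finding, used only to demonstrate the bookkeeping.
causes[2].cross_out("Placeholder: evidence showing this did not contribute")

active = [c.description for c in causes if not c.ruled_out]
print(active)
```

Nothing is deleted from `causes`, so the record of what was considered stays complete even as the active list shrinks.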

Once the causes related to the incident have been placed on the map, solutions to mitigate the risk of this type of incident from happening again can be brainstormed and implemented.

To view the Outline and Cause Map, please click “Download PDF” above.

Pipeline Spill in Alberta Threatens Drinking Water

By ThinkReliability Staff

A pipeline spill in Alberta, Canada of up to 480,000 litres was noticed on the evening of June 7, 2012.  Although pipelines are estimated to spill approximately 3.4 million litres a year, they are not frequently near populated areas or water sources.  However, due to the proximity of this spill to a drinking water source, there was the potential of impact to drinking water.  An issue of this magnitude, with this type of impact, is thoroughly investigated to reduce the risk of recurrence.  We can examine this issue in a visual root cause analysis performed as a Cause Map.

We begin with the impacts to the goals.  In this case, the safety goal is impacted because of the potential impact to drinking water.  The environmental goal is impacted because of the spill of sour crude oil.  The spill is impacting area residents in a variety of ways, which can be considered an impact to the customer service goal.  The production goal was impacted due to a 10-day shutdown of a portion of the pipeline.  The property goal is impacted by the damage to the pipeline, and the labor goal is impacted by the response and cleanup required.

Once we have developed the impacts to the goals, we can ask “Why” questions to develop the cause-and-effect relationships that resulted in those impacts.  The potential impact to drinking water resulted from the proximity of the spill to a drinking water source, because the spill was in a populated area, and from the oil spill itself.  The oil spill resulted from damage to the pipeline and the time elapsed before the spill was stopped.  Because the longer a spill goes undetected, the more environmental impact it has, the adequacy of monitoring, inspection and testing must be considered to ensure that this risk is reduced.

Although the cause of the pipeline damage is still being investigated, causes that have resulted in prior pipeline damage include construction damage, internal corrosion, and external corrosion.  External corrosion can result from exposure to water, which in this case was impacted by recent flooding of the river and shallow burying of the pipe, as was typical with earlier installations.  The age of the pipe may have also impacted the internal corrosion, as the more time that pipe is exposed to hydrocarbons (which the pipe transmits) the more corrosion will occur.

Immediate solutions include isolating the damaged area with a valve.  Then repairs were made to the pipeline, and cleanup began.  Cleanup is expected to take most of the summer.  There have been calls for increased monitoring, testing, and inspection of the line, and with an incident of this type, that frequency should be examined to ensure it is appropriate to minimize these types of risk.

To view the Outline, Cause Map, and Solutions, please click “Download PDF” above.

Deadly Plane Crash in Lagos, Nigeria

By ThinkReliability Staff

A devastating air crash in Lagos, Nigeria killed all on board and at least 10 on the ground.  This was the first major commercial air disaster since 2006.  Safety efforts since that disaster resulted in the US Federal Aviation Administration (FAA) granting Nigerian airlines its top air-safety rating.  Now concerns about air safety in Nigeria have resurfaced.  As a result of the crash, according to Harold Demuren, head of Nigeria’s civil aviation body: “We have suspended the entire Dana fleet.  They will be grounded as long as it takes to carry out the necessary investigations into whether they are airworthy.”

We can examine this incident in a Cause Map, or a visual root cause analysis.  We begin with the goals that were impacted.  In this case, the safety goal was impacted due to the deaths of people on the plane and on the ground.  We begin by asking “Why” questions to put together a very simple cause-and-effect relationship.  In this case, after losing both engines, Dana Air flight 992 crashed into a residential building in a highly populated suburb of Lagos, Nigeria, killing all 153 people on board and at least 10 on the ground.

The investigation of the plane crash is still ongoing.  However, it is known that both engines of the plane lost power, causing the plane to rapidly lose altitude and crash into a highly populated area.  Some of the areas being investigated that may have contributed to the crash are:

1) a bird strike (bird remains were found in one engine),

2) poor maintenance (although the plane was regularly inspected, there were also reports of leaking hydraulics and a history of poor airline safety in Nigeria, which appeared to have been remedied in recent years, as indicated by the US FAA’s granting of its top air-safety rating),

3) overworked planes, likely due to financial considerations (the plane that crashed was on its fourth trip of the day), and/or

4) the age of the airplane (at 22 years old, it was technically not permitted to fly in Nigeria, which bans the use of planes over 20 years old).

As more information is revealed during the investigation it can be added to the Cause Map.  As the investigation is concluded, there will likely be more changes to Nigerian requirements and oversight for air safety.

To view the Outline and Cause Map, please click “Download PDF” above.