
Track Workers Killed by Train

By ThinkReliability Staff

A derailment and the deaths of two railroad workers on April 3, 2016 have led to an investigation by the National Transportation Safety Board (NTSB). In this investigation, the NTSB will address the impacts of the accident, determine what caused it, and provide recommendations to prevent similar accidents from recurring. While the investigation is still underway, a wealth of information related to the accident is already available to begin the analysis. We will look at what is currently known about the accident in a Cause Map, a visual form of root cause analysis.

The first step of the analysis is to define the problem. This includes the what, when, and where of the incident, as well as the impacts to the organizational goals. Capturing the impacts to the goals is particularly important because the recommendations that result from the analysis aim to reduce these impacts. If we define the problem simply as a “derailment”, recommendations may be limited to those that prevent future derailments. We are looking for recommendations that not only prevent future derailments but also reduce the impacts to all the affected goals. In this case, those impacts include worker safety (2 workers died), public safety (37 passengers were injured), customer service (the train derailed), property (the train and some construction equipment were damaged), and labor (response and investigation are required).

The analysis is performed by beginning with the impacted goals and developing the cause-and-effect relationships that led to those impacts. Asking “why” questions can help to identify some of the cause-and-effect relationships, but there may be more than one cause that results in an effect. In this case, the worker fatalities occurred because the train struck heavy equipment and the workers were in/on/near the equipment. Both of these causes had to occur for the effect to result. The workers were on the equipment performing routine maintenance. In addition, their watch for approaching trains was ineffective. When capturing causes, it’s important to also include evidence, which validates the cause.

We know the watch was ineffective because federal regulations require a watch for approaching trains that gives at least fifteen seconds of warning. Fifteen seconds should have been sufficient time for the workers to exit the equipment. Because they did not, it follows that the watch was ineffective.

The train struck the heavy equipment because the equipment was on track 3, the train was on track 3, and the train was unable to brake in time. It’s unclear why the heavy equipment was on the track; rail safety experts say heavy equipment should never be directly on the track. The train was on track 3 because it was allowed on the track. Work crews are permitted to shut off the current to prevent trains from entering the work zone, but they did not do so in this case, for reasons that are still being investigated. Additionally, the dispatcher allowed the train onto the track. Per federal regulations, when workers are on the track, train dispatchers may not allow trains onto that track until the roadway worker gives permission. It appears that in this case the workers either failed to secure permission to work on the track (which would have notified the dispatcher of their presence) or the work notification was improperly cancelled, allowing trains to return to the track, possibly due to a miscommunication between the night and day crews. This is also still under investigation.

While inspection of the cars and maintenance records found no anomalies, the braking system is under investigation to determine whether it affected the train’s ability to brake. Also under investigation is the Positive Train Control (PTC) system, which should have emitted warnings and slowed the train automatically. However, the supplemental shunting device, which alerts the signaling system that the track is occupied and is required by Amtrak rules, was not in place. Whether this prevented the PTC system from stopping the train in time is also under investigation. The conductor placed the train in emergency mode 5 seconds before the collision. As the train was traveling at 106 mph (the speed limit in the area was 110 mph), this did not give adequate time to brake. There should have been a flagman to notify the train that a crew was on the track, but there was none. A flagman also carries an air horn, which provides another warning to the track crew that a train is coming.
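As a rough check on the timing, the reported speed and warning times can be converted into distances. The sketch below is our own illustration, not part of the investigation. It assumes the train held a constant speed over each warning interval. It also uses a hypothetical emergency deceleration of 3 ft/s² (roughly 2 mph per second), chosen only to give a sense of scale. The train’s actual braking performance is still under investigation.

```python
# Back-of-the-envelope distances for the reported figures (illustrative only).
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_fps(mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


def distance_ft(mph: float, seconds: float) -> float:
    """Distance covered at a constant speed over the given time, in feet."""
    return mph_to_fps(mph) * seconds


def stopping_distance_ft(mph: float, decel_fps2: float) -> float:
    """Distance needed to stop from `mph` at constant deceleration: v^2 / (2a)."""
    v = mph_to_fps(mph)
    return v * v / (2 * decel_fps2)


SPEED_MPH = 106        # reported train speed
EMERGENCY_LEAD_S = 5   # emergency mode applied 5 s before the collision
WATCH_WARNING_S = 15   # minimum warning the watch is required to give

# HYPOTHETICAL deceleration (about 2 mph per second), for illustration only;
# it is not a figure reported by investigators.
ASSUMED_DECEL_FPS2 = 3.0

if __name__ == "__main__":
    print(f"Speed: {mph_to_fps(SPEED_MPH):.1f} ft/s")
    print(f"Covered in {EMERGENCY_LEAD_S} s: "
          f"{distance_ft(SPEED_MPH, EMERGENCY_LEAD_S):.0f} ft")
    print(f"Covered in {WATCH_WARNING_S} s: "
          f"{distance_ft(SPEED_MPH, WATCH_WARNING_S):.0f} ft")
    print(f"Stopping distance at assumed deceleration: "
          f"{stopping_distance_ft(SPEED_MPH, ASSUMED_DECEL_FPS2):.0f} ft")
```

Under these assumptions, the train covers about 777 feet in the final 5 seconds, far less than the roughly 4,000 feet it would need to stop. The 15-second minimum watch warning corresponds to about 2,332 feet of track.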

Says Ashley Halsey III, reporting in The Washington Post, “Basic rules of railroading and federal regulations should have prevented the Amtrak derailment near Philadelphia on Sunday that killed two maintenance workers.” It appears that multiple procedural requirements were not followed, but more thorough investigation is required to determine why and what can be done in the future to improve safety by preventing derailments and worker fatalities.

To view the available information in a Cause Map, please click “Download PDF” above.

DC Metro shut down for an entire day for inspections after fire

By Kim Smiley 

A fire in a DC Metro tunnel early on March 14, 2016 caused delays on three subway lines and significant disruption to both the morning and evening commutes.  There were no injuries, but the similarities between this incident and the deadly smoke incident on January 12, 2015 (see our previous blog on this incident) led officials to order a 24-hour shutdown of the entire Metro system for inspections and repairs.

The investigation into the Metro fire is still ongoing, but the information that is known can be used to build an initial Cause Map.  A Cause Map is built by asking “why” questions and visually laying out all the causes that contributed to an incident.  Cause Mapping an issue can identify areas where it may be useful to dig into more detail to fully understand a problem and can help develop effective solutions.

So why was there a fire in the Metro tunnel?  Investigators have not released details about the exact cause, but have stated that the fire was caused by issues with a jumper cable.  Jumper cables are used in the Metro system to bridge gaps in the third rail, essentially functioning as extension cords.  The Metro system uses gaps in the third rail to create safer entry and exit spaces for both workers and passengers because of the potential danger of contact with the electrified third rail.  The third rail carries 750 volts of electricity used to power Metro trains and could cause serious injury or even death if accidentally touched.

The jumper cables also carry high voltage, and fires and/or smoke can occur if one malfunctions.  Investigators have not confirmed the exact issue that led to this fire, but insulation failures have been identified in other locations and are a possible cause of the fire. (Possible causes can be added to the Cause Map with a “?” to indicate that more evidence is needed.)

One of the things that is always important to consider when investigating an incident is the frequency of occurrence of similar issues.  The scope of the investigation and possible solutions considered will likely be different if it were the 20th time an incident has occurred rather than the first. In this case, the fire was similar to another incident in January 2015 that caused a passenger death.  Having a second incident occur so soon after the first naturally raised questions about whether there were more unidentified issues with jumper cables.  The Metro system uses approximately 600 jumper cables and all were inspected during the day-long shutdown. Twenty-six issues were identified and repaired. Three locations had damage severe enough that Metro would have immediately stopped running trains through them if the extent of the damage had been known.

The General Manager of the DC Metro system, Paul J. Wiedefeld, is relatively new to his position and has been both praised and criticized for the shutdown.  Trying to implement solutions and reduce risk is always a balancing act between costs and benefits.  Was the cost of a full-day shutdown and inspections of all jumper cables worth the benefit of knowing that the jumper cables have all been inspected and repaired?  At the end of the day, it’s a judgement call, but I personally would be more comfortable riding the Metro with my children now.

For the first time, an autonomous car is at fault for a crash

By Kim Smiley

On February 14, 2016, the self-driving Google car was involved in a fender bender with a bus in Mountain View, California.  Both vehicles were moving slowly at the time and the accident resulted in only minor damage and no injuries.  While this accident may not seem like a very big deal, the collision is making headlines because it is the first time one of Google’s self-driving cars has contributed to an accident.  Google’s self-driving cars have been involved in 17 other fender benders, but each of the previous accidents was attributed to the actions of a person, either the drivers of other vehicles or the Google test driver (while they were controlling the Google car).

The accident in question occurred after the Google car found itself in a tricky driving situation while attempting to merge.  The Google car had moved over to the right lane in anticipation of making a right turn.  Sandbags had been stacked around a storm drain, blocking part of the right lane.  The Google car stopped and waited for the lane next to it to clear so that it could drive around the obstacle.  As the Google car moved into the next lane it bumped a bus that was coming up from behind it.  Both the driver of the bus and the Google car assumed that the other vehicle would yield.  The test driver in the Google car did not take control of the vehicle and prevent the car from moving into the lane because he also assumed the bus would slow down and allow the car to merge into traffic. (Click on “Download PDF” to view a Cause Map that visually lays out the causes that contributed to this accident.)

Thankfully, this collision was a minor accident. No one was hurt and the vehicles involved sustained only minor damage. Lessons learned from this accident are already being incorporated to help prevent a similar incident in the future. Google has stated that the software that controls the self-driving cars has been tweaked so that the cars will recognize that buses and other large vehicles may be less likely to yield than other types of vehicles. (I wonder if there is a special taxi tweak in the code?)

It’s also worth noting that one of the driving factors behind the development of autonomous cars is the desire to improve traffic safety and reduce the 1.2 million traffic deaths that occur worldwide every year.  The Google car may have contributed to this accident, but Google cars have so far generally proved to be very safe.  Since 2009, Google cars have driven more than 2 million miles and have been involved in fewer than 20 accidents.

One of the more interesting facets of this accident is that it raises hard questions about liability.  Who is responsible when a self-driving car causes a crash? The National Highway Traffic Safety Administration (NHTSA) recently determined that for regulatory purposes, autonomous vehicle software is a “driver,” which may mean that auto manufacturers will assume greater legal responsibility for crashes.  NHTSA is working to develop guidance for self-driving vehicles, which it plans to release by July, but nobody really knows yet what impact self-driving cars will have on liability laws and insurance policies.  In addition to the technology issues, there are many legal and policy questions that will need to be answered before self-driving cars can become mainstream technology.

Personally, I am just hoping this technology is commercially available before I reach the age where my kids take away my car keys.

Heavy metals detected in moss in Portland

By Kim Smiley

Residents and officials are struggling to find a path forward after toxic heavy metals were unexpectedly found in samples of moss in Portland, Oregon. According to the U.S. Forest Service, the moss was sampled as part of an exploratory study to measure air pollution in Portland.  The objective of the study was to determine if moss could be used as a “bio-indicator” of hydrocarbons and heavy metals in air in an urban environment.  Researchers were caught off guard when the samples showed hot spots of relatively high heavy metal levels, including chromium, arsenic, and cadmium (which can cause cancer and kidney malfunction).  Portland officials and residents are working to determine the full extent of the problem and how it should be addressed.

So where did the heavy metals come from?  And how is it that officials weren’t already aware of the potential issue of heavy metals in the environment? The investigation into this issue is still ongoing, but an initial Cause Map can be built to document what is known at this time.  A Cause Map is built by asking “why” questions and visually laying out all the causes that contributed to the problem.  (Click on “Download the PDF” to view the initial Cause Map.)

Officials are still working to verify where the heavy metals are coming from, but early speculation is that nearby stained-glass manufacturers are the likely source.  Heavy metals are used during the glass manufacturing process to create colors. For example, cadmium is used to make red, yellow and orange glass and chromium is used to make green and blue glass. The hot spots where heavy metals were detected surround two stained-glass manufacturers, but there are other industrial facilities nearby that may have played a role as well.  There are still a lot of unknowns about the actual emissions from the glass factories because no testing has been done up to this point.  Testing was not required by federal regulations because of the relatively small size of the factories.  If the heavy metals did in fact originate from the glass factories, many hard questions about the adequacy of current emissions regulations and testing requirements will need to be answered.

Part of the difficulty of this issue is understanding exactly what the impacts from the potential exposure to heavy metals might be.  Since the levels of heavy metals detected so far are considered below the threshold for acute (short-term) health effects, investigators are still working to determine what the potential long-term health impacts might be.

A long-term benefit of this mess is the validation that moss can be used as an indicator of urban air quality.  Moss has been used as a “bio-indicator” for air quality since the 1960s in rural environments, but this is the first attempt to sample moss to learn about air quality in an urban setting.  As moss is plentiful and testing it is relatively inexpensive, this technique may dramatically improve testing methods used in urban environments.

Both glass companies have voluntarily suspended working with chromium, cadmium and arsenic in response to a request by the Oregon Department of Environmental Quality.  The DEQ has also begun additional air monitoring and begun sampling soil in the impacted areas to determine the scope of the contamination. As officials gain a better understanding of what is causing the issue and what the long-term impacts are, they will be able to develop solutions to reduce the risk of similar problems occurring in the future.

Crane Collapse In High Winds Kills One in NYC

By ThinkReliability Staff

A crane collapsed in New York City on February 5, 2016 killing one, injuring three, and damaging two city blocks. While an investigation is underway and the causes of the crane collapse have not yet been determined, the city has already implemented new rules to make crane operations safer. We can examine the potential cause-and-effect relationships that led to the issue in a Cause Map, or visual root cause analysis.

We begin by capturing the what, when and where of the incident within a problem outline. The crane collapse occurred February 5 at about 8:30 a.m. Anything that is different or unusual at the time of an incident should also be noted on the outline. An important difference on February 5 was the accelerating winds. The crane that collapsed was a crawler crane, and at the time of the collapse, workers were in the process of securing the crane because of the high winds. This was the expected response. Says New York City Mayor Bill de Blasio, “The workers on Friday morning did not begin work on the site, but immediately seeing the winds, made the move to secure the crane, so their timing was appropriate. Upon arrival, they immediately determined the need to secure the crane.”

The impacts to the goals as a result of the incident are also captured in the problem outline. In this case, the safety goal was impacted due to the death, as well as injuries. The environmental goal was impacted by water leaks resulting from damage. Customer service (looking at the citizens of New York City as customers) is impacted due to closures. Production is impacted because 418 additional cranes were secured as a result of the incident. Property impacts include damage to the crane, as well as to two city blocks. The labor goal was impacted because of the time required for the response and removal of the damaged crane. It’s also important to capture the frequency of similar events. OSHA reports it has investigated 13 fatal crane accidents in the last 5 years. (There was a crane collapse in New York City in 2008 that resulted in 4 deaths. Click here to see our previous blog on this topic.)

Once the impacts to the goals have been captured, the analysis begins with one of these impacted goals, which is an effect. Asking “why” questions allows the development of cause-and-effect relationships. In this case, the fatality and injuries resulted from the collapse of a crane. They also resulted from people being in the area of the crane collapse. Both of these causes are required (the fatality and injuries would not have occurred if the crane had not collapsed, or if people had not been in the area) so they are listed vertically and joined with “AND”.

People were in the area where the crane collapsed because the area was inadequately secured. This is likely because construction workers were responsible for securing the area, as well as securing the crane. The reasons for the crane collapse are unknown. However, the investigation will look at human error, structural and equipment problems, and impacts from high winds. While the cause has not been determined, it is considered likely that the wind played a role. The crane was not yet secured, as the workers were in the process of attempting to secure it. It had not been required to be secured earlier because city regulations limit crane operations only when wind is above 30 miles per hour (mph) or when there are gusts greater than 40 mph. The crane operators were working under a stricter limit of 25 mph, as manufacturers sometimes set stricter limits. The forecast did not indicate that winds would exceed 25 mph that day.

As a result of the incident, Mayor de Blasio put into place immediate and temporary rules regarding crane operation. These rules will be in place until a task force provides updated recommendations within 90 days. Uniformed personnel will assist with enforcing closures associated with crane use. Crane operations are limited to wind speeds less than 30 mph (or gusts up to 40 mph). A city sweep and increased fines were also put into place to ensure the updated regulations are followed.

To view a one-page overview of the Outline, Cause Map and interim solutions, click on “Download PDF”.

Investigators Blame “Human Error” for Train Collision

By Kim Smiley

On February 9, 2016, two commuter trains collided head-on in Upper Bavaria, Germany.  Eleven people were killed and dozens were injured.  Investigators are still working to determine exactly what caused the accident and the train dispatcher is currently under investigation for involuntary manslaughter and could face up to five years in prison if convicted.

Although the investigation is still ongoing, some information has been released about what caused the crash.  The two trains collided head-on because they were traveling toward each other on the same track.  Running trains in both directions on a single track is common practice in rural regions of Germany, and these two trains were scheduled to pass each other at a station with a divided track. The drivers of both trains were unaware of the other train.  The accident occurred on a bend in a wooded area, so the drivers could not see the other train until it was too late to prevent the collision.

The dispatcher failed to prevent a situation where two trains were running towards each other on the same track or to inform the drivers about the potential for a collision.  Investigators have stated that the dispatcher sent an incorrect signal to one of the trains due to “human error”.  After realizing the mistake – and that a collision was imminent – the dispatcher issued emergency signals to the trains, but they were too late to prevent the accident.

All rail routes in Germany have automatic braking systems that are intended to stop a train before a collision can occur, but initial reports are that the safety system had been manually turned off by the dispatcher.  German media have reported that the system was overridden to allow the eastbound train to pass because it was running late, but this information has not been confirmed.  Black boxes from both trains have been collected and analyzed.  Technical failure of the trains and signaling equipment has been ruled out as a potential cause of the accident.

The information that has been released to the media can be used to build an initial Cause Map, a visual root cause analysis, of this issue.  A Cause Map visually lays out the cause-and-effect relationships and aids in understanding the many causes that contributed to an issue. The Cause Map is built by asking “why” questions. A detailed Cause Map can aid in the development of more effective solutions.

One of the general Cause Mapping rules of thumb is that an investigation should not stop at “human error”.  Human error is too general and vague to be helpful in developing effective solutions. It is important to ask “why” the error was made and really work to understand what factors led to the mistake.  Should the safety system be able to be manually overridden?  Is the training for dispatchers adequate?  Does there need to be a second check on decisions by dispatchers?  Should two trains traveling in opposite directions be sharing tracks?  I don’t know the answers, but these questions should be asked during the investigation.  Charging the dispatcher with involuntary manslaughter may prevent HIM from making the same mistake again, but it won’t necessarily reduce the risk of a similar accident occurring again in the future.  To really reduce risk, investigators need to dig into the details of why the error was made.

A Lesson in Miscommunication: Valentine’s Day Blues

By Renata Martinez with contributions from the staff of ThinkReliability

I’d better preface this blog with a few comments….

It’s not your average blog.  As a facilitator, I deal with a lot of serious problems on a daily basis.  Believe it or not, I get these incidents stuck in my head and spend a lot of time thinking about how I can better explain some lessons I’ve learned as a facilitator.  The goal of this blog is to offer a little perspective on an incident where “miscommunication” is identified, and I wanted to use something you could probably relate to. Have you ever been in an argument with a significant other?  Maybe you didn’t see eye-to-eye on something (a Netflix option perhaps), or someone did something unexpected, or someone said something they didn’t mean (“Feel free to go golfing today; you don’t need to start on that to-do list”).

I also want to preface this blog by stating I am not a relationship counselor and I do not have a perfect relationship because of Cause Mapping.  However, I will say that Cause Mapping has helped me gain an understanding of a whole new perspective – his.

Without further ado, let me set the stage.  I have to take you back a bit.  Let me take you back to my sophomore year in college. *enters dream state*

Valentine’s Day:  I hate it.  I’ve always thought it was a commercialized endorsement to express love.  The seemingly endless aisles in store after store of red and white hearts, chocolates, cards, teddy bears – gross.  …and then I met my future husband.  I was so head over heels for this guy, you would have thought I was 12 (but I was 20).  So when our first Valentine’s Day together came around, I was actually excited.  The thought crept into my mind that I could be wowed this time; this could be it, I could learn to love Valentine’s Day.  I had the opportunity to relive every Nicholas Sparks novel ever written.  Expectations were set.

Leading up to the 14th, there was a conversation that would ensure I would always despise the day…. I was asked what I wanted.  My mind quickly played one romantic scene after another but that’s not what came out of my mouth.  Instead I replied, “nothing.”  Well, being the literal person he is, he took this and ran with it – he got me nothing.  I was so disappointed because when I said “nothing”, OF COURSE I DIDN’T MEAN IT.  “Nothing” was a clear translation for: you figure it out, you surprise me with some immaculate plan. I didn’t want to spell out what I wanted; I wanted to be the cool, low maintenance, laid back girlfriend. I don’t think he was too impressed with my “cool, laid back attitude” when I came to the realization that I didn’t get anything for Valentine’s Day – the first time I actually wanted something.

So that’s one branch of the Cause Map: why did I not ask for anything on Valentine’s Day?

At this same point of the Cause Map, it splits with an AND statement.  He also had to assume that I meant “nothing” when he asked.  In my mind it’s so obvious…it’s like when I haven’t talked to or looked at you all day, and when you ask “What’s wrong?” I say “Nothing.”  I don’t mean it; it’s just an impulse reaction (and admittedly makes understanding me very difficult).  But since this was his first experience with me and this kind of situation, he didn’t think more about it.  He didn’t realize that I may actually want something.

I know this is a basic example of understanding both perspectives, but it comes up a lot on investigations.  Understanding how people both give and interpret instructions/directions is very important when it comes to developing solutions.  For instance, I will never say that I want “nothing” for a holiday ever again.  My new minimum “requirement” is a card. I really like cards.  And since I’ve got your attention, I’ll give you a little hint about present-giving: the presents should always be wrapped…in gift wrap (the bag from the store does not count).

Looking at solutions for him: he no longer takes the answer “nothing” literally.  Based on this experience, he now understands that I may not mean it.  So, the solutions identified will help him, but if we were looking at a different employee (or boyfriend in this example) – how do we ensure it doesn’t happen to them? This is where we need to consider others who may learn from this (not just those directly affected in this incident).  And this is why sharing lessons learned is so important.

By identifying both perspectives on the Cause Map, we can learn a lot about why an incident occurred (and what had to happen).  This yields more effective solutions that will prevent recurrence.  …after all: happy wife, happy life . . . right?!

To view both perspectives on a Cause Map, click on “Download PDF” above.

 

Failure of the Nipigon River Bridge

By Kim Smiley

On the afternoon of January 10, 2016, the deck of the Nipigon River Bridge in Ontario unexpectedly shifted up about 2 feet, closing the bridge to all vehicle traffic for about a day.  After an inspection by government officials and the addition of 100 large concrete blocks to lower the bridge deck, one lane was reopened to traffic, with the exception of oversized trucks. Heavier trucks are required to detour around the bridge, with the main alternative route requiring crossing into the United States.  This failure is still being investigated and it isn’t known yet when it will be safe to open all lanes on the bridge.

More information is needed to understand all the details that led to this failure, but an initial Cause Map, a visual root cause analysis, can be built to illustrate what is currently known. The first step in the Cause Mapping process is to fill in the Outline to document the basic background information (the what, when and where) and the impacts to the organization’s goals resulting from the issue.  For this example, the bridge was damaged and significant resources will be needed to investigate the failure and repair the bridge.  The closure of the bridge, and subsequently having only a single open lane, is also having a sizable impact on transportation of both people and goods in the area.  It is estimated that about $100 million worth of goods are moved over the bridge daily and there are limited alternative routes.

Once the Outline is completed, the Cause Map is built by asking “why” questions and visually laying out the cause-and-effect relationships.  Why did the deck of the bridge shift up?  Investigators still don’t have the whole answer. The Nipigon River Bridge is a cable stayed bridge and bolts holding the bridge cables failed, resulting in the deck of the bridge being pulled up at an expansion joint.  Two independent testing facilities, National Research Council of Canada in Ottawa and Surface Science Western at Western University, are conducting tests to determine the cause of the bolt failures, but no information has been released at this time.

The Nipigon River Bridge is a new bridge that has only been open since November 29, 2015. Some hard questions about the adequacy of the bridge design have been asked because the failure occurred so soon after construction.  Officials have stated that the bridge design meets all applicable standards, but investigators will review the design and structure during the investigation to ensure it is safe.  Ontario winters can be harsh and investigators are going to look into whether cold temperatures and/or wind played a role in the failure.  Eyewitnesses have reported a large gust of wind just prior to the bolt failure.  Investigators will determine what role the wind played.

The Cause Map can easily be expanded to incorporate new information as it becomes available. Once the Cause Map is completed, the final step in the Cause Mapping process is to develop solutions to prevent a similar problem from recurring.  In this example, adding the concrete blocks as counterweights allowed one lane of the bridge to be opened in the short term, but clearly a longer-term solution will be needed to repair the bridge and ensure a similar failure does not occur again.

Landslide of construction debris buries town, kills dozens

By ThinkReliability Staff

Shenzhen, China has been growing fast. After a dump site closed in 2013, construction debris from the rapid expansion was being dumped everywhere. In an effort to contain the waste, a former rock quarry was converted to a dump site. Waste at the site reached 100 meters high, despite environmental assessments warning about the potential for erosion. On December 20, 2015, the worries of residents, construction workers and truckers came true when the debris slipped from the quarry, covering 380,000 square meters (or about 60 football fields) with thick soil as much as 4 stories high.

A Cause Map can be built to analyze this issue. One of the steps in the Cause Mapping process is to determine how the issue impacted the overall goals. In this case, the landslide severely impacted multiple goals. Primarily, the safety goal was impacted due to a significant number of deaths: 58 people have been confirmed dead, and at least 25 are missing. The environmental goal and customer service goal were impacted due to the significant area covered by construction waste. The regulatory goal is impacted because 11 people have been detained as part of an ongoing criminal investigation. The property goal is impacted by the 33 buildings that were destroyed. The labor goal is also impacted, as more than 10,600 people are participating in the rescue effort.

The Cause Map is built by visually laying out the cause-and-effect relationships that contributed to the landslide. Beginning with the impacted goals and asking “Why” questions develops the cause-and-effect relationships. The deaths and missing persons resulted from being buried in construction waste. Additionally, the confusion over the number of missing results from the many unregistered migrants in the rapidly growing area. The area was buried in construction waste when waste spread over a significant area, due to the landslide.

The landslide resulted from soil and debris that was piled 100 meters high, and unstable ground in a quarry. The quarry was repurposed as a waste dump in order to corral waste, which had previously been dumped anywhere after the closure of another dump. Waste and debris were piled so high because of the significant construction debris in the area. There was heavy construction in the area because of the rapid growth, resulting in a lot of debris. Incentives (dumpsite operators make money on each load dumped) encourage a high amount of waste dumping. Illegal dumping also adds to the total.

While an environmental impact report warned of potential erosion, and the workers and truck drivers at the dump registered concerns about the volume of waste, these warnings weren’t heeded. Experts point to multiple recent industrial accidents in China (such as the warehouse fire/explosion in Tianjin in August, the subject of a previous blog) as evidence of the generally lax enforcement of regulations. Heavy rains contributed to ground instability, as did the height of the debris and the site’s previous use as a quarry.

Actions taken in other cities in similar circumstances include charging more for dumping debris in an effort to encourage the reuse of materials and monitoring dump trucks with GPS to minimize illegal dumping. These actions weren’t implemented in Shenzhen prior to the landslide, but this accident may prompt their implementation in the future. Before any of that can happen, Shenzhen has a long way to go cleaning up the construction debris covering the city.

Facebook Bug Makes Users Feel Old

By ThinkReliability Staff

In a real blow for an industry constantly trying to remain hip and relevant, many Facebook users were notified of “46 year anniversaries” of their relationships with friends on Facebook on the last day of 2015. Facebook (which is itself only 11 years old) issued a statement saying “We’ve identified this bug and the team’s fixing it now so everyone can ring in 2016 feeling young again.”

While Facebook didn’t release any details about what caused the bug, a pretty convincing explanation was posted by Microsoft engineer Mark Davis. We can use his theory to create an initial Cause Map, or visual root cause analysis. The first step in the Cause Mapping process is to fill out a problem outline. The problem outline captures the what (Facebook glitch), when (December 31, 2015), where (Facebook) and the impact to the organization’s goals. In this case, the only goals that appear to be impacted are the customer service goal (resulting from the negative publicity to Facebook) and the labor/time goal (which resulted from the time required to fix the glitch).

The next step in the Cause Mapping process is the analysis. The Cause Map begins with an impacted goal. Asking “Why” questions develops the cause-and-effect relationship that resulted in the effect. In this case, the impact to the customer service goal results from the negative publicity. Continuing to ask “Why” questions will add more detail to the Cause Map. The negative publicity was caused by Facebook posting incorrect anniversaries.

Some effects will result from more than one cause. Facebook posting incorrect anniversaries can be considered an effect that was caused by incorrect anniversary dates being identified by Facebook AND Facebook posting anniversary dates. Because both of these causes were required to produce the effect, they are joined with an “AND” on the Cause Map. (If the anniversary dates had been identified correctly, or if they weren’t posted on Facebook, the issue would not have occurred.) The incorrect anniversary dates were due to a software glitch (or bug), according to Facebook. Inadequate testing can generally be considered a cause whenever any bug is found in software that is used or released to the public. Had a larger range of dates been used to test this feature, the software glitch would have been identified before it resulted in public postings on Facebook.

Other impacted goals are added to the Cause Map as effects of the appropriate causes. In this case, the labor/time goal is impacted because of the time needed to fix the glitch. The cause of this is the software glitch. All impacted goals should be added to the Cause Map.

The cause of the software bug is not definitively known. To indicate potential causes, we include a “?” after the cause, and include as much evidence as possible to support the cause. Testimony can be used as evidence for causes. In this case, the source of the potential causes is a Microsoft engineer, who described a potential scenario that could lead to this issue on Facebook. Unix and Unix-based systems store time as the number of seconds elapsed since midnight UTC on 1/1/1970 (known as the Unix epoch), so a stored value of “0” corresponds to that date. If the date a user friended another user was entered as “0” and the system identified friending dates for all friends, the system would identify friending dates as 1/1/1970. In time zones west of UTC, that instant falls on the evening of 12/31/1969, so the system would see 46 years of friendship on December 31, 2015. It is presumed that the friend date would be entered as “0” if a friendship already existed prior to Facebook tracking anniversaries.
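The arithmetic behind the theory can be checked with a short sketch. It assumes, as the engineer’s theory does, that a pre-existing friendship was stored with a timestamp of 0; the UTC-8 offset (US Pacific standard time) and the `anniversary_years` helper are hypothetical choices made only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption (the engineer's theory, not confirmed by Facebook): friendships
# that predate anniversary tracking were stored with a timestamp of 0.
FRIENDED_AT = 0

# Timestamp 0 is the Unix epoch: midnight UTC on January 1, 1970.
epoch_utc = datetime.fromtimestamp(FRIENDED_AT, tz=timezone.utc)

# Converted to a time zone west of UTC (UTC-8 assumed here), the same
# instant falls on the evening of December 31, 1969.
pacific = timezone(timedelta(hours=-8))
friended_local = epoch_utc.astimezone(pacific)

def anniversary_years(friended: datetime, today: datetime) -> Optional[int]:
    """Hypothetical helper: whole years of friendship if today is the anniversary, else None."""
    if (friended.month, friended.day) == (today.month, today.day):
        return today.year - friended.year
    return None

today = datetime(2015, 12, 31, tzinfo=pacific)
print(epoch_utc.date())                          # 1970-01-01
print(friended_local.date())                     # 1969-12-31
print(anniversary_years(friended_local, today))  # 46
print(anniversary_years(epoch_utc, today))       # None (no Dec 31 anniversary in UTC)
```

A test that fed the feature a zero or missing friending date, and checked the result in more than one time zone, is the kind of wider date range the analysis suggests would have exposed the bug before it reached users.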

Errors associated with the Unix epoch are pretty common, but this appears to be the first time a bug like this has bitten Facebook. Presumably the error was quickly fixed, but we won’t know for sure until next December.