Ceiling Collapse in London’s 112-year-old Apollo Theatre Injures Dozens

By Kim Smiley

On December 19, 2013, 76 people were injured when a large section of plaster fell from the ceiling of London’s historic Apollo Theatre. Luckily, there were no fatalities, but six people were seriously injured in the accident.

The investigation is still underway, but an initial Cause Map can be built to begin analyzing the incident. The first step in the Cause Mapping process is to fill in an Outline with the basic background information and to formally list how the incident impacts the goals, so that no part of a multifaceted problem is neglected. It’s important to understand how an issue impacts all goals, such as safety, financial considerations, and schedule delays. Different solutions sometimes mitigate the risks to different goals, so listing every impacted goal keeps the analysis clear. Listing the impacted goals also helps focus the investigation on the most important elements.

Another very important part of the Outline is a space where any relevant differences are listed.  Anything that was different at the time an incident occurred is usually a good place to start digging during an investigation.  For this example, there was heavy rain during the hour preceding the ceiling collapse.  It’s also worth noting that the Apollo Theatre is 112 years old.
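As a rough illustration of how the Outline groups background facts, impacted goals and differences, the sketch below stores the Apollo Theatre details in a small Python record. The `Outline` class and its field names (`what`, `when`, `where`, `differences`, `impacted_goals`) are hypothetical choices for this example, not the layout of any official Cause Mapping template; only the values come from this article.

```python
from dataclasses import dataclass, field


@dataclass
class Outline:
    """Hypothetical record of a Cause Mapping Outline (illustrative field names)."""
    what: str
    when: str
    where: str
    differences: list[str] = field(default_factory=list)    # anything unusual at the time
    impacted_goals: dict[str, str] = field(default_factory=dict)  # goal -> how it was impacted


apollo = Outline(
    what="Section of plaster ceiling fell",
    when="December 19, 2013",
    where="Apollo Theatre, London",
    differences=[
        "Heavy rain during the hour before the collapse",
        "Building is 112 years old",
    ],
    impacted_goals={"safety": "76 people injured, 6 seriously"},
)

# Each impacted goal becomes a starting point for the Cause Map
for goal, impact in apollo.impacted_goals.items():
    print(f"{goal}: {impact}")
```

Keeping the differences in their own list mirrors the advice above: they are the first places to dig once the map is started.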

Investigators have not announced what led to the ceiling collapse, but early speculation is that rainwater leaked through the roof and settled onto the plaster. The theory is that the additional weight of the water was more than the ceiling could handle and it fell, taking a lighting rig and part of a balcony with it. If this was the case, hard questions will need to be asked about the adequacy of current building codes and inspection requirements. The roof of the Apollo Theatre is currently required to be inspected only every three years. The Theatre appears to have been up to date on all required inspections and to have passed them, so the required inspection interval may need to be re-evaluated in light of this failure.

Suspected causes that haven’t been proven yet can be included on the Cause Map, but they are marked with a “?” to indicate that they need additional evidence. This documents what has been considered during an investigation and which questions still need to be answered.

To view an Outline and the initial Cause Map of the Apollo Theatre ceiling collapse, click on “Download PDF” above.


Department of Energy Cyber Breach Affects Thousands, Costs Millions

By ThinkReliability Staff

Hackers accessed personally identifiable information (PII), including social security numbers (SSNs) and banking information, for more than 104,000 current and former employees of the Department of Energy (DOE). The data was taken from the Department’s Employee Data Repository database (DOEInfo) through the Department’s Management Information System (MIS). In a recently released special report, the DOE’s Inspector General analyzes the causes of the breach and provides recommendations for preventing or mitigating future breaches.

The report notes that, “While we did not identify a single point of failure that led to the MIS/DOEInfo breach, the combination of the technical and managerial problems we observed set the stage for individuals with malicious intent to access the system with what appeared to be relative ease.” Because of the complex interactions among the systems, the personnel and the security precautions (or lack thereof) that allowed hackers to access the system, a diagram showing the cause-and-effect relationships can be helpful. Here those relationships, and the impacts they had on the DOE and its personnel, are captured in a Cause Map, a form of visual root cause analysis.

In this case, the report raised concerns that other systems were at risk of compromise, and that a breach of those systems could impact public health and safety. The loss of PII for more than 104,000 current and former employees can be considered an impact to the customer service goal. The event, combined with two other cyber breaches since May 2011, has resulted in a loss of confidence in cyber security at the Department, an impact to the mission goal. Affected employees were given 4 hours of authorized leave to deal with potential impacts from the breach, impacting both the production and labor goals. (Labor costs for recovery and lost productivity are estimated at $2.1 million.) The Department has also paid for credit monitoring and established a call center for the affected individuals at an additional cost of $1.6 million, bringing the total cost of this event to $3.7 million. With an average of one cyber breach a year for the past three years, the Department could be facing multi-million dollar annual costs related to cyber breaches.
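The cost figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the two cost components are the estimates reported in this article, while the `estimated_annual_cost` helper is a hypothetical illustration that simply assumes every future breach costs about as much as this one, which the report does not claim.

```python
# Costs reported for the MIS/DOEInfo breach, in US dollars
recovery_labor_and_lost_productivity = 2_100_000
credit_monitoring_and_call_center = 1_600_000

total_cost = recovery_labor_and_lost_productivity + credit_monitoring_and_call_center
print(f"Cost of this breach: ${total_cost:,}")  # Cost of this breach: $3,700,000


def estimated_annual_cost(cost_per_breach: int, breaches_per_year: int) -> int:
    """Hypothetical rough projection; assumes each breach costs about the same."""
    return cost_per_breach * breaches_per_year


# Roughly one breach a year over the past three years
print(f"Estimated annual cost: ${estimated_annual_cost(total_cost, 1):,}")
```

Even at this crude level, a single breach a year keeps the Department in multi-million dollar territory.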

These impacts to the goals resulted from hackers gaining access to unencrypted PII. The system the hackers accessed was unencrypted and contained significant amounts of PII, because the database was the central repository for current and former employees. The PII within the database included SSNs, which were used as identifiers contrary to Federal guidance. For unknown reasons, there appears to have been no effort to stop using SSNs as identifiers, despite a five-year-old requirement to do so. The system appears to have been left unencrypted because of performance concerns, though these were not well documented or understood.

Hackers were able to “access the system with what appeared to be relative ease” because the system had inadequate security controls (only a user name and password were required for access) and could be accessed directly from the internet, presumably in order to accomplish necessary tasks. The report directly related the ability to access the system to “continued operation with known vulnerabilities.” This concept may be familiar to many at a time when most organizations are trying to do more with less. Several factors kept the vulnerabilities from being addressed: a perceived lack of authority to restrict operation of the system, unclear responsibility for applying patches, and vulnerabilities that remained unknown because of limited development, testing, troubleshooting and ongoing scanning of the system. Cost was also raised as a potential reason for the delay in addressing the vulnerabilities that contributed to the breach.

According to the report, “The Department should have considered costs associated with mitigating a system breach … We noted the Department procured the updated version in March 2013 for approximately $4,200. That amount coupled with labor costs associated with testing and installing the upgrade were significantly less than the cost to mitigate the affected system, notify affected individuals of the compromise of PII and rebuild the Department’s reputation.”

The updated version referred to was purchased in March 2013, though the system had not been updated since early 2011 and core support for the application on which the system was built ended in July 2012. Additionally, “the vulnerability exploited by the attacker was specifically identified by the vendor in January 2013.” The update, though purchased in March, was not installed until after the breach occurred. Officials stated that the decision to upgrade the system was not made until December 2012 because the system had not reached the end of its useful life. The Inspector General’s note about considering the costs of mitigating a system breach is telling: it compares the several-thousand-dollar cost of an on-time upgrade to the several-million-dollar cost of mitigating a breach. Many companies find themselves in the same situation as the DOE, cutting costs on prevention and then paying far higher costs to deal with the problems that inevitably arise.
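To put the Inspector General’s comparison in numbers, the sketch below divides the breach cost reported earlier in this article by the approximate purchase price of the upgrade. It is a back-of-the-envelope illustration only: the labor to test and install the upgrade is left out because the report did not quantify it, so the real ratio would be somewhat lower.

```python
upgrade_price = 4_200                # approximate purchase price of the update, March 2013
breach_cost = 2_100_000 + 1_600_000  # recovery/lost productivity + credit monitoring/call center

# Excludes testing and installation labor for the upgrade (not quantified in the report),
# so this overstates the ratio somewhat.
ratio = breach_cost / upgrade_price
print(f"Breach cost is roughly {ratio:.0f}x the upgrade's purchase price")
```

Even if the installation labor were several times the purchase price, the breach would still have cost hundreds of times more than the upgrade.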

To view the Outline, Cause Map and recommended solutions based on the DOE Inspector General’s report, please click “Download PDF” above.

Poisoned Dead Mice Parachuted onto Guam to Kill Snakes

By Kim Smiley

On December 1, 2013, 2,000 dead, poisoned neonatal mice were parachuted onto Guam on a unique mission to fight an invasive species, the brown tree snake. The parachutes are designed to catch in the trees, where the snakes live, and tempt the snakes into eating the mice. The mice are dosed with acetaminophen, a chemical to which the snakes are particularly sensitive because it affects their blood’s ability to carry oxygen.

There are an estimated 2 million brown tree snakes on Guam, so the 2,000 poisoned mice will affect only a very small percentage of the population, but scientists hope that what they learn from this drop will help them plan larger mouse drops in the future. This drop, the fourth and largest so far, cost $8 million. For this drop, some of the mice were fitted with data-transmitting radios, which will allow scientists to better gauge the effectiveness of the technique.
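The phrase “a very small percentage” can be made concrete with the two numbers given above. This sketch computes an upper bound under a deliberately generous, hypothetical assumption: that every mouse is eaten by a different snake and every dose is lethal.

```python
estimated_snakes = 2_000_000
mice_dropped = 2_000

# Upper bound: assumes each mouse is eaten by a different snake and every dose is lethal
max_fraction_reached = mice_dropped / estimated_snakes
print(f"At most {max_fraction_reached:.1%} of the population")  # At most 0.1% of the population
```

This is why the drop is better understood as an experiment to improve future drops than as a population-control measure on its own.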

While the $8 million price tag sounds high, it’s important to realize that brown tree snakes do significant damage every year. Since their accidental introduction to the island, brown tree snakes have destroyed the native ecosystem, decimating the native bird population. Brown tree snakes are also excellent climbers and routinely get into electrical equipment. They cause an average of 80 power outages a year, resulting in annual repair and lost-productivity costs as high as $4 million. (See our previous blog for more information.)

Even though the problem of the brown tree snakes is fairly well understood, an effective solution has been difficult to find. Many approaches have been tried over the years, including snake traps, snake-sniffing dogs and snake-hunting inspectors, but the snakes have completely overrun the island. As farfetched as it sounds, parachuting dead mice seems to be the most promising solution at present. It works because the snakes are very sensitive to acetaminophen; they only need to ingest about one-sixth of a standard pill for it to be lethal. Because the dose in each mouse is so small, non-target animals are unlikely to be heavily affected by the mouse drops: a pig or dog would need to eat around 500 of the baited mice for the dose to be lethal. One concern is that snakes tend to avoid prey that is already dead, but information from the radio transmitters used in the recent drop should show whether the mice are an effective bait.

One thing I know for sure: I would have loved to be in the brainstorming meeting the first time someone suggested parachuting dead mice. This example is a good reminder to all of us to keep an open mind. Every now and then, the most bizarre solution suggested turns out to be the best.

Metro Train Derails in the Bronx, Killing 4 and Injuring More Than 60

By Kim Smiley

Four passengers were killed and dozens more were sent to the hospital after a Metro-North commuter train derailed in the Bronx early Sunday, December 1, 2013. At the time of the accident, the train was carrying about 150 passengers and was traveling to Grand Central Terminal in New York City. All seven cars of the train derailed, and the aftermath was horrific. Metro-North has been operating for more than 30 years, and this was the first accident in its history that resulted in passenger deaths.

A Cause Map, or visual root cause analysis, can be built to help analyze this accident. There is still a lot of investigative work to be done to understand what caused the derailment, but the information that is available can be used to create an initial Cause Map, which can easily be expanded as more information becomes available. The first step when building a Cause Map is to fill in an Outline with the basic background information. The impacts to the goals are documented at the bottom of the Outline and are then used to begin building the Cause Map.

In this example, the safety goal is clearly impacted because there were four fatalities and more than 60 people injured. The schedule goal is also significantly impacted because this portion of the rail line will be closed during most of the investigation. The National Transportation Safety Board has estimated that the investigation will take 7 to 10 days. The track closure is especially disruptive because this line is a major artery into New York City, with a ridership of 15.9 million in 2012. Once the impacted goals are documented, the Cause Map itself is built by asking “why” questions.

So why did the train derail? The details aren’t known yet, but there is still information that should be documented on the Cause Map. A question mark is placed after a cause that may have contributed to an issue but requires more evidence or investigation. Documenting these open questions during an investigation helps ensure that all the pertinent questions are asked and nothing is overlooked. (If a cause is later determined not to have played a role, it can be crossed out on the Cause Map to show that it was considered but ruled out.) Two factors that likely played a role in the derailment are the speed of the train and the design of the track where the accident occurred. There is a sharp curve in the track where the derailment happened, and trains are required to reduce their speed before entering it. The latest reports from the investigation are that the train was traveling 82 mph in a 30 mph zone. The train operator has stated that the brakes malfunctioned and didn’t respond when he tried to reduce speed, and that the train was traveling too fast through the curve.
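The “why” questions and the question-mark convention described above can be pictured as a small tree of causes, each carrying an evidence status. The sketch below is a hypothetical Python representation, not ThinkReliability’s software or file format: the `Cause` and `Status` classes, the `why` method and the `render` function are invented for illustration, and which causes are marked as suspected is an illustrative reading of the facts reported so far (the operator’s brake claim is unproven, so it gets a “?”).

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Evidence status of a cause (illustrative categories)."""
    CONFIRMED = ""
    SUSPECTED = "?"             # shown with a question mark: needs more evidence
    RULED_OUT = " (ruled out)"  # would be crossed out on a drawn Cause Map


@dataclass
class Cause:
    text: str
    status: Status = Status.CONFIRMED
    causes: list["Cause"] = field(default_factory=list)

    def why(self, cause: "Cause") -> "Cause":
        """Record `cause` as an answer to 'why did this happen?' and return it."""
        self.causes.append(cause)
        return cause


def render(cause: Cause, depth: int = 0) -> list[str]:
    """Flatten the map into indented lines: each effect, then its causes beneath it."""
    lines = ["  " * depth + cause.text + cause.status.value]
    for child in cause.causes:
        lines.extend(render(child, depth + 1))
    return lines


safety = Cause("Safety goal impacted: 4 fatalities, 60+ injured")
derailed = safety.why(Cause("Train derailed"))
speed = derailed.why(Cause("Train traveling 82 mph in a 30 mph zone"))
derailed.why(Cause("Sharp curve in the track"))
speed.why(Cause("Brakes malfunctioned", Status.SUSPECTED))

print("\n".join(render(safety)))
```

If the data recorder later shows the brakes worked, the last cause could be switched to `Status.RULED_OUT` rather than deleted, preserving the record that it was considered.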

Investigators have recovered the train’s data recorder, which will provide more information, including whether there was a problem with the brakes. Investigators will also interview all the relevant personnel to determine what caused this deadly crash. Once the investigation is completed, any necessary solutions can be implemented to reduce the risk of a similar accident in the future.

To view a completed Outline and initial Cause Map of this incident, click on “Download PDF” above.