Tag Archives: problem description

Volkswagen admits to use of a ‘defeat device’

By Kim Smiley

The automotive industry was recently rocked by Volkswagen’s acknowledgement that the company knowingly cheated on emissions testing of several models of 4-cylinder diesel cars starting in 2009.  The diesel cars in question include software “defeat devices” that turn on full emissions control only during emissions testing.  Full emissions control is not activated during normal driving conditions and the cars have been shown to emit as much as 40 times the allowable pollution.   Customers are understandably outraged, especially since many of them purchased a “clean diesel” car in an effort to be greener.

The investigation into this issue is ongoing and many details aren’t known yet, but an initial Cause Map, a visual format for performing a root cause analysis, can be created to document and analyze what is known. The first step in the Cause Mapping process is to fill in a Problem Outline with the basic background information and how the issue impacts the overall organizational goals. The “defeat device” issue is a complex problem that impacts many different organizational goals. The increased emissions obviously impact the environmental goal, and the potential health effects of those emissions are an impact to the safety goal. Some of the specific details are still unknown, like the exact amount of the fines the company will face, but we can safely assume the company will be paying significant fines (on the order of billions) as a result of this blatant violation of the law. The Volkswagen stock price also took a major hit and dropped more than 20 percent following the announcement of the diesel emissions issues. It is difficult to quantify how much the loss of consumer confidence will impact the company long-term, but being perceived as a dishonest company by many will certainly impact its sales. A large recall that will be both time-consuming and costly is also in Volkswagen’s future. Depending on the investigation findings, there is also the potential for criminal prosecution because of the intentional nature of this issue.

Once the overall impacts to the goals are defined, the actual Cause Map can be built by asking “why” questions.  So why did these cars include “defeat devices” to cheat on emissions tests?  The simple answer is increased profits.  Designing cars that appeared to have much lower emissions than they did in reality allowed Volkswagen to market a car that was more desirable. Car design has always included a trade-off between emissions and performance.  Detailed information hasn’t been released yet, but it is likely that the car had improved fuel economy and improved driving performance during normal driving conditions when full emissions control wasn’t activated. Whoever was involved in the design of the “defeat device” also likely assumed the deception would never be discovered, which raises concern about how emissions testing is performed.

The “defeat device” is believed to work by taking advantage of unique conditions that exist during emissions testing. During normal driving, the steering column moves as the driver steers the car, but during emissions testing the wheels rotate while the steering column stays still. The “defeat device” software appears to have monitored the steering column and wheels to sense when conditions indicated an emissions test was occurring. When the wheels turned without corresponding steering wheel motion, the software turned the catalytic scrubber up to full power, reducing emissions and allowing the car to pass emissions tests. Details on how the “defeat device” was developed and approved for inclusion in the design haven’t been released, but hopefully the investigation will be insightful and help explain exactly how something this far over the line occurred.

Only time will tell exactly how this issue impacts the overall health of the Volkswagen company, but the short-term effects are likely to be severe.  This issue may also have long-reaching impacts on the diesel market as consumer confidence in the technology is shaken.

To view an Outline and initial Cause Map of this issue, click on “Download PDF” above.

When You Call Yourself ThinkReliability…

By ThinkReliability Staff

While I was bombasting about the Valdez oil spill in 1989, one of those ubiquitous internet fairies decided that I did not really need the network connection at my remote office.  Sadly this meant that the attendees on my Webinar had to listen only to me speaking without seeing the pretty diagrams I made for the occasion (after a short delay to switch audio mode).

Though I have all sorts of redundancies built in to Webinar presentations (seriously, I use a checklist every time), I have not prepared for the complete loss of network access, which is what happened during my March 20th, 2014 Webinar.  I’m not going to use the term “root cause”, because I still had another plan . . . (yep, that failed, too).

For our mutual amusement (and because I get asked for this all the time), here is a Cause Map, or visual root cause analysis – the very method I was demonstrating during the failure – of what happened.

First we start with the what, when and where.  No who because blame isn’t the point, though in this case I will provide full disclosure and clarify that I am, in fact, writing about myself.  The Webinar in question was presented on March 20, 2014 at 2:00 PM EST (although to my great relief the issues didn’t start until around 2:30 pm).  That little thorn in my side? It was the loss of a network connection at the Wisconsin remote office (where I typically present from).  I was using Citrix Online’s GoToWebinar© program to present a root cause analysis case study of the Valdez oil spill online.

Next we capture the impact to the organization’s (in this case, ThinkReliability) goals.  Luckily, in the grand scheme of things, the impacted goals were pretty minor.  I annoyed a bunch of customers who didn’t get to see my slides and I scheduled an additional Webinar.  Also I spent some time doing follow-up to those who were impacted, scheduling another Webinar, and writing this blog.

Next we start with the impacted goals and ask “Why” questions.  The customer service goal was impacted because of the interruption in the Webinar.  GoToWebinar© (as well as other online meeting programs) has two parts: audio and visual.  I temporarily lost audio as I was using the online option (VOIP), which I use as a default because I like my USB headset better than my wireless headset.  The other option is to dial in using the phone.  As soon as I figured out I had lost audio, I switched to phone and was able to maintain the audio connection until the end of the Webinar (and after, for those lucky enough to hear me venting my frustration at my office assistant).

In addition to losing audio, I lost the visual screen-sharing portion of the Webinar.   Unlike audio, there’s only one option for this.  Screen sharing occurs through an online connection to GoToWebinar©.  Loss of that connection means there’s a problem with the GoToWebinar© program, or my network connection.  (I’ve had really good luck with GoToWebinar; over the last 5 years I have used the program at least weekly with only two connection problems attributed to Citrix.)  At this point I started running through my troubleshooting checklist.  I was able to reconnect to audio, so it seemed the problem was not with GoToWebinar©.  I immediately changed from my wired router connection to wireless, which didn’t help.  Meanwhile my office assistant checked the router and determined that the router was not connected to the network.

You will quickly see that at this point I reached the end of my expertise.  I had my assistant restart the router, which didn’t work, at least not immediately.  At this point, my short-term connection attempts (“immediate solutions”) were over.  Router troubleshooting (beyond the restart) or a call to my internet provider were going to take far longer than I had on the Webinar.

Normally there would have been one other possibility to save the day.  For online presentations, I typically have other staff members online to assist with questions and connection issues, who have access to the slides I’m presenting.  That presenter (and we have done this before) could take over the screen sharing while I continued the audio presentation.  However, the main office in Houston was unusually short-staffed last week (which is to say most everyone was out visiting cool companies in exciting places).  And (yes, this was the wound that this issue rubbed salt in), I had been out sick until just prior to the Webinar.  I didn’t do my usual coordination of ensuring I had someone online as my backup.

Because my careful plans failed me so completely, I scheduled another Webinar on the same topic.  (Click the graphic below to register.)  I’ll have another staff member (at another location) ready online to take over the presentation should I experience another catastrophic failure (or a power outage, which did not occur last week but would also result in complete network loss to my location).   Also, as was suggested by an affected attendee, I’ll send out the slides ahead of time.  That way, even if this exact series of unfortunate events should recur, at least everyone can look at the slides while I keep talking.

To view my comprehensive analysis of a presentation that didn’t quite go as planned, please click “Download PDF” above. To view one of our presentations that will be “protected” by my new redundancy plans, please see our upcoming Webinar schedule.

Greece Economic Woes – Part 1

By ThinkReliability Staff

Greece is currently suffering from an economic crisis. Leaders in Greece, the European Union, and the rest of the world are all anxiously watching as events unfold to attempt to minimize the impact of these issues. An analysis of this issue can help these leaders minimize their own impacts, as well as provide appropriate aid to Greece. However, performing a root cause analysis on an issue whose roots reach back years is not an easy task.

Normally a root cause analysis performed as a Cause Map begins with a problem outline.  However, sometimes an issue is so complicated that it’s difficult to begin there.  In these kinds of cases, beginning with the creation of a timeline may aid in the investigation.

What to include in the timeline is a frequently asked question.  When beginning a timeline, put in all the information you have.  It may make sense to go back later and create a less detailed timeline.  However, many events that don’t initially seem to add much to the timeline may later turn out to be important in the analysis.  In the case of Greece, I began the timeline with Greece’s entry into the European Union (EU).  While it wasn’t clear initially whether this contributed to the current issues being faced by Greece, it later became clear that the restrictions placed on EU-member countries did in fact contribute to the current issues.

Events in the timeline may turn out to be impacted goals. For example, at various points in the timeline Greece’s credit rating has been downgraded. The most recent downgrade, by Moody’s, left the rating just above default. Having a solid credit rating is an important goal, so a downgraded credit rating, especially one as low as Greece’s, is an impact to the financial goal of that country.

Once the timeline has begun (it’s not really complete until the issue is considered resolved, which in this case will take years), the next step would be to tackle the outline.  Writing the timeline will hopefully have provided some clarity to the issue.  For example, since Greece entered recession in 2009, we can choose 2009-2011 as a logical time to enter in the outline.  If more detail is desired, referring to the timeline is also appropriate.

The most commonly asked question about the outline is what to write in the “differences” row. Differences are meant to capture things that may have been out of the ordinary, or potentially answer the question “why this country (or equipment or time) as opposed to some other country?” Because Greece is a part of the European Union, which has consistent financial goals for its members, we can use some data points that show how Greece differs from other countries in the EU, or essentially answer the question “why is Greece having these issues instead of the other EU countries?” In Greece, debt is estimated to be 150% of the Gross Domestic Product (GDP). This is much higher than for most other nations. The public sector in Greece accounts for about 40% of the GDP, also higher than typical. Greece has the second lowest Index of Economic Freedom in the EU, which impacts its ability to quickly adjust to economic changes. Greek economic statistics were (significantly) misreported, contributing to the rapid decline in stability. And, Greek tax evasion is estimated at 13B Euros a year. This is likely not a full list of the differences between Greece and other EU countries, but it’s a start, and the outline can continue to evolve as more information is provided on the issue.

Once the top portion of the outline is complete, the impacts to the goals can be addressed.  Again, many of these impacts can be pulled from the timeline.  There were some citizen deaths associated with rioting as a result of proposed economic policies, which is an impact to the safety goal.  Spending cuts and tax increases impact the customer service goal (in this case, the “customers” are the citizens of Greece).  The production goal is impacted because of high (above 16%) unemployment, and the financial goals are impacted by a debt rating just above default and a 110B euro default.  Last but not least, there is the potential for impact on the European Union if the crisis spreads beyond Greece.
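The outline described above can be sketched as a small data structure. This is only an illustration, not the actual Cause Mapping template: the class name `ProblemOutline`, its field names, and the `summarize` helper are assumptions made for this example, while the values are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemOutline:
    """Hypothetical container for the rows of a problem outline."""
    what: str
    when: str
    where: str
    differences: List[str] = field(default_factory=list)
    impacted_goals: Dict[str, str] = field(default_factory=dict)


greece_outline = ProblemOutline(
    what="Economic crisis",
    when="2009-2011",
    where="Greece",
    differences=[
        "Debt estimated at 150% of GDP",
        "Public sector accounts for about 40% of GDP",
        "Second lowest Index of Economic Freedom in the EU",
        "Economic statistics significantly misreported",
        "Tax evasion estimated at 13B euros a year",
    ],
    impacted_goals={
        "safety": "Citizen deaths associated with rioting",
        "customer service": "Spending cuts and tax increases",
        "production": "Unemployment above 16%",
        "financial": "Debt rating just above default; 110B euro default",
    },
)


def summarize(outline: ProblemOutline) -> str:
    """Render the outline as plain text, one row per line."""
    lines = [
        f"What: {outline.what}",
        f"When: {outline.when}",
        f"Where: {outline.where}",
    ]
    lines += [f"Difference: {d}" for d in outline.differences]
    lines += [f"Goal ({g}): {i}" for g, i in outline.impacted_goals.items()]
    return "\n".join(lines)


print(summarize(greece_outline))
```

Keeping the differences and impacted goals as simple lists and dictionaries makes it easy to add rows as more information comes in, which matches the idea that the outline keeps evolving.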

As you’ve noticed, no real analysis has yet taken place. We’ll look at some of the causes contributing to the current issues in Greece in an upcoming blog. Click on “Download PDF” above to view the timeline and outline.

Is a College Education Worth the Price?

By Kim Smiley

Most students go to college hoping it will further their education and allow them better career opportunities upon graduation.  But is the investment of time and money required to get a college education worth it?

The cost of college has been rapidly increasing over the last several years.  At the same time, many company executives have been noting that today’s students do not graduate college with the critical thinking skills necessary to succeed.  A new book, “Academically Adrift: Limited Learning on College Campuses,” by sociologists Richard Arum of New York University and Josipa Roksa of the University of Virginia publishes findings of a study that says that students aren’t improving much in the areas of “critical thinking, complex reasoning and writing” during their four years in college.

The study based its results on assessments taken by 2,300 students as they entered college, after two years, and after four years. After two years, 45% of students showed insignificant improvement and after four years, 36% showed insignificant improvement. The study also found that very little reading and writing is required in many college courses.

The findings indicate that students aren’t being adequately prepared for their future careers.  How do we solve this problem?  Similar to engineering problems, a root cause analysis could be performed to help understand and hopefully solve this problem.  The more clearly a problem is understood, the easier it is to develop and implement solutions.  There are some potential solutions that have been suggested already, but only time will tell if they are successful.

Many institutions of higher learning are working to combat the issue. More than 70 college and university presidents have pledged to take steps to improve instruction and student learning, and make those results public. Hopefully the colleges and universities that have pledged to use evidence-based solutions to improve learning will pave the way for all colleges and universities to increase the critical thinking and writing skills of their graduates.

There are also a number of things that students can do to improve their own learning.  The study found that students who study alone (as opposed to in study groups) are more likely to post gains over college.  Additionally, students who choose to read and write more, and attend more selective schools that focus on teaching rather than research tend to improve their critical thinking and writing skills over their years at college.

Everyone should agree that a large percentage of students graduating from college with little or no improvement in critical reasoning and writing skills is not a desirable outcome; in other words, it is a problem. There are many ways to improve the situation. Some of these solutions must be implemented by the universities themselves, but students can take many actions on their own to increase their learning over their college years.

Click here to read more about this topic.

Printing Issues with New $100 Bill

By ThinkReliability Staff

In October, the U.S. government discovered that some of the newly redesigned $100 bills were coming off the printing press with blank spots caused by creases in the paper. The problem appeared at both sites of the Bureau of Engraving and Printing, in Washington, D.C. and Fort Worth, Texas. The government has recently announced that this will cause a delay in the introduction of these bills, which had been planned for the spring of 2011.

Additionally, the bills that have blank spots will have to be shredded and reprinted. Because of complex new security features aimed at deterring counterfeiters (such as a 3-D security strip woven into the paper), the bills cost $0.12 to print. Hundreds of millions of bills have been printed, with a possible cost of this issue in the millions of dollars.
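The cost estimate above is simple arithmetic, and a short sketch shows how it scales. The $0.12 per bill comes from the text; the flawed-bill counts used here are purely hypothetical, since the actual number of affected bills is not given, and the function name `reprint_cost_dollars` is made up for this illustration.

```python
COST_PER_BILL_CENTS = 12  # from the article: each bill costs $0.12 to print


def reprint_cost_dollars(flawed_bills: int) -> float:
    """Printing cost, in dollars, of replacing the given number of bills."""
    return flawed_bills * COST_PER_BILL_CENTS / 100


# Hypothetical counts, chosen only to show the order of magnitude.
for flawed in (10_000_000, 100_000_000):
    cost = reprint_cost_dollars(flawed)
    print(f"{flawed:,} flawed bills -> ${cost:,.0f} to reprint")
```

Under these assumed counts the reprinting cost lands in the millions to low tens of millions of dollars, before counting shredding, inspection, and delay costs.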

Although issues with currency are expensive, they’re also rare. The last time that a printing issue caused a delay in the introduction of a new bill was 1987. It’s unclear at this point when the bills will finally be released.

It’s also unclear what happened to cause the paper to crease, creating blank spots from printing.  The additional complexity of this bill with the additional security features is being looked at, as are issues with the paper and the printing machines.  However, because similar errors occurred at both printing sites, it’s unlikely that there is a specific issue with just one site’s machines.  Although the investigation into what caused the blank spots is ongoing, we can begin a root cause analysis with what is currently known.  Once more information is discovered, the Cause Map can be updated.

Because of the high potential financial losses from this issue, the eventual investigation will likely go into great detail, and fully determining what happened will take some time. The Cause Map and outline for the information known now can be viewed by clicking “Download PDF” above.

Washing Machine Failure

(This week, we are proud to announce a Cause Map by a guest blogger, Bill Graham.  Thanks, Bill!)

While completing household chores in the spring of 2010, a housewife found her front-load washing machine stopped with water standing in the clothing. Inspection of the machine revealed that the washing machine’s drain pump had failed. Because the washer was less than two years old, the family decided to attempt a repair instead of replacing it. A replacement pump was not locally available, so the family found and ordered a pump from an Internet dealer. Delivery time for the pump was approximately one week, during which time the household laundry chore could not be completed and some of the family’s favorite clothing could not be worn because it had not been laundered. On receiving the new pump, Dad immediately removed the broken pump and found, to his chagrin, a small, thin guitar pick in the suction of the old pump. Upon discovery of the guitar pick, the family’s children reported that the pick had been left in the pocket of the pants that were being washed at the time of the pump’s failure. The new pump was installed and the laundry chore resumed for the household.

While most cause analysis programs would identify the guitar pick as the root cause of the washing machine’s failure, Cause Mapping unveils all of the event’s contributing factors and the most efficient, cost-effective measures that might be taken to avert a similar failure. For example, if all the family’s children aspire to be guitar players, then a top-load washer may better suit their lifestyle while also averting the same mishap. Or, maybe the family should consider wearing pocket-less clothing. Or, maybe all family members should assume a bigger role in completing the household laundry chore. Whichever solution is chosen, the impact of these and all contributing causes is easily understood when the event is Cause Mapped.

Spacewalk Delay for Ammonia Leak

By Kim Smiley

Astronauts at the International Space Station ran into problems during a planned replacement of a broken ammonia cooling pump on August 7, 2010. To replace the broken pump, four ammonia hoses and five electrical cables needed to be disconnected. One of the hoses could not be removed because of a jammed fitting. When an astronaut was able to disconnect it by hitting the fitting with a hammer, it caused an ammonia leak.

Ammonia is toxic, so the leak impacted both the safety and environmental goals. Because the broken pump kept one cooling system from working, there was a risk of having to evacuate the space station, should the other system (which was the same age) fail. This can be considered an impact to the customer service goal. The repair had to be delayed, which is an impact to the production/schedule goal. The loss of a redundant system is an impact to the property/equipment goal. The extended spacewalk is an impact to the labor/time goal.

Once we fill out the outline with the impact to the goals and information regarding the problem, we can go on to the Cause Map. The ammonia leak was caused by an unknown leak path and the fitting being removed with a hammer. The fitting was removed with a hammer because it was jammed and had to be disconnected in order for the broken pump to be replaced. As we’re not aware of what caused the pump to break (this information will likely be discovered now that the pump has been removed), we leave a question mark on the map, to fill in later.
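The cause-and-effect relationships described above can be sketched as a small graph, with each effect pointing to its causes. This is only an illustration of the idea, not the actual Cause Map: the dictionary layout, the node wording, and the `ask_why` helper are assumptions made for this example, and the "?" node stands for the still-unknown reason the pump broke.

```python
# Each effect maps to the list of causes that answer "why?" for it.
cause_map = {
    "Ammonia leak": ["Unknown leak path", "Fitting removed with hammer"],
    "Fitting removed with hammer": [
        "Fitting jammed",
        "Hose had to be disconnected",
    ],
    "Hose had to be disconnected": ["Broken pump being replaced"],
    "Broken pump being replaced": ["?"],  # cause of pump failure not yet known
}


def ask_why(effect: str, depth: int = 0) -> list:
    """Walk the map from an effect back through its causes, indenting each level."""
    lines = ["  " * depth + effect]
    for cause in cause_map.get(effect, []):
        lines.extend(ask_why(cause, depth + 1))
    return lines


print("\n".join(ask_why("Ammonia leak")))
```

Starting from the impacted goal and repeatedly asking "why" walks the map to its current edges, and the question mark can be replaced with real causes once the removed pump has been examined.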

The failed cooling pump also caused the loss of one cooling system.  If the other system, which is near the end of its expected life, were to fail, this would require evacuation from the station.

To aid in our understanding of this incident, we can create a very simple process map of the pump replacement.  The red firework shows the step in the replacement that didn’t go well.  To view the outline, Cause Map and Process Map, click on “Download PDF” above.

Airlink Incidents: Viewing Trends in Visual Form

By ThinkReliability Staff

Over the past three months, South Africa’s Airlink airline has had four incidents, ranging from embarrassing to fatal. Four similar incidents such as these start to point out a trend, which should be investigated to improve processes and increase safety. But how do we start the investigation?

In the Cause Mapping root cause analysis method, we begin by defining the problem. Here we can define four problems, which are the four incidents over the last three months. We can look at one incident at a time in a problem outline, the first step of the Cause Mapping process. We’ll start with the earliest incident first.

On September 24, 2009 at approximately 8 a.m. a Jetstream 41 crashed into a school yard in Durban Bluff just after take-off from Durban International Airport. This was a forced landing necessitated by the loss of an engine. The pilot was killed. There were also two serious injuries of the crew, and a minor injury of a person on the ground. There were no passengers on the plane, and the impact to Airlink’s schedule is unclear. However, the plane was lost.

We can capture this information more clearly and succinctly in an outline. For example, the above paragraph has more than 80 words. The outline, which records the same information, uses only 42 words in an easily understandable visual form. (The outline for all four incidents can be viewed by clicking on “Download PDF” above.)

The second incident: On November 18, 2009 at 1:30 p.m. a BAE Systems Jetstream 41 aborted take-off for East London and slid off the runway at Port Elizabeth airport. There were high velocity cross winds, and the pilot may have been unable to establish directional control. There were no injuries, no environmental impact and damages to the plane are unknown. However, new travel arrangements had to be made by the airline for all the passengers. The frequency of Airlink incidents is now two in eight weeks. (Over 80 words; the outline has 49 words.)

The third incident: On November 24, 2009 at approximately 8 a.m. a flight en route to Harare carrying a Prime Minister was forced to return to Johannesburg Airport after it experienced a technical fault. There were no injuries, but it caused a delay in the Prime Minister’s schedule. The damage to the airplane is unclear. The frequency of Airlink incidents is now three in two months. (Over 60 words; the outline has 33 words.)

The fourth incident: On December 7, 2009 at approximately 11 a.m. an SA Airlink Embraer 135 regional commuter jet hydroplaned and overshot the runway while landing at George Airport during rainy weather. There were five injuries, including a sprained ankle. This incident has led to a poor public perception of the airline and increased supervision from the authorities. We do not have a dollar amount on the property damage. The frequency of Airlink incidents is now 4 in 10 weeks. (Over 70 words; the outline has 42 words.)
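The running frequency quoted after each incident can be checked with a few lines of date arithmetic. This is a minimal sketch using the four incident dates given above; the function name `running_frequency` is an assumption made for this example.

```python
from datetime import date

# Incident dates as given in the descriptions above.
incidents = [
    date(2009, 9, 24),
    date(2009, 11, 18),
    date(2009, 11, 24),
    date(2009, 12, 7),
]


def running_frequency(dates):
    """Return (incident count, days since first incident) for each incident."""
    first = dates[0]
    return [(n, (d - first).days) for n, d in enumerate(dates, start=1)]


for count, days in running_frequency(incidents):
    print(f"{count} incident(s) in {days} days ({days / 7:.1f} weeks)")
```

The elapsed times come out to 55, 61, and 74 days for the second, third, and fourth incidents, which is roughly consistent with the rounded "two in eight weeks", "three in two months", and "4 in 10 weeks" figures in the text.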

In addition to the increased brevity of the outline, it provides an easy visual comparison of the four incidents by showing them in a similar visual form. On one page, we can show the timeline and outlines of the four incidents for easy comparison. This is especially useful as a briefing tool for busy managers.