Software Glitch Delays U.S. Travel Documents

By Kim Smiley

The Consular Consolidated Database (CCD) is the global database used by the U.S. State Department to process visas and other travel documents.  On July 20, 2014, the CCD experienced software issues and had to be taken offline.  The outage lasted several days, and the CCD was returned to service with limited capacity on July 23.  The CCD is one of the largest Oracle-based data warehouses in the world and is used to process a large number of visas each year, so the effects of the software glitch have been felt worldwide.  The State Department processed over 9 million immigrant and non-immigrant visas overseas in 2013, so a delay of even a few days means a significant backlog.
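To get a rough sense of the backlog, the figures above can be turned into a back-of-the-envelope estimate.  The sketch below assumes visas are processed at a uniform rate across all 365 days of the year, which is a simplification rather than anything the State Department has reported, and the function name is purely illustrative.  Under that assumption, a three-day outage corresponds to roughly 74,000 visas that would otherwise have been processed.

```python
# Figures taken from the article; the uniform daily rate is an assumption.
VISAS_PER_YEAR = 9_000_000  # "over 9 million" in 2013, so treat this as a lower bound
DAYS_PER_YEAR = 365
OUTAGE_DAYS = 3             # offline July 20, back with limited capacity July 23


def estimated_backlog(visas_per_year, outage_days, days_per_year=DAYS_PER_YEAR):
    """Estimate how many visas would normally be processed during the outage."""
    daily_rate = visas_per_year / days_per_year
    return round(daily_rate * outage_days)


if __name__ == "__main__":
    print(estimated_backlog(VISAS_PER_YEAR, OUTAGE_DAYS))
```

Because the CCD came back with limited capacity, the real backlog likely kept growing after July 23, so this estimate is best read as a floor.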

This issue can be analyzed by building a Cause Map, a visual root cause analysis.  A Cause Map visually lays out the different causes that contribute to an issue so that the problem is better understood and a wider range of solutions can be considered.  The first step in the Cause Mapping process is to define the problem, which includes documenting the overall impacts to the goal.  Most problems impact more than one goal and this example is no exception.

The customer service goal is clearly impacted because thousands, and potentially even millions, of people have had their travel document processing delayed.  The negative publicity can also be considered an impact to the customer service goal because this software glitch isn’t doing the international image of the U.S. any favors.  The delay in travel document services is an impact to the production/schedule goal, and the recovery effort and investigation into the problems impact the labor/time goal.  Additionally, there are potential economic impacts both to individuals who may have had to change travel plans and to the U.S. economy, because these issues may discourage international tourism.

The next step in the Cause Mapping method is to build the Cause Map.  This is done by asking “why” questions and using the answers to visually lay out the cause-and-effect relationships.  The delay in processing travel documents occurred because the CCD is needed to process them and the CCD had to be taken offline as a result of software issues.  Why were there issues with the database?  Maintenance was done on the CCD on July 20 and the performance issues began shortly thereafter.  The maintenance was done to improve system performance and to fix previous intermittent performance issues.  The State Department has stated that this was not a terrorist act or anything more malicious than a software glitch.  An investigation is currently underway to determine exactly what caused the software glitch, but the details have not been released at this time.  It can be assumed that the test program for the software was inadequate since the glitch wasn’t identified prior to implementation.
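The chain of “why” questions above can also be written down as a simple data structure.  The following is only a minimal sketch of one way to do that, not the Cause Mapping software or its file format: each effect is mapped to the causes named in this post, and the wording of each box and the function names are illustrative choices.

```python
# Each effect maps to the causes that answer "why?" for it.
# The entries paraphrase the causes described in the post.
cause_map = {
    "travel document processing delayed": ["CCD taken offline"],
    "CCD taken offline": ["software issues after maintenance"],
    "software issues after maintenance": [
        "maintenance performed on July 20",
        "glitch not identified before implementation",
    ],
    "maintenance performed on July 20": ["previous intermittent performance issues"],
    "glitch not identified before implementation": ["inadequate test program (assumed)"],
}


def ask_why(effect, depth=0):
    """Return indented lines tracing an effect back through its causes."""
    lines = ["  " * depth + effect]
    for cause in cause_map.get(effect, []):
        lines.extend(ask_why(cause, depth + 1))
    return lines


def leaf_causes(effect):
    """Collect the causes that have no further 'why' recorded yet."""
    causes = cause_map.get(effect, [])
    if not causes:
        return [effect]
    leaves = []
    for cause in causes:
        leaves.extend(leaf_causes(cause))
    return leaves


if __name__ == "__main__":
    print("\n".join(ask_why("travel document processing delayed")))
```

The leaves of this structure are simply the points where the published information runs out; as the investigation releases more detail, each one can be expanded with further causes.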

The final step in the Cause Mapping process is to identify solutions that can be implemented to reduce the risk of a problem recurring.  Details of exactly what was done to deal with the issue in the short term and bring the CCD back online aren’t available, but the State Department has stated that additional servers were added to increase capacity and improve response time.  There is also a longer-term plan to upgrade the CCD to a newer version of the Oracle database software by the end of the year, which will hopefully prove more stable.

To view an Outline and high level Cause Map of this issue, click on “Download PDF” above.

Trading Glitch Loses Goldman Sachs Millions

By Kim Smiley

A Goldman Sachs trading glitch on August 20, 2013 caused a large number of erroneous single stock and ETF options trades.  About 80 percent of the errant trades were cancelled, but the financial damage is still speculated to be as much as one hundred million dollars.  The company also finds itself once again in the uncomfortable position of making headlines for negative reasons, which is never good for business.

The glitch occurred during an update to an internal computer system that is used to determine where to price options.  The update changed the software so that the system began inadvertently misinterpreting non-binding indications of interest as actual bids and offers.  The system acted on these supposed bids and offers and executed a large volume of trades at errant prices that were out of touch with actual market prices.

This issue can be built into a Cause Map, an intuitive method for performing a root cause analysis.  One of the advantages of a Cause Map is that it visually lays out all the causes and the cause-and-effect relationships between them. Seeing all the causes can broaden the solutions that are considered.

In this example, a Cause Map can help illustrate the fact that the software glitch itself isn’t the only thing worth focusing on.  The lack of an effective test program also contributed to the problem, and testing may be the easiest place to implement an effective solution.  If the problem had been caught in testing, the only cost would have been the time and effort needed to fix the software.  The importance of a robust test program for software is difficult to overstate.  If the software is vital to whatever your company’s mission is, develop a way to test it.
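As an illustration of the kind of check a test program might include, the sketch below feeds a mix of firm orders and non-binding indications of interest through a routing function and confirms that only the firm ones come out.  Nothing here reflects Goldman Sachs’ actual system, whose internals have not been published; the `Message` type, the `binding` flag, and both function names are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical market message; 'binding' is False for an indication of interest."""
    symbol: str
    price: float
    quantity: int
    binding: bool


def executable_orders(messages):
    """Keep only binding bids and offers; indications of interest must never trade."""
    return [m for m in messages if m.binding]


def check_no_indications_executed(router):
    """Regression check: pass a mixed sample through a router and inspect the output."""
    sample = [
        Message("XYZ", 1.25, 100, binding=True),    # firm order
        Message("XYZ", 0.01, 5000, binding=False),  # indication of interest only
    ]
    routed = router(sample)
    return len(routed) > 0 and all(m.binding for m in routed)


if __name__ == "__main__":
    # A correct router passes; a router that forwards everything fails.
    print(check_no_indications_executed(executable_orders))
    print(check_no_indications_executed(lambda msgs: list(msgs)))
```

Running a check like this against every software update, before it reaches production, is one inexpensive way to catch a change that starts treating indications as orders.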

To view a high level Cause Map of this issue, click on “Download PDF” above.  The loss of the Mars Climate Orbiter is another excellent example of a software error with huge consequences.