By ThinkReliability Staff
The Therac-25 was a radiation therapy machine used during the mid-1980s. It delivered two types of radiation: a low-power electron beam and a high-power x-ray beam. Combining both kinds of therapeutic radiation in one machine offered an economic advantage. From June 1985 to January 1987, the Therac-25 delivered massive radiation overdoses to six people in the United States and Canada. We can look at the causes of these overdoses in a root cause analysis performed as a Cause Map.
The radiation overdoses were caused by delivery of the high-powered electron beam without attenuation. Two things had to be true for this to happen: the high-powered beam was delivered, and the attenuation was not in place. The lower-powered beam did not require the attenuation provided by the beam spreader, so it was possible to operate the machine without it. The machine did register an error when the high-powered beam was turned on without attenuation. However, it was possible to operate the beam despite the error, and the operators overrode the warning.
The Therac-25 had two different responses to errors. One was to pause the treatment, which allowed the operators to resume without any changes to settings, and the other was to reset the machine settings. The error in this case, delivering the high-power beam without attenuation, produced only a treatment pause, so the operator could resume treatment with an override and without changing any of the settings. Researchers who talked to the operators found that the Therac-25 produced errors frequently, so operators were accustomed to overriding them. In this case, the error message displayed (“Malfunction 54”) was ambiguous and not defined in any of the operating manuals. (This code was apparently intended only for use by the manufacturer, not by healthcare users.)
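The difference between the two error responses can be illustrated with a small sketch. This is a toy model written for this article, not the Therac-25's actual software; the `Console` class, the `Response` names and the example settings are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Response(Enum):
    PAUSE = "pause"  # treatment stops; settings are kept
    RESET = "reset"  # treatment stops; settings are cleared


@dataclass
class Console:
    """Hypothetical operator console holding the treatment settings."""
    settings: dict = field(default_factory=dict)

    def raise_error(self, response: Response) -> None:
        # A RESET forces the operator to re-enter everything.
        # A PAUSE leaves the (possibly wrong) settings untouched.
        if response is Response.RESET:
            self.settings.clear()

    def can_resume(self) -> bool:
        # Treatment can only be resumed if settings are still present.
        return bool(self.settings)


# Example values are made up for illustration only.
paused = Console({"mode": "xray", "dose": 200})
paused.raise_error(Response.PAUSE)

reset = Console({"mode": "xray", "dose": 200})
reset.raise_error(Response.RESET)

print(paused.can_resume(), reset.can_resume())
```

In this sketch, a pause lets the operator continue with exactly the settings that triggered the error, while a reset forces the settings to be entered again.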
Apart from the overridden warning, the Therac-25 allowed the beam to be turned on in this circumstance. The Therac-25 had no hardware protective circuits and depended solely on software for protection. The safety analysis of the Therac-25 considered only hardware failures, not software errors, and thus did not discover the need for any sort of hardware protection. The reasons given for excluding software errors were the “extensive testing” of the Therac-25, the fact that software, unlike hardware, does not degrade, and the general assumption that software is error-proof. Software errors were assumed to be caused by hardware errors, and residual software errors were not included in the analysis.
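To see why relying on software alone is risky, consider a minimal sketch of a beam permission check with and without an independent interlock. This is an illustration under our own assumptions, not a description of any real machine: `BeamRequest`, `software_check`, `hardware_interlock` and `may_fire` are hypothetical names, and a real interlock would be a physical circuit rather than a Python function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeamRequest:
    power: str                  # "low" or "high"
    attenuator_commanded: bool  # what the software believes it set up


def software_check(req: BeamRequest) -> bool:
    # Trusts the software's own record of the machine state.
    return req.power == "low" or req.attenuator_commanded


def hardware_interlock(req: BeamRequest, attenuator_sensed: bool) -> bool:
    # Stands in for an independent circuit that reads the real position.
    return req.power == "low" or attenuator_sensed


def may_fire(req: BeamRequest, attenuator_sensed: bool, use_interlock: bool) -> bool:
    allowed = software_check(req)
    if use_interlock:
        allowed = allowed and hardware_interlock(req, attenuator_sensed)
    return allowed


# The software's record is wrong: it thinks the attenuator is in place.
stale = BeamRequest(power="high", attenuator_commanded=True)

print(may_fire(stale, attenuator_sensed=False, use_interlock=False))
print(may_fire(stale, attenuator_sensed=False, use_interlock=True))
```

With the software check alone, a wrong internal record is enough to let the high-power beam fire. The independent check blocks it because it does not share the software's mistaken belief.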
Unfortunately, the code used in the Therac-25 was in part borrowed from a previous machine and contained a residual error. This error was not noticed in previous versions because hardware protective circuits prevented a similar error from causing harm. The residual error was a software error known as a “race condition”. In short, the output of the code depended on the order and timing in which the treatment settings were entered. If an operator entered the settings very quickly and not in the normal order (such as going back to correct a mistake), the machine would accept the settings before the change from the default setting had registered. In some of these cases, the result was the error described here. This error was not caught before the overdoses happened for three reasons: software failures were not considered in the safety analysis (as described above), the code was reused from a previous system that had hardware interlocks (and so had not shown these problems), and the review of the software was inadequate. The code was not independently reviewed, the design of the software did not consider failure modes, and the software was not tested with the hardware until installation.
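A race condition of this kind can be sketched with a deterministic toy model. This is not the Therac-25's code; `ToyMachine`, the mode names and the 8-second `SETUP_SECONDS` window are assumptions chosen for illustration. The flaw modeled here is that an edit made while the hardware is still setting up changes what the screen shows but is never passed on to the hardware.

```python
from dataclasses import dataclass
from typing import Optional

SETUP_SECONDS = 8.0  # assumed length of the hardware setup window


@dataclass
class ToyMachine:
    console_mode: str = "none"   # what the operator sees on screen
    hardware_mode: str = "none"  # what the hardware was actually told
    setup_started_at: Optional[float] = None

    def _setup_in_progress(self, now: float) -> bool:
        return (self.setup_started_at is not None
                and now - self.setup_started_at < SETUP_SECONDS)

    def enter_mode(self, mode: str, now: float) -> None:
        self.console_mode = mode
        if self._setup_in_progress(now):
            # Flaw: an edit during setup updates the screen only.
            return
        self.hardware_mode = mode
        self.setup_started_at = now

    def is_consistent(self) -> bool:
        return self.console_mode == self.hardware_mode


def run(correction_time: float) -> ToyMachine:
    """Enter one mode at t=0, then correct it at correction_time."""
    machine = ToyMachine()
    machine.enter_mode("xray", now=0.0)
    machine.enter_mode("electron", now=correction_time)
    return machine


fast = run(correction_time=3.0)   # correction inside the setup window
slow = run(correction_time=10.0)  # correction after setup has finished
print(fast.is_consistent(), slow.is_consistent())
```

The same two keystrokes give different machine states depending only on how quickly the second one follows the first. That also suggests why such an error can survive testing: a tester who enters settings at a careful pace may never land inside the window.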
This incident can teach us a lot about over-reliance on one part of a system and about reusing designs in a new way with inadequate testing and verification (as well as many other issues). If we can learn from the mistakes of others, we are less likely to make those mistakes ourselves. For more detail on this (extremely complicated) issue, please see Nancy Leveson and Clark Turner’s “An Investigation of the Therac-25 Accidents.”