Reading 05

The Therac-25 accidents were a horrible tragedy that brought to light issues within engineering and programming culture. The Therac-25 was a medical device used to treat cancer by delivering therapeutic radiation beams; a rotating turntable was supposed to place the correct beam-shaping and attenuating hardware in the beam's path before treatment. According to the article, “it was determined that the root cause of the problem was twofold. Firstly, the software controlling the machine contained bugs which proved to be fatal. Secondly, the design of the machine relied on the controlling computer alone for safety. There were no hardware interlocks or supervisory circuits to ensure that software bugs couldn’t result in catastrophic failures.” Essentially, the hardware interlocks that earlier models had relied on were removed, and their safety functions were handed over entirely to software. If the operator selected X-ray mode, the machine would begin configuring itself for high-powered X-rays, a process that took about 8 seconds. If the operator switched to Electron mode within those 8 seconds, the turntable would not move to the correct position, leaving it in an unknown state. Overall, the entire system design was the problem: safety-critical responsibilities were placed on a computer system that was not designed to carry them. Timing analysis was never performed, unit testing never happened, and fault trees for the hardware and software were never created.
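To make that failure mode concrete, here is a minimal Python sketch of that kind of race condition. It is purely illustrative: the names (`Machine`, `setup_beam`, `operator_edit`) and the scaled-down timing are invented for this post, and the real Therac-25 software was PDP-11 assembly, not Python. The point is only the pattern: one routine samples the selected mode once and then spends a long time setting up, while an operator edit during that window goes unnoticed.

```python
import threading
import time

class Machine:
    """Toy model of a dual-mode treatment machine (illustrative only)."""

    def __init__(self):
        self.selected_mode = "XRAY"      # operator's initial selection
        self.turntable_position = None
        self.beam_energy = None

    def setup_beam(self):
        energy_mode = self.selected_mode  # the energy setting is sampled ONCE here...
        time.sleep(0.8)                   # ...then a long setup window follows
                                          # (scaled down from the ~8 seconds)
        self.beam_energy = "HIGH" if energy_mode == "XRAY" else "LOW"
        self.turntable_position = self.selected_mode  # turntable reads the CURRENT mode

    def operator_edit(self):
        time.sleep(0.1)                   # the edit lands inside the setup window
        self.selected_mode = "ELECTRON"   # setup_beam never re-checks this

machine = Machine()
setup = threading.Thread(target=machine.setup_beam)
edit = threading.Thread(target=machine.operator_edit)
setup.start()
edit.start()
setup.join()
edit.join()

# Prints "ELECTRON ELECTRON HIGH": a high-power beam with the turntable in
# the electron position, and nothing in hardware to catch the mismatch.
print(machine.selected_mode, machine.turntable_position, machine.beam_energy)
```

Because the two threads each see a different version of the mode, the machine ends up in an internally inconsistent state, which is exactly the kind of inconsistency a timing analysis or a hardware interlock would be meant to catch.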

I think the main challenge for software developers is that they have to try to think of every possible way their code could go wrong and put safety measures in place. In my mind there should be overall fail-safes that are normally in place, and if there are unforeseen catastrophic failures that fall outside “normal” use, the developers should not be held liable. No machine is perfect enough to behave correctly in every possible way it could be used, and it is not fair to blame the engineers for that. For example, in the Therac-25 case the basic software and hardware should have been able to prevent these issues, but quickly switching between modes caused an error that would not have been foreseeable. Granted, since I do not have any real developer experience and do not understand the inner workings of a software development firm, I do not see how it could be possible to write code that prevents every possible error.

Overall, I think safety-critical systems should be built with heavy redundancy and multiple safeguards to prevent issues, but it is also reasonable to understand that people will not always use software and hardware in the intended way, and we therefore can’t hold the developers responsible for every misuse.
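As a tiny follow-on to the sketch above, this is roughly what one of those redundant safeguards could look like: an independent cross-check of the kind the article says was missing. Everything here (`interlock_ok`, `fire`, the table of safe pairs) is invented for illustration, not taken from the actual machine.

```python
def interlock_ok(beam_energy: str, turntable_position: str) -> bool:
    # High power is only safe with the X-ray hardware in the beam path.
    safe_pairs = {("HIGH", "XRAY"), ("LOW", "ELECTRON")}
    return (beam_energy, turntable_position) in safe_pairs

def fire(beam_energy: str, turntable_position: str) -> None:
    # Refuse to deliver the beam on any energy/position mismatch,
    # no matter what the control software believes about its own state.
    if not interlock_ok(beam_energy, turntable_position):
        raise RuntimeError("interlock tripped: energy/turntable mismatch")
    print("beam delivered")

fire("LOW", "ELECTRON")   # fine
fire("HIGH", "ELECTRON")  # raises: the dangerous Therac-25 combination
```

In the real machine a check like this would live in hardware, independent of the computer, which is exactly the supervisory redundancy the article says the design gave up.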
