#TBT – A look at our best blog posts #2

It’s week two in our four-week series from the archives – bringing you the best of our blog posts.

No doubt Raygun.com is saving thousands of developers from embarrassing or even catastrophic software errors every day, but what was life like without such an awesome (and automatic) error tracking solution? We’ve looked into some of the biggest disasters over the years to see what happens when software errors cause chaos!

NASA’s Mars Climate Orbiter – On its mission to Mars in 1998 the Climate Orbiter spacecraft was ultimately lost in space. Although the failure puzzled engineers for some time, it was revealed that a subcontractor on the engineering team failed to make a simple conversion from English units to metric. This embarrassing lapse sent the $125 million craft fatally close to Mars’ surface when it tried to stabilize its orbit at too low an altitude. Flight controllers believe the spacecraft ploughed into Mars’ atmosphere where the associated stresses crippled its communications, leaving it hurtling on through space in an orbit around the sun.
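
For the curious, here is a minimal Python sketch of one way to stop a bare number from slipping across a unit boundary. It is not NASA's code. The `Impulse` type and `record_thruster_firing` function are hypothetical names, and we are assuming the mismatched quantity is thruster impulse, produced in pound-force seconds and expected in newton-seconds.

```python
from dataclasses import dataclass

# Newton-seconds per pound-force second (standard conversion factor).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical value type: impulse is always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion happens exactly once, at the boundary.
        return cls(value * LBF_S_TO_N_S)


def record_thruster_firing(impulse: Impulse) -> float:
    """Hypothetical consumer that refuses bare numbers with unknown units."""
    if not isinstance(impulse, Impulse):
        raise TypeError("expected an Impulse, not a bare number")
    return impulse.newton_seconds
```

Passing `Impulse.from_pound_force_seconds(10.0)` works, while passing a plain `10.0` raises a `TypeError` instead of being silently read as metric.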

Ariane 5 Flight 501 – Europe’s newest unmanned satellite-launching rocket reused working software from its predecessor, the Ariane 4. Unfortunately, the Ariane 5’s faster engines exposed a bug that had never surfaced in previous models. Thirty-six seconds into its maiden launch the rocket’s engineers hit the self-destruct button following multiple computer failures. In essence, the software had tried to cram a 64-bit number into a 16-bit space. The resulting overflow conditions crashed both the primary and backup computers (which were both running the exact same software). The Ariane 5 had cost nearly $8 billion to develop, and was carrying a $500 million satellite payload when it exploded. In the video below you can see the engineer struggle to comprehend what he’s just seen on his screen as the rocket explodes before calmly writing down a short note, probably the letters F…A…I…L.
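
To see why a 64-bit number does not fit in a 16-bit space, here is a small Python sketch. It is an illustration only, not the flight software (which was not written in Python), and both function names are made up for this example. The first function shows what raw truncation to 16 bits does to a large value; the second checks the range first and fails loudly.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Keep only the low 16 bits, as a raw two's complement truncation would."""
    raw = int(value) & 0xFFFF
    # Reinterpret the 16-bit pattern as a signed number.
    return raw - 0x10000 if raw >= 0x8000 else raw


def to_int16_checked(value: float) -> int:
    """Convert only when the value fits in a signed 16-bit integer."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return n


print(to_int16_unchecked(40000.0))  # -25536: a large positive value turns negative
print(to_int16_checked(1234.9))     # 1234
```

The checked version only helps if something sensible handles the error. Running the same code on the backup computer, as the post notes, means the backup fails in exactly the same way.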

EDS Child Support System – In 2004, EDS introduced a highly complex IT system to the U.K.’s Child Support Agency (CSA). At the exact same time, the Department for Work and Pensions (DWP) decided to restructure the entire agency. The two pieces of software were completely incompatible, and irreversible errors were introduced as a result. The system somehow managed to overpay 1.9 million people and underpay another 700,000. It also left $7 billion in uncollected child support payments, a backlog of 239,000 cases and 36,000 new cases “stuck” in the system, and it has cost UK taxpayers over $1 billion to date.

Soviet Gas Pipeline Explosion – The Soviet pipeline had a level of complexity that would require advanced automated control software. The CIA was tipped off to the Soviet intentions to steal the control system’s plans. Working with the Canadian firm that designed the pipeline control software, the CIA had the designers deliberately create flaws in the programming so that the Soviets would receive a compromised program. It is claimed that in June 1982, flaws in the stolen software led to a massive explosion along part of the pipeline, causing the largest non-nuclear explosion in the planet’s history.

Heathrow Terminal 5 Opening – Just before the opening of Heathrow’s Terminal 5 in the UK, staff tested the brand new baggage handling system built to carry the vast amounts of luggage checked in each day. Engineers ran more than 12,000 test pieces of luggage through the system before opening the Terminal to the public. It worked flawlessly on every test run, yet on the Terminal’s opening day it simply could not cope. It is thought that ‘real life’ scenarios, such as manually removing a bag from the system when a passenger had left an important item in their luggage, confused the entire system and caused it to shut down. Over the following 10 days some 42,000 bags failed to travel with their owners, and over 500 flights were cancelled.

The Mariner 1 Spacecraft – On a mission to fly by Venus in 1962, this spaceship barely made it out of Cape Canaveral when a software-coding error caused the rocket to veer dangerously off-course, threatening to crash back to Earth. Alarmed, NASA engineers on the ground issued a self-destruct command. A review board later determined that the omission of a hyphen in coded computer instructions allowed the transmission of incorrect guidance signals to the spacecraft. The cost for the rocket was reportedly more than $18 million at the time.

The Morris Worm – A program developed by a Cornell University student for what he said was supposed to be a harmless experiment wound up spreading wildly and crashing thousands of computers in 1988 because of a coding error. It was the first widespread worm attack on the fledgling Internet. The graduate student, Robert Tappan Morris, was convicted of a criminal hacking offense and fined $10,000. Morris’s lawyer claimed at the trial that his client’s program helped improve computer security. Costs for cleaning up the mess may have gone as high as $100 million. Morris, who interestingly co-founded the startup incubator Y Combinator, is now a professor at the Massachusetts Institute of Technology. A disk with the worm’s source code is now housed at the University of Boston.


Patriot Missile Error – Sometimes, the cost of a software glitch can’t be measured in dollars. In February of 1991, a U.S. Patriot missile defence system in Saudi Arabia failed to detect an attack on an Army barracks. A government report found that a software problem led to an “inaccurate tracking calculation that became worse the longer the system operated.” On the day of the incident, the system had been operating for more than 100 hours, and the inaccuracy was serious enough to cause the system to look in the wrong place for the incoming missile. The attack killed 28 American soldiers. Prior to the incident, Army officials had fixed the software to improve the Patriot system’s accuracy. That modified software reached the base the day after the attack.
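
How does an error get worse the longer a system runs? Here is a rough Python sketch of the general mechanism. The details are assumptions for illustration and are not taken from the report quoted above: we assume a clock that counts tenths of a second and multiplies the count by a binary approximation of 0.1 chopped to 23 fractional bits. Because 0.1 has no exact binary representation, every tick carries a tiny error, and the errors add up.

```python
from fractions import Fraction


def truncated_tenth(fraction_bits: int) -> Fraction:
    """0.1 chopped (not rounded) to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours_running: float, fraction_bits: int = 23) -> float:
    """Accumulated timing error for a clock that ticks every tenth of a second."""
    ticks = int(hours_running * 3600 * 10)
    error_per_tick = Fraction(1, 10) - truncated_tenth(fraction_bits)
    return float(ticks * error_per_tick)


for hours in (1, 8, 100):
    print(hours, "hours ->", round(clock_drift_seconds(hours), 4), "seconds of drift")
```

Under these assumptions the drift after 100 hours is roughly a third of a second, which is a long time when the target is an incoming missile.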

Pentium FDIV bug – When a math professor discovered and publicized a flaw in Intel’s popular Pentium processor in 1994, the company’s response was to replace chips upon request to users who could prove they were affected. Intel calculated that the error caused by the flaw would happen so rarely that the vast majority of users wouldn’t notice. Angry customers demanded a replacement for anyone who asked, and Intel agreed. The episode cost Intel $475 million.
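
If you want to try a division sanity check yourself, here is a short Python sketch. The function names are ours. The operand pair is the one widely circulated at the time as a quick test, and affected chips were reported to leave a remainder of 256 instead of roughly zero; neither figure comes from this post, so treat both as background.

```python
def fdiv_residual(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """Return x - (x / y) * y, which should be zero or vanishingly small."""
    return x - (x / y) * y


def division_looks_sane(tolerance: float = 1e-6) -> bool:
    """True when dividing and multiplying back recovers the original value."""
    return abs(fdiv_residual()) < tolerance


print("division looks sane:", division_looks_sane())
```

On any current processor this should report `True`.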

Knight’s $440 Million Error – One of the biggest American market makers for stocks struggled to stay afloat after a software bug triggered a $440 million loss in just 30 minutes. The firm’s shares lost 75 percent in two days after the faulty software flooded the market with unintended trades. One of Knight’s trading algorithms reportedly started pushing erratic trades through on nearly 150 different stocks, sending them into spasms.

Honourable mention: NOAA-19 Satellite – Although not a software error, on September 6, 2003, the satellite was badly damaged while being worked on at the Lockheed Martin Space Systems factory. The satellite fell to the floor as a team was turning it to a horizontal position. An inquiry into the mishap determined that it was caused by a lack of procedural discipline throughout the facility. While the turn-over cart used during the procedure was in storage, a technician removed twenty-four bolts securing an adapter plate to it without documenting the action. The team subsequently using the cart to turn the satellite failed to check the bolts, as specified in the procedure, before attempting to move the satellite. Repairs to the satellite cost $135 million.


Don’t want to get caught out by your software bugs? Get automatically notified of your software’s errors with instant notifications. Start a FREE no obligation trial today with Raygun.