

AWS Australia outage worsened by software bug, company confirms


Code error and backup power failure combined to create a perfect storm

A software bug and a failure in AWS’ emergency backup power supply extended Sunday’s major service outage in its Sydney region, the company has revealed.

Heavy rainfall and winds gusting up to 96 km/h cut off the power supply from a utility provider, forcing the data center onto emergency power. AWS has two backup power systems to provide emergency supply, but for some server instances both failed, according to The Register.


Severe weather conditions on June 5 caused power outages in south-east Australia

Source: Thinkstock/Craig Jewell

While the company worked to restore the instances that had been knocked out of action, engineers discovered “a latent bug” in its instance management software. As a result, a minority of instances had to be restored manually and were not fully operational until later on Monday.

Data availability was also disrupted for some instances whose failed disks required manual repair.

Interrupted power supply

AWS’ diesel rotary uninterruptible power supply (DRUPS), which combines a diesel generator with a mechanical UPS, would normally make up the shortfall in such circumstances.

“Under normal operation, the DRUPS uses utility power to spin a flywheel which stores energy. If utility power is interrupted, the DRUPS uses this stored energy to continue to provide power to the data center while the integrated generator is turned on to continue to provide power until utility power is restored,” Amazon said.

On Sunday, “a set of breakers responsible for isolating the DRUPS from utility power failed to open quickly enough.”

These breakers are installed to “assure that the DRUPS reserve power is used to support the data center load during the transition to generator power. Instead, the DRUPS system’s energy reserve quickly drained into the degraded power grid.”

As a result, the data center did not receive the power it needed to keep running; operations failed and large amounts of data became unavailable.
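To make the failure mode concrete, the sketch below is a toy Python model of the transfer sequence Amazon describes: after utility power fails, the flywheel must carry the load until the generator takes over, and any time the isolation breakers stay closed lets stored energy leak into the grid. Every figure here (`FLYWHEEL_ENERGY_KJ`, `DATA_CENTER_LOAD_KW`, `GRID_BACKFEED_KW`, `GENERATOR_START_S`) and the `simulate` function itself are hypothetical illustrations, not AWS values or AWS code, and the model ignores the second backup system and real generator dynamics.

```python
# Toy model of a DRUPS ride-through. All numbers are hypothetical,
# chosen only to illustrate the mechanism described in the article.
FLYWHEEL_ENERGY_KJ = 1000.0   # energy stored in the flywheel when utility power fails
DATA_CENTER_LOAD_KW = 50.0    # power the data center draws
GRID_BACKFEED_KW = 200.0      # extra drain into the degraded grid while breakers stay closed
GENERATOR_START_S = 12        # seconds before the integrated generator takes the load


def simulate(breaker_open_delay_s: int) -> bool:
    """Return True if the flywheel keeps the load powered until the generator starts.

    Steps through the transition one second at a time after utility power fails.
    breaker_open_delay_s is how long the isolation breakers take to open.
    """
    energy_kj = FLYWHEEL_ENERGY_KJ
    for t in range(GENERATOR_START_S):
        drain_kw = DATA_CENTER_LOAD_KW
        if t < breaker_open_delay_s:
            # Breakers still closed: stored energy also flows back into the grid.
            drain_kw += GRID_BACKFEED_KW
        energy_kj -= drain_kw  # kW sustained for 1 s = kJ
        if energy_kj < 0:
            return False  # flywheel exhausted before the generator came online
    return True


if __name__ == "__main__":
    print("breakers open immediately:", simulate(0))
    print("breakers never open in time:", simulate(GENERATOR_START_S))
```

With these made-up figures, breakers that open at once leave the flywheel with energy to spare, while breakers that stay closed drain it within a few seconds, long before the generator is ready. That is the same shape of failure Amazon reported, though the real timings and energies are not given in the article.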

AWS has pledged to add more circuit breakers so that, if utility power fails in future, generators can take over before the UPS systems are depleted. It also plans software changes to make its APIs more resilient, expected to be available in the Sydney region in July.
