Cookie policy: This site uses cookies (small files stored on your computer) to simplify and improve your experience of this website. Cookies are small text files stored on the device you are using to access this website. For more information on how we use and manage cookies please take a look at our privacy and cookie policies. Some parts of the site may not work properly if you choose not to accept cookies.

sections

Amazon traces cloud outage to faulty breaker

  • Print
  • Share
  • Comment
  • Save

Amazon Web Services has released details about the root cause of the outage of one of its public-cloud’s availability zones that started in the evening on 14 June and lasted until next morning, US Pacific time.

In a note posted on the cloud’s status dashboard, the company said the outage was caused by a cable fault in the power distribution system of the electric utility that served the data center hosting the US-East-1 region of the cloud in northern Virginia.

The entire facility was switched over to back-up generator power, but one of the generators overheated and powered off because of a defective cooling fan. The virtual-machine instances and virtual-storage volumes that were powered by this generator were transferred to a secondary back-up power system, provided by a separate power-distribution circuit that has its own backup generator capacity.

But, one of the breakers on this backup circuit was configured incorrectly and opened as soon as the load was transferred to the circuit. The breaker was set up to open at too low a power threshold.

“After this circuit breaker opened … the affected instances and volumes were left without primary, back-up, or secondary back-up power,” Amazon’s note read.

Customers in this availability zone that were running multi-availability-zone configurations “avoided meaningful disruption to their applications; however, those affected who were only running in this Availability Zone, had to wait until the power was restored to be fully functional.”

Among the customers affected with downtime were a number of popular web services, including Pinterest, Heroku, Quora, Foursquare and others, according to news reports. Heroku, for example, reported widespread outages of its production and development infrastructures that lasted for eight hours.

The faulty circuit breaker opened around 9pm, Amazon says, and the failed generator was restarted around 10:20pm. Most affected VM instances recovered by 10:50pm, and most cloud-storage volumes were “returned to customers” by about 1am.

Amazon said it had completed an audit of all of its back-up power distribution circuits and found another breaker that “needed corrective action.”

“We've now validated that all breakers worldwide are properly configured, and are incorporating these configuration checks into our regular testing and audit processes,” Amazon said.

Related images

  • Werner Vogels, CTO, Amazon.

Have your say

Please view our terms and conditions before submitting your comment.

required
required
required
required
required
  • Print
  • Share
  • Comment
  • Save

Webinars

  • Next Generation Data Centers – Are you ready for scale?

    Wed, 24 Aug 2016 16:00:00

    This presentation will provide a general overview of the data center trends and the ecosystem that comprises of “hyperscale DC”, “MTDC”, and “enterprise DC”.

  • White Space 47: There's a Pokéstop outside our office

    Fri, 22 Jul 2016 10:35:00

    This week on White Space, we talk everyhting: > Pokémon > Microsoft's Azure Stack launch > DatacenterDynamics Awards 2016 program > Digital Realty's move into Wind Power

  • White Space 46: We'll always have Paris

    Fri, 15 Jul 2016 10:35:00

    This week on White Space, we look at the safest data center locations in the world, as rated by real estate management firm Cushman & Wakefield. It will come as no surprise that Iceland comes out on top, while the US and the UK have barely made the top 10. French data center specialist Data4 is promoting Paris as a global technology hub, where it is planning to invest at least €100 million. Another French data center owned by Webaxys is repurposing old Nissan Leaf car batteries in partnership with Eaton. Brexit update: We’ve also heard industry body TechUK outline an optimistic vision of Britain outside the EU – as long as the country remains within the single market and subscribes to the principles of the General Data Protection Regulation.

  • Powering Big Data with Big Solar

    Tue, 12 Jul 2016 18:00:00

    The data center industry is experiencing explosive growth. The expansion of online users and increased transactions will result in the online population to reach 50% of the world’s projected population, moving from 2.3 billion in 2012 to an expected 3.6 billion people by 2017. This growth is requiring data centers to address the carbon impact of their business and to integrate more renewable resources into their projects. Join First Solar to learn: -Why major C&I companies are looking to utility-scale solar as a viable addition to their energy sourcing portfolios. -How cost-effective utility-scale solar options can support datacenters in securing renewable supply. -Case study of how a major data center player implemented solar into their portfolio

  • DC Professional - Meet John Laban

    Tue, 12 Jul 2016 15:25:00

    John has worked in the Telecommunications and Information Transport Systems (ITS) industry for over 35 years, beginning his career at the London Stock Exchange as a BT telecommunication technician. Believing there was a general lack of quality in the ITS industry, John was driven to "professionalize" the ITS industry – starting with a professional diploma programme for the Telecommunications Managers Association – which led to him becoming the first BICSI RCDD in the UK and soon after, a BICSI Master Instructor teaching RCDD and Technician programmes. Find out more about John and upcoming sessions here https://www.dc-professional.com/people/284/

More link