The power outage that brought down a portion of an Amazon data center in Dublin, taking with it the virtual infrastructure of a number of the company’s cloud customers, is no longer believed to have been caused by a lightning strike.

In a detailed online account of the outage, the Amazon Web Services team wrote that the utility whose transformer failed on the morning of 7 Aug. had retracted its initial diagnosis that a lightning strike had caused the outage, but had not offered an alternative explanation.

“The utility provider now believes it was not a lightning strike, and is continuing to investigate root cause,” the AWS team wrote.

Regardless of the cause of the utility outage, the company’s Dublin facility failed to switch to back-up generators as it is supposed to when utility power is lost. The AWS team said it believed the facility’s programmable logic controllers (PLCs) were at fault.

PLCs synchronize the electrical phase between generators before their power is fed into the facility. In last week’s case, however, the AWS team currently believes a PLC at the data center detected a ground fault and failed to complete that task.

Because of the PLC’s failure, backup generators for most of the data center were disabled and there was not enough power to continue running all the servers.
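
Amazon’s account describes this only in prose; as a rough illustration, the logic involved amounts to a go/no-go check of the kind sketched below. Every name and threshold in the sketch is an assumption made for illustration, not a detail from Amazon’s report or from any real PLC firmware.

    # Illustrative sketch of the go/no-go check a generator PLC performs
    # before closing the breaker that ties backup generators to the
    # facility bus. All names and thresholds are illustrative assumptions.

    MAX_PHASE_ANGLE_DEG = 10.0  # assumed tolerance for phase misalignment
    MAX_FREQ_DELTA_HZ = 0.2     # assumed tolerance for frequency mismatch

    def safe_to_close_breaker(gen_phase_deg, bus_phase_deg,
                              gen_freq_hz, bus_freq_hz,
                              ground_fault_detected):
        """Return True only if the generator may be connected to the bus."""
        if ground_fault_detected:
            # The failure mode AWS describes: a detected ground fault stops
            # the PLC from completing synchronization, so the generators
            # never pick up the load.
            return False
        phase_delta = abs(gen_phase_deg - bus_phase_deg) % 360
        phase_delta = min(phase_delta, 360 - phase_delta)
        freq_delta = abs(gen_freq_hz - bus_freq_hz)
        return (phase_delta <= MAX_PHASE_ANGLE_DEG
                and freq_delta <= MAX_FREQ_DELTA_HZ)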

The outage affected Amazon’s Infrastructure-as-a-Service offerings Elastic Compute Cloud (EC2, cloud servers) and Elastic Block Store (EBS, cloud storage), as well as its cloud database service, Relational Database Service (RDS). Instances of these three services hosted in Dublin felt most of the impact.

Amazon said nearly all EC2 instances and about 60% of EBS volumes in the affected availability zone went down. Networking gear connecting the zone to the Internet and to other availability zones in the region went down as well, causing connectivity issues that resulted in customers receiving API errors.
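
From a customer’s side, that kind of partial outage surfaces as failed API calls. A generic retry loop with exponential backoff, sketched below, is the usual way client code rides out transient errors of this sort; the exception class and function names here are illustrative placeholders, not part of any AWS SDK.

    import random
    import time

    # Hypothetical placeholder for an error returned while the zone's API
    # endpoints were unreachable; not a real AWS SDK exception.
    class TransientAPIError(Exception):
        pass

    def call_with_backoff(api_call, max_attempts=5, base_delay=1.0):
        """Retry a failing API call with exponential backoff and jitter."""
        for attempt in range(max_attempts):
            try:
                return api_call()
            except TransientAPIError:
                if attempt == max_attempts - 1:
                    raise
                # Wait 1s, 2s, 4s, ... plus jitter before trying again.
                time.sleep(base_delay * (2 ** attempt)
                           + random.uniform(0, 0.5))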

A 110kV 10MW utility transformer serving the data center failed around 10:40 a.m. About one hour later, Amazon technicians were able to bring some of the back-up generators online by phase-syncing them manually.

This restored power to many EC2 instances and EBS volumes, but most of the networking gear was still down, so the restored instances were inaccessible. The technicians finally restored connectivity to the zone around 1:50 p.m.

Amazon’s full account describes in detail the impact of the outage on EBS and RDS and the recovery process for these services.

The AWS team wrote that it would add “redundancy and more isolation” for the data center’s PLCs to insulate them from other failures. The company is working with its vendors to deploy a “cold, environmentally isolated back-up PLC” while also correcting the isolation of the primary PLC.

To compensate affected customers, Amazon said it would provide a 10-day credit, equal to 100% of their usage of AWS resources, to all customers with an EBS volume or an RDS database in the affected availability zone.
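
As a back-of-the-envelope illustration of what such a credit means for one customer, the short calculation below prorates 10 days of usage; all rates and usage figures are made up for the example and are not Amazon’s pricing.

    HOURS_PER_DAY = 24
    CREDIT_DAYS = 10
    DAYS_PER_MONTH = 30  # assumed, for pro-rating a monthly charge

    # Hypothetical usage for one customer in the affected zone.
    ebs_gb = 500                   # assumed EBS volume size in GB
    ebs_rate_per_gb_month = 0.11   # assumed $/GB-month
    rds_rate_per_hour = 0.44       # assumed $/hour for one RDS instance

    # Credit 100% of 10 days of usage: pro-rate the monthly EBS charge
    # and count 10 days of RDS instance-hours.
    ebs_credit = ebs_gb * ebs_rate_per_gb_month * (CREDIT_DAYS / DAYS_PER_MONTH)
    rds_credit = rds_rate_per_hour * HOURS_PER_DAY * CREDIT_DAYS

    print(f"Estimated credit: ${ebs_credit + rds_credit:.2f}")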