Amazon Web Services suffered a disruption to its operations in the US when a “power event” affected one of the data centers in Northern Virginia that make up its US-EAST-1 region.
A single Availability Zone saw connectivity issues, impacting services like RDS, Redshift, WorkSpaces, EC2 and EBS for approximately 30 minutes.
The company warned that the outage led to hardware failures, meaning that some customers who kept their workloads in a single Availability Zone might never be able to recover their instances.
The issue was compounded by the fact that, around the same time, customers were experiencing minor problems with the US-EAST-2 region, located in Ohio.
Bad day
The problems with US-EAST-1 started at 3:13 PM PDT, with some customers unable to reach their compute instances.
“We can confirm that there has been an issue in one of the datacenters that makes up one of US-EAST-1 Availability Zones. This was a result of a power event impacting a small percentage of the physical servers in that datacenter as well as some of the networking devices,” stated the report on the AWS Service Health Dashboard.
“Customers with EC2 instances in this availability zone may see issues with connectivity to the affected instances. We are seeing recovery and continue to work toward full resolution.”
Just heads up we’re still experiencing slow response times and interruptions due to an extended AWS outage. We’ll keep you updated.
— Stitcher (@Stitcher) June 1, 2018
At 4:29 PM PDT, the company said it had restored power to the majority of affected systems. An hour later, it added that the problems in its data center had led to serious hardware failures.
“We have been working to recover the remaining instances and volumes. The small number of remaining instances and volumes are hosted on hardware which was adversely affected by the loss of power. While we will continue to work to recover all affected instances and volumes, for immediate recovery, we recommend replacing any remaining affected instances or volumes if possible.”
Around the same time, minor problems were reported with US-EAST-2, affecting services including EC2 and EFS.
“Between 12:11 AM and 12:45 AM PDT we experienced impaired Internet connectivity in the US-EAST-2 Region. The issue has been resolved and the service is operating normally,” the Dashboard reported.
This morning from approximately 12:11am - 12:45am PDT, #Amazon reported a significant Internet connectivity outage in their AWS-east-2 region. We detected complete loss of connectivity from this region to at least 10 other SaaS and API services. Stay tuned for further analysis. pic.twitter.com/mgRLSqTjC7
— ThousandEyes (@thousandeyes) May 31, 2018
Northern Virginia, where US-EAST-1 is located, is the largest data center market in the United States. It is home to hyperscale facilities by Google and Microsoft, and an upcoming $1 billion data center campus by Facebook. In terms of colocation, local operators include Iron Mountain, Sabey Data Centers, COPT, Infomart Data Centers, DBT Data, H5 Data Centers, RagingWire, and Equinix.