Cloud infrastructure services provider Amazon Web Services (AWS) said it would give a 10-day credit equal to 100% of their usage of some of the resources they contract for with the company to customers whose cloud infrastructure was connected to the elements of Amazon’s infrastructure involved in the recent multi-day service outage, regardless of whether their applications were affected by the outage.
Amazon said in an incident report released on Friday that it would apply the credit to the accounts of customers with Elastic Block Store (EBS) volumes or Relational Database Service (RDS) instances running in the affected availability zone.
The company has also outlined a number of initial measures it plans to take to prevent a similar issue in the future. One action item is to change its capacity-planning processes, because the company’s engineers discovered during the incident that much more free capacity was needed to handle outages of this scale effectively.
Because the issue was caused by an error during a network upgrade, Amazon said it would also audit its change processes.
Finally, the company plans to make it easier for customers to build applications that span multiple availability zones and to use that redundancy to keep their applications available if one zone fails.
“In this event, some customers were seriously impacted, and yet others had resources that were impacted but saw nearly no impact on their applications,” the AWS team’s statement read.
Planned efforts to encourage the use of multiple zones include better tools for multi-zone deployments and a free webinar series. The webinars start on 2 May and will focus on fault-tolerant application design for cloud-based infrastructure.
The outage affected many websites. Some went down completely, while others suffered serious performance problems.
Sites affected included Reddit.com, Quora.com, GroupMe.com, Foursquare and many others.
The problems began at around 12:45 am PDT on 21 April, and by 3 pm on 24 April the AWS team had restored all services except for 0.07% of EBS volumes in the affected availability zone, which could not be restored to a consistent state.
By the time the incident report was released, functionality had been restored to all affected services, Amazon said.