Archived Content

The following content is from an older version of this website, and may not display correctly.

Google has attributed issues with its cloud-based App Engine Datastore services on 18 August to a data center power outage that was caused by a thunderstorm.

On that day, the data center in the American Midwest serving these applications "lost utility power as a result of an intense thunderstorm," Google's Ikai Lan wrote in an email to customers on behalf of the App Engine team.

"Power distribution equipment in the data center failed in the wake of the loss of utility power, which powered off a subset of the machines in the data center," he wrote.

The loss of power reduced available compute capacity in the facility and took down parts of the storage infrastructure, which caused high latency, server errors and, in some cases, total downtime for App Engine master-slave Datastore applications.

Lan did not specify why the data center's electrical systems failed to switch to generators when it lost utility power or how long the facility remained without power. Google representatives did not respond to a request for comment in time for publication.

When it learned from the company's data center operations team that power to the facility would not be restored for "several hours", the App Engine team decided to perform an emergency failover for the affected applications to a backup data center.

Since Google's emergency-maintenance procedures do not allow for full replication between a primary data center and a backup data center for master-slave Datastore applications, data written to the primary data center in the period immediately preceding the outage is not transferred to the backup site. This causes the applications to appear to "jump backwards in time" when they come back up, Lan explained.
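The mechanics behind that "jump backwards in time" can be sketched with a toy model of asynchronous master-slave replication. This is a hypothetical illustration, not Google's implementation: the class names, the batch-based flushing, and the lag size are all invented for the example.

```python
# Toy model (hypothetical, NOT Google's implementation) of how asynchronous
# master-slave replication loses recent writes when the master fails.

class Slave:
    def __init__(self):
        self.data = {}


class Master:
    def __init__(self, slave, replication_lag=2):
        self.data = {}
        self.pending = []       # writes not yet shipped to the slave
        self.slave = slave
        self.lag = replication_lag  # ship writes in batches of this size

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        # Replication is asynchronous: the write is acknowledged to the
        # client immediately, but only shipped to the slave in batches.
        if len(self.pending) >= self.lag:
            self.flush()

    def flush(self):
        for key, value in self.pending:
            self.slave.data[key] = value
        self.pending.clear()


slave = Slave()
master = Master(slave)
master.write("a", 1)
master.write("b", 2)   # batch of 2 is flushed: slave now has "a" and "b"
master.write("c", 3)   # still queued on the master when the outage hits

# Power fails at the master; failover promotes the slave. The unreplicated
# write to "c" is lost, so the application "jumps backwards in time".
assert slave.data == {"a": 1, "b": 2}
assert master.data == {"a": 1, "b": 2, "c": 3}
```

The write to `"c"` was acknowledged by the master but never reached the slave, which is exactly the trade-off of asynchronous replication: lower write latency in normal operation, at the cost of losing the most recent writes on an unplanned failover.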

In cases of data center outages, the App Engine team usually has a choice between waiting until the primary data center is restored and performing the emergency failover. Lan wrote that the team does not take this decision lightly, for the reasons described above, but because the 18 August outage was expected to last so long, the team decided to perform the failover.

"During this outage, the impact of the adverse weather conditions continued for much longer than the App Engine team had anticipated, and made it impossible for the data center operations team to safely begin the repair process until the storm ended," he wrote.

"As there was no estimated time for the data center to return to service at that point, the App Engine team elected to perform an emergency maintenance to switch master-slave Datastore applications to their backup data center."