Published on 28th March 2013 by Yevgeniy Sverdlik
In case you're tired of reading about the virtues of data center containers (or containerized data centers), here's another article on the subject, although this one does have a twist. It's about using containers to contain (no pun intended) the spread of an outage.
There is a way to design management software and the supporting mechanical and electrical infrastructure in a container-filled data center so an outage does not spread beyond the box it started in. How? Ask Microsoft. That's what they've done in their Chicago data center.
In the latest installment of the Microsoft Global Foundation Services team's blog series on how the company runs its data center infrastructure, David Gauthier, director of data center architecture and design, writes about the 700,000 sq ft facility, launched in 2009.
Normally, a discussion about data center containers emphasizes deploying data center capacity quickly. That's their main appeal, and that's what container vendors have been pushing. Yes, containers allow Microsoft to deploy “unprecedented quantities of servers in a short period of time.” (Containers serve the same purpose for eBay.) But that's only half of the reason Microsoft went with containers for the Chicago site. The other half is compartmentalization of failure.
“In the electrical and mechanical design of this data center, we considered each container as a discrete failure domain and modeled the availability of power and cooling with the expectation that maintenance events and unplanned outages would occur in the environment,” Gauthier writes. Failures would also be compartmentalized in a standard and predictable way.
The design is such that in case of a failure (or maintenance, for that matter), the entire workload of the container in question is picked up by another container. This is active-active within a single site.
What else? There are no redundant power feeds and no diesel generators. In case of a total utility outage (a rare occurrence at the Chicago site, according to Gauthier), the same software that moves workloads between containers moves workloads to a different data center. “In fact, one of these rare events did occur in 2011 as a result of an off-site lightning strike - workloads transferred to another facility as planned and aside from a slight increase in network latency, the applications never missed a beat,” he writes.
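For readers who think in code, here is a minimal Python sketch of the two-tier failover idea described above: a failed container hands its entire workload to another healthy container at the same site, and if the whole site has lost utility power, the workload goes to a container at a different facility. This is only an illustration under stated assumptions, not Microsoft's software. The names `Container`, `Site`, `fail_over` and the "least-loaded container wins" rule are all hypothetical; Gauthier's post does not describe how a target is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """One containerized block of servers, treated as a discrete failure domain."""
    name: str
    healthy: bool = True
    workloads: list = field(default_factory=list)


@dataclass
class Site:
    """A data center holding many containers, with no backup generators (as in the article)."""
    name: str
    utility_power: bool = True
    containers: list = field(default_factory=list)

    def healthy_containers(self):
        # Without utility power, nothing at this site can take work.
        if not self.utility_power:
            return []
        return [c for c in self.containers if c.healthy]


def fail_over(failed, local_site, remote_site):
    """Move the entire workload off `failed`.

    Prefers a healthy container at the same site and falls back to the
    remote site. Returns (site name, container name) of the new home.
    """
    failed.healthy = False

    target_site = local_site
    candidates = local_site.healthy_containers()
    if not candidates:
        target_site = remote_site
        candidates = remote_site.healthy_containers()
    if not candidates:
        raise RuntimeError("no healthy container available anywhere")

    # Assumption: pick the least-loaded candidate. The real policy is not public.
    target = min(candidates, key=lambda c: len(c.workloads))
    target.workloads.extend(failed.workloads)
    failed.workloads.clear()
    return target_site.name, target.name


# Usage: a container failure stays inside the site; a utility outage does not.
c1 = Container("c1", workloads=["mail", "search"])
c2 = Container("c2")
r1 = Container("r1")
chicago = Site("Chicago", containers=[c1, c2])
other = Site("other-site", containers=[r1])  # hypothetical second facility

print(fail_over(c1, chicago, other))   # workload lands on c2, same site
chicago.utility_power = False          # total utility outage
print(fail_over(c2, chicago, other))   # workload lands on r1, remote site
```

The point of the sketch is the ordering of the fallbacks: the container is the first boundary a failure has to cross, and the site is the second.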