US-based researchers have described a type of cyber-attack that can bring down physical infrastructure without exploiting undisclosed vulnerabilities or requiring sophisticated software tools.
A team from the College of William and Mary in Virginia, together with colleagues from Ohio State University, has published a paper explaining how a malicious customer could wreak havoc in a data center simply by maximizing the power consumption of the racks they have paid for.
Since cloud-scale data centers frequently oversubscribe their power capacity, very high consumption could trip the circuit breakers and take the whole facility offline.
The researchers call this method the ‘power attack’ and have successfully carried it out against virtual models of real-world data centers, including one of Google’s facilities in North Carolina.
Several experts have confirmed to DatacenterDynamics that such an attack is indeed possible, but only in cases where there’s no policy-based power capping in place.
Of airplanes and toasters
More often than not, data centers are built in stages, gradually increasing their compute, cooling and power capacities in line with demand. It is also typical to oversubscribe some or all of these resources, much as airlines often sell more tickets than there are seats on the plane, safe in the knowledge that one or two passengers will fail to show up.
If those extra passengers do show up, the airline has to offer concessions. In a data center, the consequences are much harder to contain.
In a nutshell, a ‘power attack’ uses legitimate software to intentionally push power systems beyond their limits. If the attacker manages to buy up cloud resources hosted in one location and sends all of their racks into overdrive at the same time, power demand could reach levels that trip the circuit breaker. Even if it doesn’t, the extra heat could degrade the performance and shorten the lifespan of the servers.
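The arithmetic behind the attack can be sketched in a few lines of Python. All figures below (breaker rating, server count, per-server draw) are hypothetical and chosen only to make the numbers easy to follow; they are not taken from the paper. The sketch also assumes a single power domain protected by one breaker and treats any draw above the rating as a trip, ignoring the time delays real breakers tolerate.

```python
# Hypothetical figures for illustration only; they are not taken from the paper.
BREAKER_LIMIT_W = 100_000   # rated capacity of one power domain, in watts
SERVERS = 400               # servers fed by that domain
PEAK_W = 300                # per-server draw when fully loaded
TYPICAL_W = 200             # per-server draw under normal, staggered load


def oversubscription_ratio() -> float:
    """Sum of every server's peak draw divided by what the breaker can carry."""
    return SERVERS * PEAK_W / BREAKER_LIMIT_W


def total_draw(n_peak: int) -> int:
    """Total draw when n_peak servers run flat out and the rest stay at typical load."""
    return n_peak * PEAK_W + (SERVERS - n_peak) * TYPICAL_W


def min_servers_to_trip() -> int:
    """Fewest servers an attacker must drive to peak before the draw exceeds the limit.

    Returns -1 if even every server at peak stays within the limit.
    """
    for n in range(SERVERS + 1):
        if total_draw(n) > BREAKER_LIMIT_W:
            return n
    return -1


print(f"oversubscription ratio: {oversubscription_ratio():.2f}")
print(f"normal draw: {total_draw(0):,} W of {BREAKER_LIMIT_W:,} W")
print(f"servers at peak needed to exceed the limit: {min_servers_to_trip()}")
```

With these made-up numbers the domain is oversubscribed by 20 percent, yet normal operation draws only 80 kW. An attacker who controls 201 of the 400 servers, just over half, is enough to push the total past the breaker's rating.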
In their tests, the researchers tried several approaches to a power attack, from creating an endless virtual machine migration loop, to replacing integers with floating-point numbers in complex calculations, to running intensive benchmarking tools such as High Performance Linpack.
Here’s another analogy: imagine you work in a shared office. You have plugged in your PC, a laptop and a printer, none of which draws much power in the context of the whole building. Meanwhile your office neighbor brings in a dozen toasters and switches them all on at the same time, killing the power and ruining everyone’s day.
According to the research paper, Google, HP and IBM are among the large public cloud vendors that have adopted power oversubscription practices, potentially making them vulnerable to a power attack.
“Yes, in theory, many data centers, not just cloud but colo and enterprise data centers could be caused to overload a part of their power distribution system with aggressive workload deployment,” Liam Newcombe, CTO at Romonet, told DatacenterDynamics.
“There is nothing wrong with oversubscription, in any large enough environment you get a statistical levelling of load as not all customers or not all platforms will peak load at the same time, this is one of the basic advantages of a service provider. The only questions are: what level of over-subscription are you prepared to allow and how do you manage it?”
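Newcombe’s point about statistical levelling can be illustrated with a toy Monte Carlo simulation. It assumes, purely for illustration, that each server’s draw varies independently and uniformly between an idle and a peak value; real workloads are neither uniform nor fully independent, and the figures are hypothetical. Under that assumption, the worst total observed over many snapshots typically stays well below the sum of every server’s individual peak, which is the headroom oversubscription relies on.

```python
import random


def worst_total_draw(n_servers: int, idle_w: float, peak_w: float,
                     samples: int, seed: int = 0) -> float:
    """Highest combined draw seen across `samples` snapshots of server loads.

    Hypothetical model: each server's draw is uniform between idle_w and peak_w
    and independent of every other server.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        total = sum(rng.uniform(idle_w, peak_w) for _ in range(n_servers))
        worst = max(worst, total)
    return worst


N, IDLE_W, PEAK_W = 200, 100.0, 300.0   # hypothetical figures
nameplate = N * PEAK_W                   # draw if every server peaked at the same moment
independent = worst_total_draw(N, IDLE_W, PEAK_W, samples=1_000)

print(f"sum of individual peaks: {nameplate:,.0f} W")
print(f"worst observed total with independent loads: {independent:,.0f} W")
```

A power attack removes exactly the independence this model assumes: if every server an attacker controls runs at peak at the same moment, that share of the load contributes its full peak value rather than averaging out.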
Newcombe says there are a number of simple IT deployment and management practices that can effectively neutralize this attack vector, for example using Intel Node Manager or similar software to assign rack, row and room-level power budgets.
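The idea of nested rack, row and room budgets can be sketched as a small allocation routine. This is a hypothetical illustration, not the interface of Intel Node Manager or any other product: each rack’s demand is first clamped to its own budget, and if a row or the whole room would still exceed its limit, every rack in it is scaled down proportionally. Real capping tools enforce limits on each node in firmware; this only shows the budgeting arithmetic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rack:
    name: str
    budget_w: float   # rack-level power cap
    demand_w: float   # what the rack's workloads would draw if uncapped


@dataclass
class Row:
    budget_w: float   # row-level power cap
    racks: list[Rack] = field(default_factory=list)


def allocate(rows: list[Row], room_budget_w: float) -> dict[str, float]:
    """Return a per-rack power allocation that respects rack, row and room budgets."""
    alloc: dict[str, float] = {}
    for row in rows:
        # Rack level: no rack may exceed its own budget.
        capped = {r.name: min(r.demand_w, r.budget_w) for r in row.racks}
        # Row level: scale every rack in the row down by the same factor if needed.
        row_total = sum(capped.values())
        if row_total > row.budget_w:
            scale = row.budget_w / row_total
            capped = {name: w * scale for name, w in capped.items()}
        alloc.update(capped)
    # Room level: one final proportional cut across all racks.
    room_total = sum(alloc.values())
    if room_total > room_budget_w:
        scale = room_budget_w / room_total
        alloc = {name: w * scale for name, w in alloc.items()}
    return alloc


rows = [
    Row(budget_w=20_000, racks=[Rack("a1", 12_000, 15_000),   # tenant pushing hard
                                Rack("a2", 12_000, 6_000)]),
    Row(budget_w=20_000, racks=[Rack("b1", 12_000, 12_000),
                                Rack("b2", 12_000, 12_000)]),
]
for name, watts in allocate(rows, room_budget_w=35_000).items():
    print(f"{name}: {watts:,.0f} W")
```

Here the rack demanding 15 kW is held to its 12 kW rack budget, row B is scaled back to its 20 kW limit, and a final proportional cut brings the room total down to 35 kW. Proportional scaling is the simplest policy but also trims well-behaved racks; a stricter scheme might cut the heaviest consumer first.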
“My sense is the large public cloud providers will handle an attack better simply because they are set up for balancing across multiple DCs and can quickly disconnect the requesting source,” added Ed Ansett, managing partner at i3 Solutions Group.
“The providers who are probably at most risk are in private and hybrid cloud where the infrastructure is tied to a small number of data centers.”
The paper suggests that the risk of a power attack could be completely mitigated by adopting rack-level UPS, something Microsoft recently discussed at the Open Compute Summit.