For some years, the security industry has warned that infrastructure equipment is vulnerable to cyber attack, so potentially, power grids and other basic utilities could be sabotaged remotely. At the end of 2015, it seems this may have come true: Ukraine suffered a serious power outage, which is believed to have been caused by a malicious hacker.
If this is really happening now, what are the implications of hacker-driven power outages for data center operators?
What’s changed?
Interconnectedness is what has changed. The study Securing the US Electrical Grid by the Center for the Study of the Presidency and Congress (CSPC) published in 2014 raised several concerns, all resulting from modernizing power grid Supervisory Control and Data Acquisition (SCADA) systems:
“Paradoxically, as the grid is increasingly networked—thus increasing efficiency and overall situational awareness—it becomes increasingly vulnerable to intrusions from cyberspace.”
Seen it coming
The CSPC report’s concern about increased vulnerability was justified when facts started to surface regarding the successful cyber attack against the Ukrainian energy provider Prikarpatjeoblenergo.
One of the teams trying to uncover the “who, what, where, and why” of the cyber attack is SANS Institute’s Industrial Control Systems (ICS) group. The ICS team published its initial findings in Potential Sample of Malware from the Ukrainian Cyber Attack Uncovered and Confirmation of a Coordinated Attack on the Ukrainian Power Grid.
Regarding the attack, the second paper’s author and SANS ICS Director Michael J. Assante says: “The attackers demonstrated planning, coordination, and the ability to use malware and possible direct remote access to blind system dispatchers, cause undesirable state changes to the distribution electricity infrastructure, and attempt to delay the restoration by wiping SCADA servers after they caused the outage.”
Fortunately for the Ukrainian citizens caught in the middle, it was possible for engineers at the targeted Ukrainian power utility to shut down the malware-infected SCADA system and use manual mode. That allowed the engineers to restore power to the entire grid in less than six hours. Assante warned that utilities more reliant on automation might not be able to restore large portions of their system in a similar fashion.
What about data centers?
For data centers, a grid outage is a grid outage, whatever the root cause. An outage caused by a malicious attack may go on longer, or be timed to be more damaging, but the protection is the same: have a measure of power redundancy.
There are multiple ways to handle power redundancy. Most designs, in particular smaller commercial facilities, use switching controllers that choose between power from the grid and power from backup generator/s. The controller then feeds the power to the UPS system, which in turn supplies electricity to the critical computing infrastructure.
If grid power is lost, the switching controller cannot deliver power to the UPS. The UPS will still supply electricity to the critical equipment from its batteries, but only for a limited time. At this point, the switching controller has two options:
- Wait and see (for a specified time) if the power outage is momentary
- Start the backup generator/s so the UPS system’s batteries are not completely drained
When the power comes back, the process is reversed.
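The switching decision described above can be sketched as a small function. This is a minimal illustration, not how any particular transfer switch works: the names `Source` and `select_source` and the 10-second default wait are assumptions made for the example.

```python
from enum import Enum


class Source(Enum):
    GRID = "grid"
    UPS_BATTERY = "ups_battery"  # riding through on batteries while waiting
    GENERATOR = "generator"


def select_source(grid_available: bool,
                  outage_seconds: float,
                  wait_seconds: float = 10.0) -> Source:
    """Pick the power source feeding the critical load.

    wait_seconds is a hypothetical "wait and see" window; real
    controllers are configured per site.
    """
    if grid_available:
        # Grid is present (or has come back): the process is reversed.
        return Source.GRID
    if outage_seconds < wait_seconds:
        # The outage may be momentary, so let the UPS batteries carry the load.
        return Source.UPS_BATTERY
    # The outage has outlasted the wait window: start the backup
    # generator/s before the batteries are drained.
    return Source.GENERATOR
```

For instance, `select_source(False, 3)` stays on UPS batteries, while `select_source(False, 30)` calls for the generator.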
Keep stocks of fuel
Most data center operators agree a six-hour power outage, similar to the one that occurred in Ukraine, is almost a non-event, adding that they have enough fuel on hand to cover that length of time even if the backup generator/s are under full load. Additionally, data center operators have fuel supplier contracts in place stating explicitly that the data center will be provided fuel at the rate needed to run indefinitely.
James Hamilton, someone who knows his way around a data center, reaffirms the importance of fuel contracts. Data center operators should insist on receiving the highest priority possible. Hamilton also underscores how important it is to have written assurances from fuel suppliers that they have adequate quantities, sources for more fuel, and means to pump and deliver fuel during a power outage.
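Whether on-site fuel covers an outage of a given length is simple arithmetic, sketched below. The tank size and full-load burn rate are made-up figures for illustration, and the function names are hypothetical; actual consumption depends on the generator and its load.

```python
def runtime_hours(fuel_litres: float, burn_litres_per_hour: float) -> float:
    """Hours of generator runtime from the fuel on hand at a given burn rate."""
    if burn_litres_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return fuel_litres / burn_litres_per_hour


def covers_outage(fuel_litres: float,
                  burn_litres_per_hour: float,
                  outage_hours: float) -> bool:
    """True if the stored fuel lasts at least as long as the outage."""
    return runtime_hours(fuel_litres, burn_litres_per_hour) >= outage_hours


# Assumed figures: a 20,000-litre tank and 400 litres/hour at full load.
hours = runtime_hours(20_000, 400)             # 50.0 hours
six_hour_ok = covers_outage(20_000, 400, 6)    # True
```

Beyond the runtime this yields, staying up depends on the delivery contracts described above.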
Other challenges
In an ever-more-competitive market, data center operators are employing a technique called power oversubscription, the allocation of more power to computing equipment than the total amount of power available to the facility. That allows more servers to be hosted without the need to upgrade the power infrastructure. Oversubscription, for the most part, works as not all servers reach maximum power consumption at the same time.
However, it is entirely possible that the power consumption of the computing equipment will, at some point, exceed available capacity. And since the intent is not to upgrade the power infrastructure, undersized UPS and backup generator systems are a potential financial train wreck waiting to happen if excessive power consumption occurs during a grid power outage (accidental or hacker induced - through a so-called “power attack”).
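A rough sketch of the oversubscription arithmetic follows. All figures are invented for illustration (500 servers allocated 400 W each against 160 kW of capacity), and the function names are hypothetical.

```python
from typing import Iterable


def oversubscription_ratio(allocated_watts: int, capacity_watts: int) -> float:
    """Ratio of power allocated to equipment versus power actually available."""
    return allocated_watts / capacity_watts


def shortfall_watts(server_draws_watts: Iterable[int], capacity_watts: int) -> int:
    """Watts by which the combined draw exceeds capacity (0 if it fits)."""
    return max(0, sum(server_draws_watts) - capacity_watts)


CAPACITY_W = 160_000           # assumed UPS/generator capacity
ALLOCATED_W = 500 * 400        # 500 servers, 400 W allocated to each

ratio = oversubscription_ratio(ALLOCATED_W, CAPACITY_W)   # 1.25

# Typical day: every server draws about 250 W, well within capacity.
normal = shortfall_watts([250] * 500, CAPACITY_W)          # 0

# Assumed worst case: 300 servers driven to 400 W, the rest at 250 W.
stressed = shortfall_watts([400] * 300 + [250] * 200, CAPACITY_W)  # 10000
```

In this made-up case the facility is fine on a typical day but 10 kW short once enough servers peak together, which is the condition the paragraph above warns about.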
Maintenance and errors
Hamilton brings up another point. For any one of a myriad of reasons, redundant systems are seldom exercised in production mode. That introduces the possibility of those systems being incapable of handling the load when they are most needed.
The obvious answer is regular maintenance, according to Paul Kirvan, an independent IT consultant and writer, who says in a column: “This (maintenance) means scheduling tests of primary and backup power systems, regular inspections, and following manufacturer recommendations for maintenance and support. Another key aspect of maintenance is the need for benchmarking. During maintenance, various tests are performed. The results of such tests are the most meaningful when they are tracked over time, rather than simply counted as pass/fail.”
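To illustrate why tracking beats pass/fail, the sketch below fits a straight line to a series of test readings. The readings (generator start times in seconds) and the 15-second limit are invented for the example, and the function name is hypothetical.

```python
from typing import Sequence


def slope_per_test(values: Sequence[float]) -> float:
    """Least-squares slope of the readings against test number (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


LIMIT_SECONDS = 15                 # assumed pass/fail threshold
start_times = [8, 9, 10, 11]       # assumed readings from four successive tests

all_pass = all(t <= LIMIT_SECONDS for t in start_times)   # True
trend = slope_per_test(start_times)                        # 1.0 second per test
```

Every test here passes, yet the start time is creeping up by a second per test, a drift that a simple pass/fail count would not show.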
Operational errors are another concern of Hamilton’s. Data center workers forced to respond to extraordinary circumstances are under increased pressure and face unfamiliar conditions, both of which increase the likelihood of errors. This is not always considered, but it is a valid concern.
Bottom line
There has been lots of press lately regarding how data centers are increasingly becoming targets of choice as more and more businesses move their digital operations to the cloud. Data center operators understand this and are securing their facilities against all known attack vectors.
However, the bad guys do not follow the rules. What if they mounted a so-called ‘power attack’ on a data center, exploiting the operator’s oversubscription by running processes that draw maximum power on legitimately rented servers, after first digitally knocking the substation supplying power to the data center offline? It is unexpected, but feasible.