All the while the industry has been trying to make itself more energy efficient, there’s been an elephant in the room, rampaging through the data center. But now it might finally have been tamed. By a bunch of elves.
There have been massive strides in data center efficiency. Builders and operators looked at all the energy wasted in their cooling and power distribution systems, and came up with a measure called PUE (power usage effectiveness): the total power drawn by the facility divided by the power that actually reaches the IT equipment. PUE figures of 2 or 3 or more were common - and by applying measures like outside-air cooling, it was possible to cut this back close to the ideal figure of 1.0. Going from a PUE of 2.0 to one near 1.0 literally halves the energy a comparable data center consumes.
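The arithmetic behind that claim is worth seeing once. A minimal sketch (the function name is mine, not any industry API):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by the
    power delivered to the IT equipment. The ideal is 1.0, meaning no
    energy is lost to cooling or power distribution."""
    return total_facility_kw / it_equipment_kw

# A facility drawing 2 MW to deliver 1 MW to its racks scores 2.0;
# trim the overhead and the same IT load needs far less total power.
print(pue(2000, 1000))  # 2.0
print(pue(1100, 1000))  # 1.1
```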
But what happens to the energy when it gets to the racks of servers? How efficient are they? Industry efforts have generally ignored this problem. It’s been too hard to solve. We’ve had a general feeling that virtualization helps, by consolidating workloads onto fewer machines, and that’s all good. But seriously, a data center’s PUE could approach 1.0 as closely as you like, yet if the servers are idling then all the power they use is wasted. The elephant in the data center is the software and what it does.
Now, I still haven’t seen a really good answer to measuring the efficiency of software (that can get subjective) but this week I heard of a good step towards improving it. And here’s where the elves come in.
Professor Stephen Blackburn and Xi Yang of the Australian National University teamed up with Kathryn McKinley of Microsoft Research to look at the way tasks are scheduled on servers, in order to make them use their resources better.
We all know, I hope, that modern microprocessors can multitask: with simultaneous multithreading (SMT), several threads run on one core at the same time. But it seems that SMT is actually turned off in vast numbers of cases - from stock trading to Google - because there’s a chance that a slow batch thread will interfere with a higher-priority thread that has a service-level objective to meet.

Simultaneous multithreading is turned off in vast numbers of cases - for fear that batch threads might break the service-level objectives of higher-priority threads
Blackburn and his colleagues applied something they call “principled borrowing” of resources, inspired by the elves in the Grimm fairy tale “Die Wichtelmänner,” who use a cobbler’s tools to make beautiful shoes while he is asleep.
They crafted an addition to the Linux kernel, called Elfen, which sets up multiple “lanes”. The non-time-critical batch processes operate in a batch lane, and borrow resources only when the system detects that latency-critical processes don’t need them.
The system needs low-overhead monitoring, plus a new system call, nanonap, to control the resources directly. The researchers tested their ideas on a standard Intel server running the Apache Lucene enterprise search engine, and found that it is possible to run batch processes in the idle cycles without breaking service-level objectives.
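The borrowing policy itself is simple to state. Here’s a toy simulation of the idea (the function and its arguments are my own illustration, not the Elfen API): at each scheduling tick, the batch lane runs one unit of work only if the latency-critical lane is idle, and otherwise parks - standing in for what Elfen does with nanonap.

```python
def schedule_batch(lane_busy_per_tick, batch_work):
    """Toy model of 'principled borrowing': run batch work only in
    ticks where the latency-critical lane is idle (busy flag == 0).
    In Elfen the parking is done with the low-overhead nanonap call;
    here a busy tick simply skips the batch lane."""
    completed = []
    pending = iter(batch_work)
    for busy in lane_busy_per_tick:
        if busy:
            continue  # the latency-critical request owns the core: nap
        try:
            completed.append(next(pending))
        except StopIteration:
            break  # no batch work left
    return completed

# A latency-critical request occupies ticks 1 and 2; the batch work
# quietly fills the remaining idle ticks without ever displacing it.
print(schedule_batch([0, 1, 1, 0, 0], ["a", "b", "c"]))  # ['a', 'b', 'c']
```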
All this can increase the work done by a processor by between 25 and 99 percent.
Now, will this be widely implemented? That will depend on the data center operators and server makers. But it could tame that elephant.