Three years ago, we heard a lot about “lights-out” facilities. Data centers were going to become fully autonomous and run without intervention from people.
There were plenty of reasons this was going to happen. Staff are expensive, and equipment is reliable. One leader in promising this was EdgeConneX, a colo provider specializing in smaller facilities outside the major data center hubs. “When you try to operate this small footprint, like a two-megawatt facility, it’s difficult to man on-site. It’s just not economically feasible,” said EdgeConneX vice president Jeff King at a data center event.
At this point, we had had a decade of huge claims for management and monitoring systems like DCIM, which could handle the mechanical and electrical (M&E) equipment. Meanwhile, every IT systems vendor was promising fully virtualized private cloud systems made up of pools of storage and servers which could be built and changed in software, on demand. HPE’s “composable” data center is one example.
Surely, with all this, it should be possible to operate a data center remotely. Failed hardware could be sidelined, and replaced en masse during an occasional site visit by an engineer.
Going lights-out promised other benefits besides staff savings. Automatic processes could be more reliable, we were told: experts in failure, like the Ponemon Institute, regularly list human error as the top cause of outages.
Meanwhile, conditions in an unstaffed data center could be optimized for the efficient running of the IT equipment, not the comfort of the staff. Modern equipment operates at a greater range of temperatures. The so-called cold aisle can now be as warm as 80°F, and the hot aisle can be more than 100°F. Stop cooling data centers for comfort, and a lot less energy is wasted.
And in these temperatures, if you get the physical IT work done by robots, then you can eliminate unused space too. Robots can be designed to operate in narrower spaces, leaving more room for the IT equipment.
Back then, companies like web hoster PayPerHost were promising robotic data centers, while LitBit was going to use AI to automate the maintenance procedures. And EdgeConneX had edgeOS, a platform for remote management.
A peak moment for the lights-out data center movement came in 2016, when Microsoft announced that it had operated a rack of Azure Cloud servers on the sea bed for three months. Project Natick was sealed in a pressurized vessel, and operated in a lights-out manner, because the only way to physically access it was to haul it up to the surface.
Natick added another item to the list of benefits delivered by post-human facilities: it was filled with a nitrogen atmosphere. It was unbreathable, but it guaranteed there would be no fires.
After all that excitement, the world of lights-out facilities has gone rather quiet. Which is a shame, because we now finally have a use case which absolutely needs autonomous IT resources: The Edge.
The over-familiar pitch for the Edge is that, because new applications like the Internet of Things require low latency, data center resources must be available close to the sources of data, at the “edge” of the network.
Managing a multitude of tiny Edge resources manually would add so much overhead that those applications would become financially unviable, so now is the time for the lights-out facility to finally arrive.
The trouble is, things have gone quiet. DCIM products continue to potter along, making the same promises and not perceptibly changing. The robot arms are folded away. PayPerHost is sticking to its knitting: as far as we can tell, it has never said anything more about robots.
LitBit seems to have disappeared without a trace: its site has gone, and its employees have all but vanished from LinkedIn. Founder Scott Noteboom has reappeared as CTO at liquid cooling vendor Submer, where he prefers not to mention his previous company.
EdgeConneX’s King says that small users don’t mind the idea of autonomous IT, but large customers don’t trust it: “Fortune 500 companies say ‘okay that’s great, but we’re going to add staff.’”
And, perhaps ironically, those considering lights-out operation can be dissuaded by industry efforts to increase reliability. If a business wants Uptime Institute’s Tier III or IV levels of reliability, Uptime recommends a minimum of one to two qualified operators on site 24 hours per day, 7 days per week, 365 days per year. And it goes without saying, those staff need oxygen and room to move.
Project Natick is still going, and has expanded from one rack to twelve racks, in a sunken box the size of a shipping container, under the sea near Orkney. But that was a year ago. Microsoft promised to reveal results about now, but all it’s saying is that the test continues, with Natick in use by development teams.
“The test is ongoing,” Ben Cutler of Microsoft Research said to DCD. “We’re extremely pleased with the reliability and other operational metrics we’ve seen.”
That’s fine, but anyone thinking about lights-out operation will need a little more than that.