The idea of a 'lights-out' data center has been in circulation for some years, but why would you want one? And will they ever exist?
The underlying idea is simple. A lights-out data center is a fully automated facility that can operate without any staff. The lights can literally be turned out, and the operator saves on energy and administration costs.
The idea comes from manufacturing industries, and dates back to a 1955 short story by Philip K. Dick. In the Netherlands, a Philips factory makes razors supervised by a handful of quality control staff, while at the Fanuc factory in Japan, the air-conditioning and heating are turned off for 30 days at a time, to allow robots to make robots undisturbed.
When data centers were first conceived, computer systems needed regular maintenance and care. On-site staff frequently had to step into the chilled white space to manually reset servers and rewire switches.
But that is changing. IT kit has been getting more reliable, and software-defined networking (SDN) means racks can be cabled up once, with connections made and remade by software. Virtualization makes workloads independent of physical servers, and automation means resets and adjustments can be done remotely.
For ten years or more, admins have been managing servers - hundreds or even thousands of them - from their desks. No one needs to visit the aisles of a data center until hardware changes are needed. The mechanical and electrical parts of the data center are also automated, so chillers run unattended and can prompt engineers or call the manufacturer for occasional preventive maintenance.
It has been regularly observed that data centers “waste” space and energy by maintaining conditions suitable for humans to work in, and by providing for their other needs: safety, bathrooms, and secure entry and exit.
The idea of “lights out” got its first big outing in 2011 when AOL, an Internet provider long past its prime, made a big show of announcing a move to a radical model, using small unattended micro-facilities it called the ATC. AOL’s VP of technology Mike Manos, a data center expert who had previously worked at Microsoft, praised lights-out systems in a blog, crediting them with “fundamentally changing business as usual.”
Ten years on, AOL is long gone, and data centers still have an insatiable demand for staff.
It’s true that some ATC ideas have lived on. Data centers often practice “rack and roll,” shipping racks to the site with servers pre-installed.
Designers have pointed out that racks and servers could be placed closer together, and cooling bills could be slashed by running them at a hotter temperature, if not for the need to make the building habitable. Removing oxygen from the air could prevent fires completely, and reduce corrosion.
But by and large, data centers remain large facilities, with staff on-site all the time.
The Uptime Institute, the go-to expert for data center reliability, has always recommended that data centers have staff on-site ready to deal with any problems. “For business objectives that are critical enough to require Tier III or IV facilities, Uptime Institute recommends a minimum of one to two qualified operators on site 24 hours per day, 7 days per week, 365 days per year (24 x 7),” said Richard F. Van Loo in a 2015 Uptime briefing, Proper Data Center Staffing is Key to Reliable Operations.
There’s been some change since that came out, particularly among providers serving smaller cities, such as EdgeConneX.
“Our whole business premise was based on lights out data centers,” EdgeConneX CIO Lance Devin told DCD. “We have 2MW sites, not 100MW behemoths. I can’t afford to put three engineers and 17 security people and two maintenance people in a site like that.”
EdgeConneX has wholesale customers, and runs a segmented management system, which gives customers control of the IT hardware, while EdgeConneX manages the power and cooling infrastructure.
It’s not entirely lights-out, but EdgeConneX has remote-controlled security, so customer staff can be buzzed in through a mantrap without meeting any of the operator’s personnel.
Lockdown and lights-out
Operators with larger facilities haven’t felt the need to do anything like that. But they all have the ability to manage some things remotely - and those powers got tested in 2020, because of the Covid-19 pandemic.
When people were told to stay at home, data center operators saw a big surge in use of remote control services. According to Brent Bensten, CTO at QTS Data Centers, logins to the company’s remote management portal (the service delivery platform, or SDP) jumped by 30 percent in the first three weeks of restrictions, with users spending twice as much time on the system.
Visits were still allowed, but people stayed away - and found that sites could still operate with much less intervention. Many discovered the value of remote management. As Bensten put it: “Covid-19 is a perfect case to use the tools, so they can do remotely what used to be done on-site.”
Lights-out or skills-out?
In many cases, “lights-out” is a thinly-disguised way to de-skill data centers, either as a cost-cutting measure, or as a way to deal with the real difficulty in finding skilled staff.
Schneider Electric’s Steven Carlini promises to explain “why every data center in future will be lights-out” in a blog post, which in fact argues that companies should make in-house data centers “as ‘lights out’ as possible” - partly in response to the pandemic, and partly to deal with the shortage of skills.
“Lights out and unmanned may not be entirely accurate,” says Carlini, “as security staff will most likely be on-site.” He suggests data centers should hire security guards with basic mechanical skills, and have them do plug-and-play hardware replacements: “Companies are already experimenting with Zoom-guided maintenance and repairs.”
In a lot of cases, the idea of the “lights-out” data center has morphed into one where skills aren’t needed.
So have truly lights-out data centers ever really existed? There may be facilities operating in this manner, but generally they haven’t spoken to DCD about it. That may be for reasons of secrecy or because, like AOL’s ATC, they failed.
But we do know of one major exception.
Microsoft operated a small (240kW) data center very publicly for two years, with no site visits at all - because that facility was located on the sea bed.
In 2018, a Microsoft research team called Project Natick filled twelve data center racks with servers, loaded them in a pressure vessel, and sank it in the ocean off the coast of Scotland. For two years, the servers were untouched, and the project’s only communication with them was via power and network cables.
When Microsoft retrieved SSDC-002 (subsea data center 2) in 2020, the project had run workloads from the Azure cloud on Natick’s 864 servers and 27.6 petabytes of storage, unattended in a sealed cylinder filled with unreactive nitrogen gas.
“We operated this thing for 25 months and eight days, with nobody touching it,” Natick leader Ben Cutler told DCD. And the results were favorable.
Reliability and Moore’s Law
The underwater servers seem to have been about seven times more reliable than equivalent ones on land. Natick used a batch of second-hand machines, placing 135 in a land-based data center, and the rest in the sub-sea container.
“From the 135 land servers, we lost eight,” says Cutler. “In the water, we lost six out of 855.” The servers all ran the same tasks and none had any maintenance, but it seems that the vibration and oxygen atmosphere of a standard data center took a toll.
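The reliability claim can be checked with a little arithmetic. A quick sketch, using only the server and failure counts Cutler gives:

```python
# Failure rates implied by the Natick counts quoted above.
land_failures, land_servers = 8, 135
sea_failures, sea_servers = 6, 855

land_rate = land_failures / land_servers  # roughly 5.9% over the experiment
sea_rate = sea_failures / sea_servers     # roughly 0.7%

print(f"land: {land_rate:.1%}, sea: {sea_rate:.1%}, "
      f"ratio: {land_rate / sea_rate:.1f}x")
```

The land servers failed at roughly eight times the rate of the submerged ones, which is in line with the sevenfold-plus reliability advantage described above.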
One big objection to lights-out operation is the fact that servers and storage need to be replaced periodically, not because they wear out, but because they become obsolete. For decades, IT hardware followed Moore’s Law: with performance per watt doubling every 18 months or so, new servers would pay for themselves every three years through energy savings alone.
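The economics behind that three-year cycle can be sketched with a back-of-the-envelope consolidation estimate. All the figures below - server power draw, electricity price, consolidation ratio - are illustrative assumptions, not data from this article:

```python
# Back-of-the-envelope estimate of the Moore's Law-era refresh payback.
# Doubling performance per watt every 18 months gives roughly 4x after 3 years,
# so one new server can do the work of four old ones at the same power draw.

HOURS_PER_YEAR = 24 * 365  # 8,760

def consolidation_savings(old_servers, watts_per_server,
                          perf_per_watt_gain, price_per_kwh, years):
    """Energy cost saved by replacing old servers with enough new
    machines to do the same work (same per-server power assumed)."""
    new_servers = old_servers / perf_per_watt_gain
    watts_saved = (old_servers - new_servers) * watts_per_server
    kwh_saved = watts_saved * HOURS_PER_YEAR * years / 1000
    return kwh_saved * price_per_kwh

saving = consolidation_savings(old_servers=4, watts_per_server=500,
                               perf_per_watt_gain=4,
                               price_per_kwh=0.10, years=3)
print(f"${saving:,.0f} saved over 3 years")  # -> $3,942 saved over 3 years
```

Under these assumed numbers, retiring four 500W servers in favor of one saves nearly $4,000 in electricity over three years - roughly the price of a commodity server, which is the sense in which a refresh “pays for itself.” If performance per watt stops quadrupling every refresh cycle, that payback disappears.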
Now silicon processes are hitting limits, Moore’s Law is coming to an end, and servers will have longer lifetimes. “There is still a very strong case for savings in energy,” says Rabih Bashroush, research director at the Uptime Institute, “when replacing servers that are up to nine years old.”
Cutler predicts this will make operators move towards lights-out: “A huge percentage of the cost of a data center over its lifetime is the servers. In a post-Moore’s Law world, there’s really no reason to change the infrastructure every two years.”
Lights out on the Edge
While conventional data centers remain resolutely staffed, a new development may actually require lights-out operation: the much-hyped area of Edge computing.
New developments like the Internet of Things, along with people streaming media and applications to their homes, are leading to a requirement for low-latency resources that are highly distributed.
This means a large number of small facilities, placed close to the people and data sources. Most will be much smaller than Natick’s SSDC-002, and some will be weatherproof boxes on lamp-posts.
Servicing Edge capacity will be an economic nightmare unless site visits can be all but eliminated, much as the telephone network has done for fiber cabinets.
“They will tend to be lights out, like what we did,” says Cutler. “When you think about the Edge you’re gonna end up with things that operate on their own. People don’t go there for a long time because it’s too hard to get there.”
That takes us right back to the birth of lights-out. When Mike Manos launched the idea at AOL, he was actually talking about Edge facilities, designed to get AOL’s user-driven content out close to customers. In a somewhat ironic dig at the centralized approach of an upstart called Facebook, Manos said AOL was moving to become a big content player: “You do need coverage where the readers are.”
Lights-out will require serious technology, but it won’t be glamorous. A set of servers in a box on a wall simply cannot demand love and attention. Lights-out will come in because we will have kit that we simply need to neglect.