Even after three or four years of hype, there are still plenty of questions about Edge computing - the movement to put micro data centers close to end users and applications, to provide low-latency processing power.
But one of the biggest questions could be a surprising one: “What happens when you go to an Edge facility… and open the door?”
That sounds trivial, but it turns out to be crucial to data center deployment. And the best answer so far has come from ASHRAE, the industry body which literally wrote the book on building reliable data centers.
A technical problem
ASHRAE’s name stands for the American Society of Heating, Refrigerating and Air-Conditioning Engineers, and 20 years ago its members were being asked to equip a new class of special buildings designed to house IT equipment.
ASHRAE’s Technical Committee 9.9 considered the airflow, temperature and humidity required by the equipment within those new data centers.
“Most buildings are there to serve people,” says Jon Fitch, a data scientist and TC 9.9 member. “But data centers and mission critical facilities are there primarily to serve equipment. It's a very, very, very different take on what a building is there to do.”
TC 9.9 produced a series of books and recommendations which have become the Bible for data center buildings, and have been incorporated into building regulations. They’re in use around the world, because, despite the “American” in its name, ASHRAE has been international since it was formed in 1894.
Fast forward to now, and there’s a hype machine in action, telling us that centralized data centers aren’t enough. We urgently need to build a whole lot more tiny data centers outside brick and mortar buildings, in smaller buildings, cabinets and containers which are close to mobile phones, people, autonomous vehicles and the sensors used by the Internet of Things.
The Edge hype says new applications need fast responses that only these Edge facilities can deliver.
The hype also assumes that, because it’s needed, this can be delivered, and that it will use the same IT equipment, at the same cost, as a brick and mortar data center.
But there’s a problem with that. Jon Fitch is lead author in the team that explains what that is, in ASHRAE’s new Technical Bulletin Edge Computing: Considerations for Reliable Operation. It’s a short document that distills ten years of work, he tells us.
“Edge is driven by proximity to the data, not by factors like disaster avoidance. It’s all about getting computers closer to customers and data,” he explains. Traditional data centers can be sited using a risk aversion map, which shows the risk of natural disasters, so you can choose “a lovely area that is very risk-averse.”
Edge data centers don’t get that option: “These Edge data centers can go in a dirty metropolitan area where there's all kinds of pollution and vehicle exhaust. They could go in an agricultural area, they could go into a dusty area where there were seasonal winds which blow up dust storms.”
Edge data centers are also typically small and modular, in shipping containers, phone-box sized modules, or even smaller units, and this has consequences: “Many items that are non-issues for brick and mortar data centers are real issues for small edge data centers.”
Telecoms networks are already deployed at the Edge, of course, but they use hardened equipment. Specifications like NEBS, defined by AT&T in the 1970s, mean the kit is resistant to changes in temperature, humidity, airborne pollution, and dust. Telecoms engineers can work on equipment in all weathers.
“It's hardened to a 55°C temperature excursion capability, with a dust filter on the bezel,” says Fitch. “And this equipment comes with a higher price tag, it's a higher cost structure, a more expensive business model.”
Hardened equipment is too expensive for the mass deployment of IT to the Edge that is envisaged today. Effectively, we are asking engineers to deploy equipment into a hostile environment for which it was not designed.
“There are two schools of thought,” says Fitch. “One is you can harden the hardware at a higher cost point. Or you can take care of the environment. I think the technical bulletin we've written provides a pretty good blueprint for how you can control the environment and use lower cost structure, commercial off the shelf [COTS] IT equipment for edge applications. That's the type of equipment that most providers are used to using.
“The challenge we face today is how do we achieve Telecom results with economical COTS equipment. Our bulletin does tell you the steps you need to take to engineer those Edge data centers, so they are compatible with commercial off the shelf IT equipment, and achieve similar uptimes.”
He explains: “Compare a small modular edge data center to a large brick and mortar cloud data center for a moment. The cloud data center probably has at least three doors between the outside environment and the IT equipment. The distance between those doors is probably 30 meters or more, and there's no way all those doors will be open at the same time.
“If you open a rack door in a data center, what happens? A whole lot of nothing. Because you're opening a rack door from an environment that's 20°C and 50 percent relative humidity, to an environment that's 20°C and 50 percent RH.”
It’s different in an Edge data center, where the outdoor environment enters the enclosure the moment the door is opened, bypassing HVAC and filtration: “You can’t always choose the place and timing of your service. If you've got a winter blizzard and your Edge data center goes down and it needs service, you've got to go out there. And when you open that door, the cold winter air rushes in immediately - or it might be desert air, dusty air, or moist air from the morning dew.”
Surviving these effects means more than just physically engineering the Edge facility. When ASHRAE developed its recommendations for buildings, it rapidly found that manufacturers define what you can do with their IT kit in warranties. These warranties now include an allowance for a certain amount of time outside the ideal conditions, so-called “excursions.” If something goes wrong and you can’t show you kept the equipment within its tolerances, you’ve voided the warranty.
“Most IT equipment specs for temperature and humidity are written for 7x24 steady state operation in a brick and mortar data center where the environmental conditions are well controlled,” says Fitch. When you open the door to a small edge data center, you can change the temperature and humidity.
“A lot of IT equipment has the capability to record temperature,” he points out. “Hard drives have a SMART [Self-Monitoring, Analysis and Reporting Technology] data sector on them that records temperature periodically. Most servers have a temperature data capture on them. And so there is going to be a discussion between the customer and the IT equipment supplier. This data is being recorded, and at some point somebody is going to notice and say ‘Hey, on October 16 you opened the door to that Edge data center, the cool air came in, and your equipment dropped below its rated warranty range.’”
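Because the equipment keeps its own temperature history, an operator can audit that history before the supplier does. The sketch below assumes the readings (from SMART, a BMC, or similar) have already been parsed into timestamped Celsius values; the 10 to 35°C rated range and the sample log are hypothetical, not taken from any real warranty. It groups consecutive out-of-range readings into excursions so each door-opening event shows up as one interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Excursion:
    start: datetime
    end: datetime      # timestamp of the last out-of-range reading
    extreme_c: float   # reading furthest outside the rated range


def distance_outside(temp_c, low_c, high_c):
    """How far (in °C) a reading lies outside [low_c, high_c]; 0 if inside."""
    return max(low_c - temp_c, temp_c - high_c, 0.0)


def find_excursions(samples, low_c, high_c):
    """Group consecutive out-of-range readings into excursion intervals.

    samples: (timestamp, temperature_c) pairs, sorted by time.
    """
    excursions = []
    current = None
    for ts, temp in samples:
        if distance_outside(temp, low_c, high_c) > 0:
            if current is None:
                current = Excursion(ts, ts, temp)  # excursion begins
            else:
                current.end = ts
                if distance_outside(temp, low_c, high_c) > distance_outside(
                    current.extreme_c, low_c, high_c
                ):
                    current.extreme_c = temp
        elif current is not None:
            excursions.append(current)  # back in range: close the interval
            current = None
    if current is not None:
        excursions.append(current)  # log ended mid-excursion
    return excursions


# Hypothetical log: one reading every 10 minutes; the door opens around 09:20.
t0 = datetime(2020, 10, 16, 9, 0)
temps = [21.0, 20.5, 8.0, 4.5, 9.5, 18.0, 20.0]
log = [(t0 + timedelta(minutes=10 * i), t) for i, t in enumerate(temps)]

for e in find_excursions(log, low_c=10.0, high_c=35.0):
    print(e.start.time(), e.end.time(), e.extreme_c)  # 09:20:00 09:40:00 4.5
```

Keeping each excursion's start, end and worst reading gives the service record needed to answer the “on October 16 you opened the door” conversation with data.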
Technicians need to step carefully. Humidity and dew point are non-obvious problems, warns Fitch: “A technician may service an Edge data center on a beautiful 74°F (23°C) morning in Atlanta. He’s thinking it’s a great day to be a data center technician, but Georgia has a lot of humidity and there’s dew on the grass. When the technician opens the door, humid air rushes into the facility, which is at 68°F (20°C). Within minutes his equipment is covered in condensation.”
The solution is to carry a handheld temperature and humidity monitor: “Environmental conditions like inrush are non-obvious. You may need to train service personnel on how to interpret readings to determine whether condensation is going to be a problem. A $100 to $200 monitor could save thousands of dollars.”
Alternatively, technicians can use the temperature readouts from the IT equipment itself, but the main thing is awareness: “These would be non-concerns in a brick and mortar data center, but need to be part of the mindset of the Edge service technician.”
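Interpreting those readings comes down to one comparison: if the dew point of the outside air is at or above the temperature of the equipment surfaces, water will condense on them. The sketch below estimates dew point with the Magnus approximation (one common coefficient set, accurate to a fraction of a degree in normal conditions). The 90 percent relative humidity for the Atlanta morning and the 1°C safety margin are assumptions for illustration; Fitch gives only the temperatures.

```python
import math

# Magnus-formula coefficients over liquid water (one commonly used set).
B = 17.62
C = 243.12  # °C


def f_to_c(temp_f):
    """Convert Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def dew_point_c(temp_c, rh_pct):
    """Approximate dew point (°C) from air temperature and relative humidity (> 0%)."""
    gamma = math.log(rh_pct / 100.0) + B * temp_c / (C + temp_c)
    return C * gamma / (B - gamma)


def condensation_risk(outdoor_temp_c, outdoor_rh_pct, surface_temp_c, margin_c=1.0):
    """True if inrushing outdoor air could condense on surfaces at surface_temp_c.

    margin_c is a hypothetical safety margin: flag risk once the outdoor
    dew point comes within margin_c of the surface temperature.
    """
    return dew_point_c(outdoor_temp_c, outdoor_rh_pct) >= surface_temp_c - margin_c


# The Atlanta example: 74°F outside (assumed 90% RH), enclosure at 68°F.
outdoor, inside = f_to_c(74), f_to_c(68)
print(round(dew_point_c(outdoor, 90), 1))    # about 21.6, above the 20°C enclosure
print(condensation_risk(outdoor, 90, inside))  # True: wait, or use a tent
```

At 50 percent relative humidity the same morning would give a dew point around 12°C and no condensation risk, which is why the temperature alone tells a technician very little.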
Use a tent
To open the door safely, Fitch says we’ll need to work under a shroud: “The best solution is something simple like a small tent that encloses the door and still provides enough room to work.”
Another solution might be an accordion-style connector between the Edge enclosure and an air-conditioned bay at the back of the service truck: “These are off-the-wall ideas but things we need to think about.”
The tent will help with other issues such as air pollution: “You own the land but you don’t own the air stream!” he warns, and pollution can be seasonal. Crops are only sprayed at certain times of year, coal-fired heating systems may be switched on or off, and prevailing winds can change.
Pollution and corrosion are cumulative failure risks. Material that gets in on one visit stays there, and accumulates with each subsequent one. As well as potentially causing short circuits, extraneous matter in Edge data centers can cause corrosion.
Real-time corrosion monitors are available, and - importantly for Edge facilities - they can be networked and checked remotely. “You want advance notice of corrosion problems, with enough lead time to install corrosion abatement filtration,” says Fitch.
“If you wait until you are seeing corrosion-related failures, all of your IT equipment has likely been compromised and you will either have to live with a high failure rate or do an expensive rip-and-replace with new equipment. Neither is a good option.”
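Getting that advance notice from a networked monitor can be as simple as comparing a smoothed corrosion rate against a limit and warning before the limit is reached. The sketch below uses the copper and silver limits of 300 and 200 ångströms of corrosion film per 30 days that are commonly cited from ASHRAE’s contamination guidance (check the current guidance before relying on them); the 80 percent warning threshold, the three-reading window and the data format are hypothetical.

```python
from statistics import mean

# Limits in ångströms of corrosion film per 30 days, as commonly cited from
# ASHRAE TC 9.9 contamination guidance (verify against the current document).
LIMITS = {"copper": 300, "silver": 200}
WARN_FRACTION = 0.8  # hypothetical: warn at 80% of a limit to leave lead time


def check_site(site, readings, window=3):
    """Return alert strings for one Edge site.

    readings: {"copper": [...], "silver": [...]}, monthly rates in Å/30 days,
    oldest first. This format is a hypothetical stand-in for a real
    corrosion monitor's network feed.
    """
    alerts = []
    for metal, limit in LIMITS.items():
        recent = mean(readings[metal][-window:])  # smooth out single noisy readings
        if recent >= limit:
            alerts.append(f"{site}: {metal} {recent:.0f} Å/30d exceeds {limit}")
        elif recent >= WARN_FRACTION * limit:
            alerts.append(f"{site}: {metal} {recent:.0f} Å/30d approaching {limit}, plan filtration")
    return alerts


print(check_site("edge-017", {"copper": [180, 230, 260, 290], "silver": [90, 100, 110, 120]}))
```

Here the copper rate is still under its limit, but its rising average crosses the warning line, which is the point at which Fitch’s “enough lead time to install corrosion abatement filtration” still applies.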
Full-scale data centers keep dust out with filtration, and opening the door of an Edge facility bypasses it: “Dust is usually removed by MERV 11 and 13 class filtration,” Fitch explains. “When you open the door to one of these modular data centers, you completely circumvent the filtration.”
This dust accumulates. “You might say ‘Why can’t I just take compressed air and blow it out?’ Well, here's the problem. A lot of dust is made up of silicon dioxide, but some dust also has stuff like gypsum, salts, and other materials. If they get down inside a contact, like a DIMM or a processor, and you do a service on those, what you can do is actually smear the particles onto the contact. I liken it to smearing peanut butter or Nutella on toast. It's very thick and viscous and, by golly, if you want to get it off the toast, it's pretty hard to do!”
This is made worse, he says, by the sheer number of contacts. One 2U server can have 10,000 contacts (288 per DIMM, 3,600 per CPU, and 64 or 98 per PCIe slot). “Some of these contacts are redundant - like power and ground - so they're non-critical. But a lot of these are single point of failure contacts. So if you get a smear on that contact, you have a failure.”
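The 10,000 figure follows directly from the per-component counts. The configuration below (two CPUs, eight DIMMs, four 98-contact PCIe slots) is a hypothetical dual-socket build chosen for illustration, not one Fitch describes; the per-component counts are his.

```python
# Per-component contact counts quoted by Fitch.
CONTACTS = {"cpu": 3600, "dimm": 288, "pcie_98pin": 98, "pcie_64pin": 64}


def total_contacts(config):
    """Sum contacts for a server configuration given as {component: quantity}."""
    return sum(CONTACTS[part] * qty for part, qty in config.items())


# Hypothetical 2U build: 2 CPUs, 8 DIMMs, 4 PCIe slots with 98 contacts each.
example = {"cpu": 2, "dimm": 8, "pcie_98pin": 4}
print(total_contacts(example))  # 7,200 + 2,304 + 392 = 9896
```

The CPU sockets dominate the total, so a single service visit that reseats processors exposes thousands of contacts to any dust smeared during the work.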
These considerations mean Edge data centers will end up lowering the overall efficiency of the data center fleet, says Fitch.
Data center builders are working to eliminate air conditioning from brick and mortar facilities, and minimize redundant equipment, but Edge data centers will have to be designed for reliability.
This means some redundant equipment: “If you have a remote data center that’s going to take several hours or more to reach, it needs to have some level of failover and redundancy. Think about a phone-booth-sized edge data center, maybe it’s got a 42U rack and it’s got 20U of compute in it - that’s ten 2U servers. If any one of those fails, you’ve got only 90 percent of your compute capability, and you’re going to need a failover spare. Failures or service interruptions in a small data center can have a bigger impact than in a large cloud data center, which might have tens of thousands of servers available.”
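Fitch’s arithmetic can be written out to show why one spare matters so much more at the Edge. The sketch assumes the workload needs a fixed number of servers and that any idle spare can take over from any failed unit (simple N+k provisioning); the function name is illustrative.

```python
def remaining_capacity(needed, spares, failed):
    """Fraction of the required compute still available.

    needed: servers the workload requires; spares: idle failover units
    (simple N+k provisioning assumed); failed: units currently down.
    """
    working = max(needed + spares - failed, 0)
    return min(working, needed) / needed


print(remaining_capacity(10, 0, 1))      # 0.9: the phone-booth site, no spare
print(remaining_capacity(10, 1, 1))      # 1.0: one failover spare covers it
print(remaining_capacity(10_000, 0, 1))  # 0.9999: the same failure in a cloud site
```

One spare is ten percent extra hardware in a ten-server enclosure, a large overhead by cloud standards, but it is the difference between full and degraded service while a technician is hours away.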
Most Edge facilities will have some sort of air conditioning: “Most regions of the world have some form of extreme temperature or humidity, and will need some sort of aircon or mechanical environmental control. If 20 percent of your data center fleet is now dispersed, and all those Edge data centers have direct expansion (DX) air conditioning, that can reduce the efficiency of your fleet. It’s difficult to implement airside or waterside economizers in Edge data centers.”
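One way to see the fleet-level effect is an energy-weighted power usage effectiveness (PUE): total facility energy divided by total IT energy across all sites. The sketch reads “20 percent of your data center fleet” as 20 percent of IT load, and the PUE values (1.2 for economized central sites, 1.6 for DX-cooled Edge sites) are hypothetical round numbers, not figures from the bulletin.

```python
def fleet_pue(sites):
    """Energy-weighted fleet PUE: total facility energy / total IT energy.

    sites: iterable of (it_energy_share, pue) pairs.
    """
    sites = list(sites)
    it_total = sum(share for share, _ in sites)
    facility_total = sum(share * pue for share, pue in sites)
    return facility_total / it_total


# Hypothetical PUEs: 1.2 for central sites, 1.6 for DX-cooled Edge sites.
central_only = fleet_pue([(1.0, 1.2)])
with_edge = fleet_pue([(0.8, 1.2), (0.2, 1.6)])
print(round(central_only, 2), round(with_edge, 2))  # 1.2 1.28
```

Under these assumptions, moving a fifth of the IT load to DX-cooled Edge sites raises cooling and power overhead for the whole fleet by 40 percent (from 0.20 to 0.28 per unit of IT energy), even though the central sites are unchanged.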
Another challenge is power distribution and backup in a small space: “UPS or batteries that you would locate in a separate gray area in a large brick and mortar data center, now may reside in the same enclosure as the computer equipment. That’s an additional engineering challenge. For example, batteries need to have very good ventilation to make sure there’s no buildup of gases like hydrogen. You have dissimilar equipment in the same environment. You need to control the environment to the narrowest specification range of whatever facilities equipment you have.”
Brick and mortar data centers can have a contained hot aisle, he says, “but how do you implement that in a phone-booth-sized facility?”
Can new tech help?
Given these constraints, some vendors have suggested that Edge could be a breakthrough use case for liquid cooling. They argue that liquid cooling systems don’t need raised floors or contained aisles, and can operate quietly in environments alongside people. Fitch cautions against bringing in new tech for remote installations.
“Liquid cooling is not ready. Piping and tubing is an opportunity for leaks, and an Edge data center may be hours or days from a service person,” he says. That would be long enough for a small loss of coolant to result in a significant high-temperature excursion.
While liquid cooling systems have been developing rapidly, their pipes and tubing don’t have enough hours in service to be used remotely yet: “For facilities that are fairly remote, approach new cooling technologies cautiously and conservatively. If it’s five minutes from a service technician, then maybe it’s a different story.”
Likewise, Microsoft’s underwater Natick experiment has shown that it’s possible to run a data center for years without opening it up, but Fitch says lights-out operation isn’t ready yet: “That’s an aspirational goal, but I don’t think it’s very practical right now. DIMMs need resetting, and servers and software need to be upgraded. Sometimes there’s no substitute for being able to go out there and troubleshoot the equipment firsthand. So I would say a sealed, never-touch data center is a good aspirational goal.”
All this can sound daunting, and it sounds as if there are inevitably losses of efficiency. But there are choices an operator will have to make: “Either you harden the hardware and have a higher cost point, or you take care of the environment, and you can continue to use COTS equipment.”
ASHRAE believes the bulletin will enable operators to do just that - and it’s made efforts to communicate the urgent information succinctly. “This technical bulletin is a new form of communication, which ASHRAE is going to use going forward. It’s there to communicate succinctly and rapidly actionable information that the industry needs. We’re taking the information from what used to be 30 to 50 page academic white papers, and we’re rolling it up into a crisp 10-page actionable document.”
The bulletin will have to get its message across well because, unlike the original ASHRAE TC 9.9 work, data center builders will work with it directly, not via building codes and legally-binding regulations. “I don’t see this as something that’s going to be taken up and rolled into legislation, like has been done for buildings,” says Fitch.
“These are small facilities. And so I think keeping this information at the user level makes a lot more sense.”