Colocation giant Equinix has put two-phase liquid cooling, in which a fluid boils and then condenses, into use on live production servers for its Metal infrastructure-as-a-service offering.
The company has been testing ZutaCore's liquid cooling systems at its co-innovation center for a year, and in June 2022 it installed some of the two-phase technology in live Metal servers. A new company blog post reports that a rack full of operational two-phase cooling has been stable for six months in its NY5 data center in Secaucus, New Jersey.
Liquid cooling is being adopted for demanding applications because it removes heat from servers more efficiently than air. Two-phase cooling goes further: when the fluid boils, it absorbs a large amount of heat (its latent heat of vaporization) without getting any hotter.
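To put rough numbers on that, the sketch below compares the coolant mass flow needed to carry away one chip's heat as sensible heat (the fluid only warms up) with the flow needed when the heat goes into boiling the fluid instead. The property values are assumptions chosen for illustration, roughly in the range of a low-boiling dielectric fluid. They are not ZutaCore or Equinix figures. The two-phase case also idealizes things by assuming every kilogram of fluid evaporates completely.

```python
# Rough comparison of sensible-heat vs latent-heat cooling for one chip.
# All property values below are illustrative assumptions, not vendor data.

CHIP_POWER_W = 200.0                # assumed chip heat load (W), the upper TDP figure cited in the article
SPECIFIC_HEAT_J_PER_KG_K = 1300.0   # assumed liquid specific heat of a dielectric fluid (J/kg·K)
ALLOWED_RISE_K = 10.0               # assumed temperature rise allowed in a single-phase loop (K)
LATENT_HEAT_J_PER_KG = 142_000.0    # assumed heat of vaporization (J/kg)


def mass_flow_single_phase(power_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) needed if the fluid only warms up: P = m_dot * cp * dT."""
    return power_w / (cp_j_per_kg_k * delta_t_k)


def mass_flow_two_phase(power_w: float, h_fg_j_per_kg: float) -> float:
    """Mass flow (kg/s) needed if all the heat goes into boiling: P = m_dot * h_fg.

    Assumes complete evaporation, which is an idealization.
    """
    return power_w / h_fg_j_per_kg


single = mass_flow_single_phase(CHIP_POWER_W, SPECIFIC_HEAT_J_PER_KG_K, ALLOWED_RISE_K)
two = mass_flow_two_phase(CHIP_POWER_W, LATENT_HEAT_J_PER_KG)
print(f"single-phase: {single * 1000:.1f} g/s")
print(f"two-phase:    {two * 1000:.2f} g/s")
print(f"ratio:        {single / two:.1f}x less flow when the fluid boils")
```

With these assumed values, boiling carries the same 200W with roughly a tenth of the flow. That ratio is the reason two-phase systems can use small tubes and low flow rates.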
"For our first [two-phase] liquid-cooled production environment we went with a standard set of hardware: no overly power-dense CPUs, no GPUs," says Equinix field CTO My Truong. "For expediency, we chose to convert air-cooled single-socket AMD SP3 platforms already in the field (the bare metal instances our customers know as m3.large.x86). These standard 19 in 1RU servers are a high-volume system for us, a fleet that’s readily available for side-by-side testing."
These processors have a thermal design power (TDP) of up to 200W, which existing air cooling systems can handle. However, Equinix expects higher-TDP servers to arrive this year and wanted to test two-phase cooling on a known quantity first. The company also wanted to run it on production systems, such as the servers running the control plane and customer portal for Equinix Metal, a cloud service developed from Packet, which Equinix acquired in 2020.
Equinix engineers installed a 6RU liquid-to-air heat exchanger at the bottom of a rack, with a manifold distributing fluid to "more than 20" 1RU systems in that rack. Blue tubes carried liquid coolant to reservoirs attached directly to the hot chips, where it boiled. Red tubes then carried the vapor back through the manifold to the heat exchanger, where it condensed and gave up its heat before recirculating. The blog post documents the process in detail in a photo gallery.
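For a sense of scale, the sketch below estimates how much CPU heat the loop in that rack has to carry at full load. It assumes 21 servers, the smallest count consistent with "more than 20", and treats the 200W TDP as each CPU's full-load heat output. It counts only CPU heat because it assumes memory, drives and power supplies stay on air cooling. `rack_cpu_heat_w` is a hypothetical helper and not part of any Equinix or ZutaCore tooling.

```python
def rack_cpu_heat_w(servers: int, cpu_tdp_w: float, sockets_per_server: int = 1) -> float:
    """Full-load CPU heat (W) a direct-to-chip loop must remove from one rack.

    Treats TDP as the heat each CPU dissipates at full load. Other components
    are assumed to stay on air cooling and are not counted.
    """
    if servers < 0 or sockets_per_server < 1 or cpu_tdp_w < 0:
        raise ValueError("invalid rack configuration")
    return servers * sockets_per_server * cpu_tdp_w


# Assumed configuration: 21 single-socket 1RU servers at up to 200 W each.
load_w = rack_cpu_heat_w(servers=21, cpu_tdp_w=200.0)
print(f"CPU heat on the loop: at least {load_w / 1000:.1f} kW with every CPU at full load")
```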
"The short lateral runs from the server to the manifold are important," said Truong. "A shared manifold across racks is not a suggested installation method."
The ZutaCore system uses a Novec fluid that boils at 33°C (92°F). While it boils, the fluid in each reservoir stays at roughly that temperature, so the chip it sits on runs only somewhat warmer than that.
"We’ve trained data center teams for decades that liquids and electronics don’t go together," says Truong, but the two-phase system minimizes that issue. "The manifold and its quick connections use a non-spill design that releases at most a drop of fluid when disconnected. The fluid itself also has a convenient property of completely evaporating in atmospheric conditions."
The systems were pressure tested to spot any potential leaks.
Truong reports that the rack was handling less heat than ZutaCore recommends for the system. Even so, the system easily kept processor temperatures below 52°C at all times, "which is considerably lower than any air-cooled 1U equivalent under load."
The engineer predicts that data center architectures will change soon. "Data center liquid cooling will go from being almost exclusively in the HPC realm to becoming a standard requirement for systems. Colocation operators like Equinix will be changing buildings and operational policies to enable customers to leverage it in pursuit of sustainability."
Because Equinix Metal is a cloud service run by a colocation provider, it gives Equinix a chance to try out technology that its customers may need to adopt in the future. "As an operator of data centers with a large footprint of its own IT equipment, we can help guide the industry toward a better outcome for this third rock from the sun," the company says.
Truong calls out one issue with liquid cooling: its benefits are not captured by the industry's leading efficiency metric, power usage effectiveness (PUE), which divides a facility's total power by the power delivered to its IT equipment. "The data center energy efficiency metric that was created largely with the assumption that air cooling and fans were a necessary part of servers, switches and routers doesn’t really capture the benefits of liquid cooling," he says. "Direct-to-chip liquid cooling mostly removes the need for server fans, making PUE higher while lowering total power use."
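A toy calculation makes Truong's point concrete. PUE counts server fans as IT load, so removing them shrinks the denominator of the ratio. The figures below are invented for illustration. For simplicity, the sketch also assumes that facility overhead (cooling plant, power distribution) stays the same when the fans go away. In practice, that overhead would probably change as well.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    """Toy power breakdown for a data hall, in kW (all figures invented)."""
    compute_kw: float      # CPUs, memory, storage, networking silicon
    server_fans_kw: float  # fans inside the servers, which PUE counts as IT load
    overhead_kw: float     # cooling plant, power distribution losses, lighting

    @property
    def it_kw(self) -> float:
        return self.compute_kw + self.server_fans_kw

    @property
    def total_kw(self) -> float:
        return self.it_kw + self.overhead_kw

    @property
    def pue(self) -> float:
        # Power usage effectiveness = total facility power / IT equipment power
        return self.total_kw / self.it_kw


air_cooled = Facility(compute_kw=90.0, server_fans_kw=10.0, overhead_kw=40.0)
# Assumption: direct-to-chip cooling removes the server fans and leaves overhead unchanged.
liquid_cooled = Facility(compute_kw=90.0, server_fans_kw=0.0, overhead_kw=40.0)

for label, f in (("air", air_cooled), ("liquid", liquid_cooled)):
    print(f"{label:>6}: total {f.total_kw:.0f} kW, PUE {f.pue:.2f}")
```

In this toy case, total power falls from 140 kW to 130 kW, but PUE rises from 1.40 to 1.44. That is the mismatch Truong describes.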
Nvidia and Vertiv have argued for a different metric, total-power usage effectiveness (TUE), and Truong agrees: "A more holistic and up-to-date metric should look more along the lines of TUE than PUE."
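Under its commonly cited definition, TUE multiplies PUE by ITUE (IT-power usage effectiveness). ITUE is the ratio of the power entering the IT equipment to the power that reaches the compute components themselves, so in-server overheads such as fans count against it. The product reduces to total facility power divided by compute power. The sketch below uses invented figures: 90 kW of compute, 10 kW of server fans that direct-to-chip cooling is assumed to eliminate, and 40 kW of facility overhead that is assumed to stay unchanged.

```python
def pue(total_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return total_kw / it_kw


def itue(it_kw: float, compute_kw: float) -> float:
    """IT-power usage effectiveness: IT equipment power / compute-component power."""
    return it_kw / compute_kw


def tue(total_kw: float, it_kw: float, compute_kw: float) -> float:
    """Total-power usage effectiveness: ITUE * PUE, i.e. total power / compute power."""
    return itue(it_kw, compute_kw) * pue(total_kw, it_kw)


# Invented figures (kW): compute load, server fans, facility overhead.
COMPUTE, FANS, OVERHEAD = 90.0, 10.0, 40.0

air_it = COMPUTE + FANS  # fans sit inside the servers, so they count as IT load
liquid_it = COMPUTE      # assume direct-to-chip cooling removes the fans entirely

for label, it_kw in (("air", air_it), ("liquid", liquid_it)):
    total = it_kw + OVERHEAD  # overhead assumed unchanged between the two cases
    print(f"{label:>6}: PUE {pue(total, it_kw):.2f}  TUE {tue(total, it_kw, COMPUTE):.2f}")
```

With these assumptions, PUE moves the wrong way (1.40 to 1.44), while TUE falls from 1.56 to 1.44. TUE therefore tracks the drop in total power that PUE misses.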
Two-phase cooling has recently run into a problem. 3M has announced it will phase out Novec as regulators tighten rules on PFAS chemicals, Novec among them, because of their health risks. Those regulations could also affect alternative fluids proposed by other vendors.
DCD has asked Equinix whether this development will affect its liquid cooling strategy.