A liquid-cooled high-performance computing (HPC) system is being tested at the National Renewable Energy Laboratory (NREL) in Golden, Colorado, to see whether it can deliver on promises to cut power requirements by up to 50 percent.

The Aquarius HPC machine from Aquila has been named “Yacumama” by its owner, Sandia National Laboratories in Albuquerque, New Mexico, which has deployed it at NREL for testing. The system uses warm water for cooling, transferring heat from the servers through cold plates built into a standard Open Compute Project (OCP)-based design that can support up to 75kW per rack.

NREL will study whether the system lives up to the claims made by Albuquerque-based Aquila: that it will pay for itself within a year, with no trade-off in performance.

“How cool is that?” – Bob Bolz

“This design suggests the owner-operator might well achieve 50 percent savings on the overall server power envelope from current air-cooled Bitcoin mining data centers, as an example,” said Bob Bolz, business development director at Aquila and chief designer of Aquarius: “Even when allowing for the additional rack costs, we think those costs will be fully recovered in the first year from power savings. We think the NREL tests will bear this out.”
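The first-year payback claim is, at heart, a simple-payback calculation: the additional rack cost divided by the yearly energy cost avoided. The sketch below assumes the load runs continuously and ignores discounting, maintenance and cooling-plant costs; the function names and every input value (a 50 kW air-cooled load, $0.10/kWh electricity, $20,000 of extra rack cost) are hypothetical and are not figures from Aquila, Sandia or NREL.

```python
"""Simple-payback sketch for a cooling upgrade (all inputs are hypothetical)."""

HOURS_PER_YEAR = 8760  # continuous operation assumed


def annual_savings_usd(baseline_kw: float, savings_fraction: float, usd_per_kwh: float) -> float:
    """Energy cost avoided per year when a fraction of a continuous baseline load is saved."""
    saved_kw = baseline_kw * savings_fraction
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh


def simple_payback_years(extra_capex_usd: float, yearly_savings_usd: float) -> float:
    """Years to recover the extra up-front cost (no discounting, no maintenance costs)."""
    if yearly_savings_usd <= 0:
        raise ValueError("yearly savings must be positive for the cost to be recovered")
    return extra_capex_usd / yearly_savings_usd


if __name__ == "__main__":
    # Hypothetical inputs: a 50 kW air-cooled baseline, the claimed 50% saving,
    # $0.10/kWh electricity and $20,000 of additional rack cost.
    savings = annual_savings_usd(baseline_kw=50.0, savings_fraction=0.5, usd_per_kwh=0.10)
    years = simple_payback_years(extra_capex_usd=20_000.0, yearly_savings_usd=savings)
    print(f"annual savings: ${savings:,.0f}; simple payback: {years:.2f} years")
```

Whether the real system pays back within a year depends entirely on the actual baseline load, electricity tariff and rack premium, which is what the NREL tests are meant to establish.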

The system uses OCP-inspired 12V DC power distribution, and can be deployed quickly, Bolz said: “With the DoE’s Sandia Labs unit being tested at NREL, we think we can deliver to a buyer a purpose-built supercomputer as a fully-integrated micro data center in about a 60-day turnaround.”

Applications such as 5G, IoT, edge computing and cryptocurrency mining will need this kind of speed, along with energy efficiency, he added.

After testing at NREL, the system will move to Sandia later this year (likely August, say the developers). It will be housed in a new data hall currently under construction on the Sandia campus, in the mountains outside of Albuquerque.

The Aquila system uses cold-plate cooling technology licensed from Clustered Systems of Santa Clara, California. The design is an evolution of the ExaBlade HP design tested at the Stanford Linear Accelerator Center (SLAC) in Menlo Park, California, in 2013; according to Bolz, that system has operated with zero failures for over two million hours across 128 servers.

The new design promises to come very close to the results achieved by available liquid immersion technology, he told us, while taking advantage of the efficiencies of OCP-inspired 12V DC power distribution.

The system also reduces floor space and improves reliability and resiliency, while operating in near silence. These benefits of liquid cooling will be especially useful for future lights-out edge compute sites, Bolz said: “Our benchmarking suggests we’ll deliver the HPC industry’s best ROI and TCO.”

Sandia’s mission

“Sandia maintains a constant striving to reduce energy use in HPC and to make our data centers as energy efficient as possible,” said David Martinez, the Labs’ computing infrastructure engineering services lead. “Sandia addresses the problem from a systems process viewpoint. Liquid cooling systems extract heat directly from server board components and can prevent server speed-throttling due to overheating.

“Our New Mexico climate favors use of non-mechanical cooling, which, when combined with warm water inlet temperatures, saves considerable energy. We can now capture energy from the elevated return water to support indirect energy needs, such as process water, domestic hot water, and absorption cooling processes.

“The Aquarius system lowers the cost of cooling (for the) Yacumama HPC cluster we’ve deployed for testing at NREL’s water-cooled HPC center.”

Aquila’s Aquarius racks are designed to deliver both long-term reliability and re-use, enabling customers to launch their next-gen servers with only minimal re-engineering, Bolz said: “The racks feature a forward-looking OCPv2 form factor, and are capable of attaching directly to a facility’s existing water infrastructure in conjunction with a rack-mount cooling distribution unit (CDU) supplied by Motivair of Buffalo, NY.”

“We see Aquarius as a departure from other non-immersion liquid-cooling designs that employ diamond-like carbon heat-sink technologies,” he explained. “The Aquarius manifold and fixed-cold-plate architecture eliminate the possibility of leakage during servicing, as there are no plastic tubes or quick disconnects anywhere in the system. The design uses only electrochemically compatible metals, eliminating the potential for water contamination due to corrosion.”

Phil Hughes, founder of Clustered Systems, added: “Data centers cooled with air have fundamental restrictions on power density. Sandia Labs and NREL get it: warm-water cooling has none of these restrictions and can be packed very densely, without a need for a specialized forced-air driven building.”

Other significant benefits cited by the developers include a low equipment failure rate, thanks to the elimination of fan vibration and to improved thermal stability, which reduces component stress caused by sudden thermal expansion and contraction. The system also cools all the major heat sources simultaneously, not just the CPU.

According to Aquila, operators can expect to save up to 30 percent of a server’s power budget by eliminating all server fans. This power, normally stranded in the fans, can instead be used to run more servers, further improving overall energy effectiveness.
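The reasoning about stranded fan power can be made concrete: under a fixed power budget, if removing fans cuts each server’s draw by a fraction f, the same budget supports roughly 1/(1 − f) times as many servers (about 43 percent more at f = 0.3). The sketch below is illustrative only; the function names and the inputs (a 100 kW budget, 0.5 kW per air-cooled server) are hypothetical, and the 30 percent figure is treated as the upper bound Aquila quotes rather than a measured value.

```python
"""Servers supported by a fixed power budget before and after removing fans (hypothetical inputs)."""

import math


def servers_supported(budget_kw: float, per_server_kw: float) -> int:
    """Whole servers that fit within the budget at the given per-server draw."""
    if per_server_kw <= 0:
        raise ValueError("per-server draw must be positive")
    return math.floor(budget_kw / per_server_kw)


def servers_after_fan_removal(budget_kw: float, per_server_kw: float, fan_fraction: float) -> int:
    """Whole servers that fit once fan_fraction of each server's draw is eliminated."""
    if not 0.0 <= fan_fraction < 1.0:
        raise ValueError("fan_fraction must be in [0, 1)")
    return servers_supported(budget_kw, per_server_kw * (1.0 - fan_fraction))


def capacity_gain(fan_fraction: float) -> float:
    """Ideal multiplier on server count for a fixed budget: 1 / (1 - f)."""
    return 1.0 / (1.0 - fan_fraction)


if __name__ == "__main__":
    # Hypothetical: 100 kW budget, 0.5 kW per air-cooled server, 30% fan share (upper bound).
    before = servers_supported(100.0, 0.5)
    after = servers_after_fan_removal(100.0, 0.5, 0.30)
    print(f"servers before: {before}, after: {after}, ideal gain: {capacity_gain(0.30):.3f}x")
```

Rounding down to whole servers is why the integer result falls slightly short of the ideal 1/(1 − f) multiplier.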

The company’s president, Judy Beckes Talcott, says the US DoE-funded Sandia and NREL are driving innovation in liquid cooling and other new efficiency technologies: “We feel that this helps shape the future of HPC and influence modern data center design toward the adoption of liquid cooling and the improving of data center energy efficiency by as much as 50 percent.”