The demand for more efficient data center cooling methodologies has become increasingly urgent due to the rise of generative AI and high-performance computing, propelling liquid cooling to the forefront as a transformative solution.
While the strategic circulation of coolant to dissipate heat from electronic components is by no means new, it has evolved to bring cooling solutions closer to specific components within servers and racks.
These innovative approaches aim to redefine thermal management for modern data infrastructure as workload densities increase. However, the implementation of liquid cooling solutions is not without its complexities and challenges.
Navigating these liquid cooling design challenges requires a nuanced understanding of the specific facility where it will be deployed and a commitment to problem-solving. In this article, the data center experts from Align – the leading global provider of technology infrastructure solutions – discuss the major challenges of liquid cooling and offer key considerations gleaned from first-hand experiences.
“As technology leaders in the field embrace liquid cooling, what they are really doing is embarking on a journey of exploration and experimentation. Although best practices have yet to be defined, deployments of various liquid cooling solutions are on the rise. Considering the various solutions available only increases the confusion around what is best for each design intent,” said Tom Weber, Align’s principal of Data Center Design and Build.
“Many existing facilities were not built with liquid cooling in mind, and certainly not with bringing liquid into the IT white space,” added Simon Eventov, managing director of Data Center Design and Build.
Additionally, liquid cooling is characterized by a lack of uniformity and standardized practices. Ultimately, the choice of liquid cooling solutions will be steered by the requirements of the server and chip manufacturers, and a one-size-fits-all approach may not work in every environment.
As higher-performance chips are developed, their thermal requirements will increasingly mandate liquid cooling.
"We know people will eventually have to move to liquid, but we also hear feedback from customers that that's challenging. For them, their data centers aren't there, they need to build new data centers," said Nvidia’s Charlie Boyle.
There are many liquid cooling solutions currently on the market and in development, which will change how future data centers are designed and operated. For now, the focus of this discussion is on rear door heat exchangers and direct-to-chip liquid cooling (DTC).
This article concentrates on these more mainstream solutions. While technologies such as immersion or submersion cooling are viable, they are not yet being widely deployed.
Rear door heat exchangers
With rear door heat exchangers (RDHx), a localized cooling unit is attached to the rear door of individual server racks. Coolant flows through a heat exchanger, absorbing heat from the exhaust air generated by the server equipment within the rack and carrying it away from the rack. The coolant is then cooled and recirculated, providing efficient cooling for individual racks within a closed-loop system.
This cooling method allows precise heat dissipation, focusing on the specific racks where cooling is needed most, which enhances data center efficiency by managing heat at the rack level. RDHx does not require any modifications to the IT equipment itself and is therefore a less complicated approach to higher-density requirements.
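To give a sense of the closed-loop heat balance an RDHx relies on, the short sketch below estimates the coolant flow needed to absorb a given rack load. The 40 kW load, 10°C temperature rise, and water properties are illustrative assumptions only, not figures from Align or any manufacturer.

```python
# Rough sizing sketch: coolant flow needed for a rear door heat exchanger
# to carry away a given rack heat load (sensible heat only).
# All input values are illustrative assumptions, not vendor specifications.

RACK_LOAD_KW = 40.0     # assumed IT heat load per rack (kW)
DELTA_T_C = 10.0        # assumed coolant temperature rise across the door (deg C)
CP_WATER = 4.186        # specific heat of water, kJ/(kg*K)
DENSITY_WATER = 0.998   # kg per liter at typical supply temperatures

def required_flow_lpm(load_kw: float, delta_t: float) -> float:
    """Return the coolant flow (liters/minute) needed to carry `load_kw`
    of heat with a `delta_t` temperature rise, using Q = m_dot * cp * dT."""
    mass_flow_kg_s = load_kw / (CP_WATER * delta_t)   # kg/s of coolant
    return mass_flow_kg_s / DENSITY_WATER * 60.0      # convert to L/min

if __name__ == "__main__":
    flow = required_flow_lpm(RACK_LOAD_KW, DELTA_T_C)
    print(f"~{flow:.0f} L/min of water per {RACK_LOAD_KW:.0f} kW rack "
          f"at a {DELTA_T_C:.0f} C rise")
```

Under these assumed numbers, a 40 kW rack needs roughly 57 L/min of water at a 10°C rise; a hotter return temperature or a different coolant changes the answer, which is why flow and temperature design is facility-specific.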
Direct-to-chip (DTC)
Direct-to-chip cooling currently dominates the conversation, given that it is seen as an inevitable requirement in the near future. DTC is a targeted approach that circulates coolant directly to the chip. Two variants exist: single-phase and two-phase.
Single-phase DTC circulates a liquid coolant, typically water or a specialized fluid, through a cold plate mounted directly on the chip. The coolant absorbs the heat generated by the processor and carries it away efficiently. Single-phase DTC is recognized for its simplicity and effectiveness. Taking the concept a step further, two-phase DTC introduces a more advanced mechanism.
In this approach, the coolant undergoes a phase change – transitioning from liquid to vapor – as it absorbs heat from the chip. This phase change enhances the cooling efficiency, as the vapor carries away more heat energy. The condensed coolant then returns to its liquid state to repeat the cycle. The two-phase method is praised for its enhanced cooling capabilities, making it an attractive option for scenarios where high thermal loads demand a more sophisticated solution.
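To see why the phase change matters, the short sketch below compares how much heat a kilogram of coolant can absorb as sensible heat alone versus by vaporizing. The fluid properties are representative assumptions for a generic engineered dielectric, not figures for any specific product.

```python
# Illustration of why a phase change carries more heat per unit of coolant.
# Compares the heat one kilogram of a dielectric fluid absorbs as sensible
# heat (liquid warming 10 K) with the latent heat it absorbs by boiling.
# Property values are representative assumptions, not data for any product.

CP_DIELECTRIC = 1.2    # kJ/(kg*K), assumed liquid specific heat
DELTA_T_K = 10.0       # assumed temperature rise if kept single-phase
LATENT_HEAT = 110.0    # kJ/kg, assumed heat of vaporization

sensible_per_kg = CP_DIELECTRIC * DELTA_T_K   # heat absorbed without boiling
latent_per_kg = LATENT_HEAT                   # heat absorbed by vaporizing

print(f"Sensible only (10 K rise): {sensible_per_kg:.0f} kJ/kg")
print(f"With phase change:         {latent_per_kg:.0f} kJ/kg "
      f"(~{latent_per_kg / sensible_per_kg:.0f}x more per kilogram)")
```

Under these assumed properties, each kilogram of fluid that vaporizes carries away several times more heat than it would through warming alone, which is the source of two-phase DTC's higher cooling capacity.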
“The benefit of DTC cooling is that it enables chips to operate at their most efficient processing rate, which is, of course, important for keeping up with the increased demand from AI. It also allows for the potential of overclocking the chips. And in legacy facilities, depending on existing infrastructure, you can tap into the chilled water loop or internal heat rejection systems as well,” said Rodney Willis, VP of sales and sourcing for Data Center Design and Build.
DTC is not without its challenges. In many cases, it requires modifications to manufacturer-certified equipment, which voids warranties and requires working with a third party to implement and warranty the modified product. In addition, installing manifolds within server racks that are already dense with power and low-voltage cabling is challenging without upsizing existing rack specifications.
It is important to note that these different solutions are not mutually exclusive. We often see DTC cooling coupled with RDHx to handle the residual heat with air cooling.
A word about cooling distribution units (CDUs)
Cooling distribution units or CDUs are a critical component of either liquid cooling approach discussed in this article. A CDU is an enclosure that can be a standalone floor-mounted unit, or rack mounted into a server rack. In a way, the CDU acts as the brain of the operation for liquid cooling technology. The CDU houses the controls needed for the liquid cooling system and provides filtration, temperature, and flow control for both single-phase and two-phase RDHx and DTC solutions. Choosing the right CDU solution is an important piece of the puzzle when planning to implement a liquid cooling solution.
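One way to picture the CDU's flow-control role is the simplified sketch below, in which a controller trims secondary-loop pump speed to hold the supply temperature to the racks near a setpoint. The setpoint, gain, and limits are illustrative assumptions and do not represent the control scheme of any particular CDU.

```python
# Minimal sketch of a proportional control step a CDU controller might use to
# hold the secondary (IT-side) supply temperature near a setpoint by trimming
# pump speed. Setpoint, gain, and limits are illustrative assumptions only.

SETPOINT_C = 32.0                    # assumed target supply temperature (deg C)
KP = 2.0                             # assumed gain (% pump speed per deg C of error)
MIN_SPEED, MAX_SPEED = 20.0, 100.0   # assumed pump speed limits (%)

def next_pump_speed(current_speed: float, measured_supply_c: float) -> float:
    """Raise pump speed when the supply runs warm, lower it when it runs cool."""
    error = measured_supply_c - SETPOINT_C
    speed = current_speed + KP * error
    return max(MIN_SPEED, min(MAX_SPEED, speed))

# Example: supply creeping up to 34 C with the pump at 60% drives speed toward 64%.
print(next_pump_speed(60.0, 34.0))
```

Real CDUs layer filtration, leak detection, dew-point management, and redundancy on top of this basic loop, which is why selecting the right unit is an important piece of the puzzle.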
Key challenges of liquid cooling
“It is virtually impossible to discuss liquid cooling without talking about the challenges it poses. With all the benefits and excitement, we know where there is a will there is a way, but we shouldn’t pretend that there is a simple one-size-fits-all standard that exists today,” said Weber.
The core challenges the Align team has seen are time to deployment, expenses, complex maintenance requirements, risk mitigation, and space considerations.
Within the data center space, the more specific challenges in implementing liquid cooling are spatial design considerations, plumbing/chiller line modifications, server internals and connections, unique maintenance needs and support training, ambient air cooling, cabinet requirements, weight loads, warranties, and slab/raised floor considerations.
"Considering the time to deployment versus lead times to build ideal infrastructure– what if rear door heat exchangers take 28 weeks to manufacture, but you need to deploy in 14 weeks? If you opt for chilled water cooling, can you build one in your timeframe? If not, selections must be based on availability,” said Eventov.
The installation process is not only time-consuming but also comes with substantial equipment and labor expenses, potentially leading to downtime during deployment. “Existing building infrastructures lack the necessary framework for seamless integration, forcing decision-makers to weigh the efficiency gains against the costs of implementing liquid cooling solutions,” said Willis.
Maintenance becomes a critical aspect, requiring frequent attention and integrated staffing plans. Engineers and IT staff need to work closely together due to the direct connection between the cooling method and the equipment.
It is also important to consider ongoing maintenance and support needs, especially after the initial setup (“day two” operations). While water-based solutions typically require more maintenance and can cause downtime, dielectric two-phase cooling tends to be more efficient and reliable.
“But this implementation comes with its own challenges, for example, by adding liquid cooling you’re adding infrastructure for the delivery of chilled liquid and removal of warm liquid which could impact cabinet requirements and selection,” said Willis.
Companies must also mitigate risks associated with routing liquid within the data center, whether below a raised floor or above the cabinets in a slab floor environment. For example, AI emulators weighing up to 4,800 lbs. can exceed the load ratings of some raised floor environments. Ideally, coolant piping would be routed under the cabinet rather than overhead. In a slab floor configuration, adding overhead coolant piping would add significantly to the support and weight load on the ceiling grid.
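For a sense of scale on the weight question, the quick arithmetic below spreads an assumed 4,800 lb. cabinet across four casters and compares it to an assumed raised-floor tile rating. The tile rating is a placeholder; the governing numbers must come from the floor manufacturer and a structural review.

```python
# Back-of-the-envelope check of a heavy cabinet against a raised floor tile
# rating. The tile rating below is a placeholder assumption; actual ratings
# come from the floor manufacturer and a structural engineering review.

CABINET_WEIGHT_LBS = 4800.0   # heavy AI/emulation cabinet, per the article
CASTERS = 4                   # assumed contact points under the cabinet
TILE_RATING_LBS = 1250.0      # assumed concentrated-load rating per tile (placeholder)

per_caster = CABINET_WEIGHT_LBS / CASTERS   # static load per caster
worst_case_tile = per_caster * 2            # two casters landing on one tile while rolling

print(f"Per caster: {per_caster:.0f} lbs; two casters on one tile: {worst_case_tile:.0f} lbs")
print("Exceeds assumed tile rating" if worst_case_tile > TILE_RATING_LBS
      else "Within assumed tile rating")
```

Even when the static per-caster load looks acceptable, the rolling case of two casters on a single tile can exceed the rating, which is one reason raised-floor routing decisions require careful structural review.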
“Once a solution has been selected and materials have been ordered, we feel it is critical not to skip important planning steps prior to deployment. The Align team immediately begins to work on BIM coordination models, structural engineering reviews, and careful coordination with other overhead or underfloor components to reduce surprises and change orders in the field,” said Eventov.
And finally, current space and infrastructure is a factor. “Liquid cooling does not 100 percent replace air cooling – you still need supplemental cooling within the facility for other electrical components/general cooling of the environment. The point is to retrofit these systems to work together,” said Weber.
While all of the above are important considerations for the data center space, we have not even scratched the surface of other questions, such as:
- What are the spatial design impacts of adding liquid cooling?
  - Wider aisles and cabinets
  - Reduced server rack capacity
- How complex are the plumbing/chiller line modifications?
- Is the liquid cooling manifold compatible with the server internals and connections?
- Are the pumps and heat exchangers redundant?
  - Are they on their own UPS-backed power systems?
- Are the servers already integrated with specialized microchannel and micro-convective liquid cooling solutions?
- Will the server OEM only support a particular liquid cooling solution?
Getting started with liquid cooling
The feasibility of liquid cooling ultimately hinges on meticulous planning and integration with existing infrastructure.
“It’s daunting, but there’s yet to be a circumstance the Align team hasn’t been able to work through,” said Eventov.
“And from what we’ve seen, collaborating with partners like Align early in the process ensures that budgetary, timing, and infrastructure considerations work together seamlessly, paving the way for a successful implementation,” said Willis.
The team at Align is prepared to address diverse and complex scenarios. Our thorough grasp of liquid cooling technologies and over 35 years in the industry enable us to devise unique solutions tailored to your facility's needs.
You can learn more about Align and its expertise at www.align.com, and contact us to speak to a member of our team today.