As anyone who works in or follows the data center industry knows, increasing demand for artificial intelligence (AI) and machine learning (ML) applications is fueling investments in data centers around the world. AI and ML are also driving data center managers to reconsider their cooling designs to handle the high heat loads of next-generation chips.

Data center managers have to design their infrastructure to support high heat loads while still being able to scale their operations to meet demand.

Scaling capabilities cannot always depend on more physical space – data center managers and engineers often have to solve the technical problem of fitting more and hotter servers into the same spaces. They also have to maintain 24/7 uptime: the needs of AI applications will not pause for a data center renovation.

Additionally, the industry is facing increased scrutiny over power use, so data center managers need to be especially conscious about how they are using electricity.

Sustainability has always been a conversation in the data center industry, but this increased attention will create even more conversation around power usage effectiveness (PUE) and power management.

Turning to liquid

To support these considerations and the demands of high-performance chips, data centers have been turning to liquid cooling. By using liquid cooling technologies in the right way, data center managers can greatly improve PUE, even in applications where they are using next-generation IT.
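
For readers less familiar with the metric, PUE is the ratio of total facility power to the power delivered to IT equipment, so a lower value is better and 1.0 is the theoretical floor. The sketch below only illustrates the arithmetic; the function name and both sets of power figures are hypothetical and do not come from any measured facility.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical figures: the same 1,000 kW IT load with two different
# amounts of cooling and power-distribution overhead.
higher_overhead = pue(total_facility_kw=1500.0, it_load_kw=1000.0)
lower_overhead = pue(total_facility_kw=1200.0, it_load_kw=1000.0)

print(f"higher-overhead PUE: {higher_overhead:.2f}")
print(f"lower-overhead PUE:  {lower_overhead:.2f}")
```

Every kilowatt removed from cooling overhead moves the ratio closer to 1.0 without changing the IT load.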

Liquid cooling is a spectrum of technologies, ranging from chilled liquid lines that supplement air cooling to complete immersion of equipment in non-conductive liquid. It is effective because liquid provides a much greater heat transfer capacity than air. Liquid can also be pumped closer to the source of heat, capturing heat at the point where it is generated and transporting it out of the system.
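
The difference in heat transfer capacity can be made concrete with the sensible-heat relation Q = ρ · V̇ · c_p · ΔT. The sketch below compares the volume flow of water and of air needed to carry the same heat load at the same temperature rise. The fluid properties are approximate room-temperature values, and the 50 kW rack load and 10 K rise are assumptions chosen only for illustration.

```python
# Approximate room-temperature properties (rounded, assumed values).
WATER = {"density": 997.0, "cp": 4180.0}  # kg/m^3, J/(kg*K)
AIR = {"density": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)


def volumetric_flow_m3_per_s(heat_w: float, delta_t_k: float, fluid: dict) -> float:
    """Volume flow needed to carry heat_w with a coolant temperature rise of delta_t_k."""
    return heat_w / (fluid["density"] * fluid["cp"] * delta_t_k)


heat_w = 50_000.0  # hypothetical 50 kW rack
delta_t_k = 10.0   # hypothetical coolant temperature rise

water_flow = volumetric_flow_m3_per_s(heat_w, delta_t_k, WATER)
air_flow = volumetric_flow_m3_per_s(heat_w, delta_t_k, AIR)

print(f"water: {water_flow * 60_000:.1f} L/min")
print(f"air:   {air_flow:.2f} m^3/s")
print(f"air needs about {air_flow / water_flow:.0f}x the volume flow of water")
```

With these property values, air needs a few thousand times the volume flow of water to move the same heat, which is why liquid can do the job through comparatively small hoses and manifolds.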

Liquid cooling can help data centers increase capacity while maintaining efficient space and energy use. It also can offer a favorable return on investment and lower the total cost of ownership for data center facilities. Liquid cooling systems provide an effective solution for achieving the required temperature parameters and reducing the energy consumption of cooling systems.

Because liquid carries heat so much more effectively than air, liquid cooling can improve (that is, lower) power usage effectiveness while managing heat loads, reducing energy costs, and contributing to environmental sustainability.

The heart of a liquid cooling system

For data center managers designing liquid cooling systems, selecting the right coolant distribution unit (CDU) is paramount. CDUs pump cooled fluid to racks and chips in a closed loop, with precision control adjusting fluid temperature and flow rates to maximize efficiency.

Hot or cold liquid circulates through hoses and manifolds to reach the IT equipment, then returns to the CDU, where it is cooled by a line of facility water and recirculated.

Because the entire system is a closed loop, the risk of leaks or of liquid coming into contact with electrical infrastructure is reduced. That risk is only minimized, however, when the liquid system itself is reliable.

CDUs are at the core of driving the efficiencies that liquid cooling can bring to data centers, so choosing the right one is critical. While CDUs may all offer similar features and benefits, data center managers need to look beyond the obvious.

Examine the testing-based performance of the unit versus theoretical projections

To meet increasing demand and to accommodate high pressure drops in the cooling loop across the IT racks, CDUs are often pushed to the maximum thermal and hydraulic performance possible, and rightfully so. However, due to the physics of flow, hydraulic losses and the associated thermal losses can start to creep in at high flow velocities within the plumbing and heat exchangers.
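
One way to see why losses creep in at high flow velocities is the Darcy-Weisbach relation for a straight pipe, in which pressure drop grows with the square of velocity. The sketch below is a simplified illustration, not a CDU model. The function names, pipe dimensions, flow rates, and the constant friction factor of 0.02 are all assumptions. In practice the friction factor varies with Reynolds number and pipe roughness, and fittings and heat exchangers add their own losses.

```python
import math


def pressure_drop_pa(flow_m3_s: float, diameter_m: float, length_m: float,
                     density: float = 997.0, friction_factor: float = 0.02) -> float:
    """Darcy-Weisbach pressure drop for a straight pipe (constant friction factor assumed)."""
    area = math.pi * diameter_m ** 2 / 4.0
    velocity = flow_m3_s / area
    return friction_factor * (length_m / diameter_m) * density * velocity ** 2 / 2.0


def hydraulic_power_w(flow_m3_s: float, dp_pa: float) -> float:
    """Ideal pumping power needed to push the flow through the pressure drop."""
    return flow_m3_s * dp_pa


base_flow = 0.001  # m^3/s, i.e. 60 L/min (hypothetical)
for multiple in (1.0, 1.5, 2.0):
    q = base_flow * multiple
    dp = pressure_drop_pa(q, diameter_m=0.025, length_m=10.0)
    print(f"{q * 60_000:.0f} L/min: dp = {dp / 1000:.1f} kPa, "
          f"hydraulic power = {hydraulic_power_w(q, dp):.1f} W")
```

Under these assumptions, doubling the flow quadruples the pressure drop and raises the ideal hydraulic power eightfold, which is one reason the top end of the flow range deserves measurement rather than extrapolation.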

Engineering tools such as network modeling, heat exchanger (HX) selection software, computational fluid dynamics, and digital twins are great resources when selecting the optimal components and building prototypes. However, testing the complete CDU system to the top end of its limits, across the range of applications where it will be deployed, is where the rubber hits the road.

This testing will expose the nuances in the efficiencies of the components and of the complete system under extreme flow conditions. Hydraulic and thermal performance can degrade significantly between prototype and testing, which can affect the unit's rating, so data centers must work with vendors who are equipped to test and customize CDUs to their needs.

Suitability of critical CDU parts for the specific application

Data centers are a specific application and need to be served by units that are truly designed with that in mind. A small leak may be forgivable in residential or industrial settings, but it carries a much higher safety and cost risk around IT server racks packed with sophisticated, high-performing chips.

Critical CDU parts need more rigorous selection and design verification than components in other applications. For pump selection specifically, the material compatibility of all wetted parts across different use case fluids should be based on verification of existing manufacturing processes. Additionally, the hydraulic failure mode of pumps in case of internal damage to components is a critical consideration.

Since CDU applications use closed technology cooling system (TCS) loops, they are sensitive to debris left over from the early commissioning phase. This debris can compromise the seal integrity of pumps that rely on mechanical seals, leading to more frequent maintenance and the associated downtime and replacement costs. Magnetically coupled pumps, which have no dynamic shaft seal, are a good alternative for CDU applications.

Reliability for leak integrity

As covered earlier, the components and systems in data centers are very leak-sensitive because of the high risk, associated liability, and cost of any potential leak. Every joint in the facility water system and technology cooling system loops represents a potential failure point if it is not vetted through thoughtful reliability testing.

The lessons from component-level and system-level testing for leak integrity and pressure decay are critical to making CDUs and the entire hydraulic system reliable over a long operating life.
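
As a rough illustration of how a pressure decay result is turned into a leak rate, the sketch below uses the common gas-test approximation: leak rate ≈ test volume × pressure drop ÷ test duration. It assumes a rigid test volume, constant temperature, and ideal-gas behavior. The function names, the test figures, and the acceptance limit are all hypothetical; real limits come from the applicable specification or vendor test plan.

```python
def pressure_decay_leak_rate(volume_m3: float, p_start_pa: float,
                             p_end_pa: float, duration_s: float) -> float:
    """Approximate leak rate in Pa*m^3/s from a gas pressure decay test.

    Assumes a rigid volume and constant temperature during the test.
    """
    if duration_s <= 0:
        raise ValueError("Test duration must be positive")
    return volume_m3 * (p_start_pa - p_end_pa) / duration_s


def passes(leak_rate: float, limit: float) -> bool:
    """True when the measured leak rate is within the acceptance limit."""
    return leak_rate <= limit


# Hypothetical test: 20 L loop section, 600 Pa drop over 10 minutes.
rate = pressure_decay_leak_rate(volume_m3=0.020, p_start_pa=300_000.0,
                                p_end_pa=299_400.0, duration_s=600.0)
HYPOTHETICAL_LIMIT = 0.05  # Pa*m^3/s, placeholder acceptance limit

print(f"leak rate: {rate:.3f} Pa*m^3/s ({rate * 10:.2f} mbar*L/s)")
print("pass" if passes(rate, HYPOTHETICAL_LIMIT) else "fail")
```

Temperature drift during the test changes the gas pressure on its own, so a practical test either stabilizes the temperature first or compensates for it.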

Designing for manufacturability

There are different design solutions for a given data center application. However, a solution that is designed to keep manufacturability in the main path of the design process will prove the most beneficial to meet the short and long-term quantity needs of our industry.

As AI and ML create more demand for data centers, the industry needs to be able to produce different types of products in a wide range of quantities. Manufacturability intent keeps the design of equipment and components accountable to production methods.

A great theoretical design is not worthwhile if it cannot be manufactured or scaled.

Serviceability of field replaceable components and true cost of ownership

CDUs are electro-mechanical devices and need to be maintained as such. However, improved serviceability comes not only from how well a particular component fits within a system to start with but also from how easy the design is to maintain and service. This includes replacing certain parts within the unit during its lifetime.

When designing CDUs, a list of such parts should be reviewed and evaluated for a true cost of ownership. As discussed earlier, if critical components such as pumps are not selected with the application in mind and have obvious failure modes, service requirements will increase and the cost of ownership over the unit’s operating life will be higher.
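
A parts review of this kind can be reduced to simple arithmetic. The sketch below adds purchase cost, energy cost, and expected replacement cost over the service life for two design options. The class, the function, and every figure are hypothetical placeholders meant only to show the structure of the comparison. A fuller model would also account for downtime and discounting.

```python
from dataclasses import dataclass


@dataclass
class ServicePart:
    name: str
    cost_per_replacement: float   # parts plus labor, in any consistent currency
    expected_replacements: float  # over the whole service life


def total_cost_of_ownership(purchase_cost: float, annual_energy_cost: float,
                            service_years: float, parts: list[ServicePart]) -> float:
    """Purchase cost + lifetime energy cost + expected replacement cost."""
    service_cost = sum(p.cost_per_replacement * p.expected_replacements for p in parts)
    return purchase_cost + annual_energy_cost * service_years + service_cost


# Hypothetical option A: cheaper unit whose pump is replaced more often.
tco_a = total_cost_of_ownership(
    purchase_cost=50_000.0, annual_energy_cost=4_000.0, service_years=10.0,
    parts=[ServicePart("pump", 3_000.0, 3.0), ServicePart("filter", 200.0, 10.0)],
)
# Hypothetical option B: more expensive unit with a longer-lived pump.
tco_b = total_cost_of_ownership(
    purchase_cost=55_000.0, annual_energy_cost=4_000.0, service_years=10.0,
    parts=[ServicePart("pump", 3_500.0, 1.0), ServicePart("filter", 200.0, 10.0)],
)

print(f"option A: {tco_a:,.0f}")
print(f"option B: {tco_b:,.0f}")
```

With these placeholder inputs the unit with the higher purchase price ends up slightly cheaper over ten years, which shows how replacement frequency can outweigh the initial price difference.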