As chip, server, and rack densities continue to climb, more effective cooling has become one of the top priorities for digital infrastructure.

Over the last few years, liquid cooling has advanced significantly in its ability to dissipate heat efficiently, especially for high-performance computing (HPC) and artificial intelligence (AI) applications.

With one in five data centers already using liquid cooling and a further 61 percent considering it, this technology is clearly moving into the mainstream, and hybrid cooling systems are on the rise.

However, challenges persist, particularly with systems that rely on cooling equipment positioned outside the server rack, which must circulate coolant through a network of pipes to reach the heat-generating components.

Direct-to-chip liquid cooling emerges as a compelling solution. This innovative approach involves interfacing directly with heat-generating components at the chip level, promising unparalleled thermal efficiency and targeted heat dissipation.

At its core, direct-to-chip liquid cooling entails the direct application of a coolant, typically water or a dielectric fluid, to the heat-generating components of computing hardware.

This process integrates specialized cooling components, such as cold plates or immersion systems, directly onto the surface of central processing units (CPUs), graphics processing units (GPUs), or other high-power components.

These cooling solutions come in various forms, including single-phase cold plates and two-phase evaporation units - each offering distinct benefits in thermal performance and compatibility with different hardware configurations.

Technical considerations and benefits

Direct-to-chip liquid cooling offers several key technical advantages. By interfacing directly with heat-generating components, liquid cooling systems can maintain more uniform and lower operating temperatures, thereby enhancing the reliability and longevity of critical hardware.

Moreover, liquid cooling enables data centers to support higher compute densities without risking thermal throttling or overheating, thus unlocking new levels of performance and scalability.

A primary benefit of direct-to-chip liquid cooling is its exceptional thermal efficiency. Liquid cooling systems can achieve high heat transfer coefficients, resulting in more efficient heat dissipation.

This increased efficiency translates into lower operating temperatures and reduced cooling energy consumption, leading to substantial cost savings and environmental benefits.
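To make the scale of that difference concrete, here is a minimal sketch of Newton's law of cooling (Q = h x A x dT) using representative, order-of-magnitude heat transfer coefficients and surface areas. All figures are illustrative assumptions, not vendor specifications:

```python
# Minimal sketch of Newton's law of cooling: Q = h * A * dT.
# All numbers are representative order-of-magnitude assumptions,
# not vendor specifications.

def delta_t_kelvin(q_watts: float, h_w_per_m2k: float, area_m2: float) -> float:
    """Coolant-to-surface temperature rise needed to reject q_watts."""
    return q_watts / (h_w_per_m2k * area_m2)

CHIP_POWER_W = 700.0  # assumed package power for a modern accelerator

# Forced air has a low h, so air cooling compensates with large fin area.
air_dt = delta_t_kelvin(CHIP_POWER_W, h_w_per_m2k=100.0, area_m2=0.5)

# A liquid cold plate achieves a far higher h over a tiny wetted area.
liquid_dt = delta_t_kelvin(CHIP_POWER_W, h_w_per_m2k=20_000.0, area_m2=0.004)

print(f"forced air, 0.5 m2 of fins:  dT = {air_dt:.1f} K")
print(f"liquid cold plate, 40 cm2:   dT = {liquid_dt:.1f} K")
```

The point is not the exact numbers but the ratio: a few square centimeters of wetted cold plate can do what air needs a large finned heat sink to approach.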

By eliminating the need for intermediary heat exchange mechanisms, direct-to-chip liquid cooling minimizes thermal resistance, enhancing heat transfer efficiency and enabling more precise temperature control at the chip level - setting it apart from conventional liquid cooling methods.

Furthermore, the proximity of the coolant to the heat source allows rapid heat dissipation, mitigating the risk of hotspots and enabling the handling of higher heat densities with greater ease.
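One way to picture the thermal resistance argument is as a series stack: the junction temperature equals the coolant temperature plus the heat load multiplied by the sum of the resistances in the path. The resistance values in this sketch are hypothetical, chosen only to show the effect of removing intermediate stages:

```python
# Hypothetical series thermal resistances (K/W): the junction temperature
# is T_coolant + Q * sum(R). Values are illustrative only, chosen to show
# the effect of removing intermediate heat-exchange stages.

CHIP_POWER_W = 700.0
COOLANT_SUPPLY_C = 30.0

direct_to_chip = {
    "die to cold plate (TIM)": 0.010,
    "cold plate to coolant": 0.015,
}
air_cooled_path = {
    "die to heat sink (TIM)": 0.010,
    "heat sink to room air": 0.060,
    "room air to chilled water": 0.030,
}

for label, stack in [("direct-to-chip", direct_to_chip),
                     ("air-cooled", air_cooled_path)]:
    t_junction = COOLANT_SUPPLY_C + CHIP_POWER_W * sum(stack.values())
    print(f"{label:>14}: junction ~ {t_junction:.1f} C")
```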

Direct-to-chip liquid cooling also offers greater flexibility in system design and deployment. It can be integrated into existing server designs with minimal disruption to operations, streamlining the deployment process.

Challenges and implementation considerations

Secondary fluid networks - using water to cool the data center - are not new; the data center cooling industry has been deploying chilled water solutions for years.

Chilled water, as well as refrigerant, has been running under data center raised floors since the development of the first minicomputers. What has been less common is delivering fluid directly to IT racks. This is where deploying high-density racks presents new challenges.

In the early days of HPC, cold plate technology delivered cooling liquid directly to the chip through micro-channels in the 100µm range. Over the last few years this has dropped to around 50µm, and with the latest generation of GPUs, cold plates are using 27µm micro-channels. That means particulates must be filtered at 25µm. So when deploying direct-to-chip liquid cooling, great care must be taken to ensure the secondary fluid network is microscopically clean.
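A simple sanity check follows from those figures: the loop's filter rating must sit below the smallest micro-channel width anywhere in the network. The check below is an illustrative sketch using the channel widths and the 25µm rating mentioned above:

```python
# Sanity check: a filter must catch particles before they can bridge the
# narrowest micro-channel in the loop. Channel widths and the 25um filter
# rating are taken from the figures above; the check itself is illustrative.

CHANNEL_WIDTHS_UM = {"legacy HPC": 100, "recent": 50, "latest GPUs": 27}
FILTER_RATING_UM = 25  # rating cited above for 27um channels

def filter_ok(channel_um: float, filter_um: float) -> bool:
    """The filter rating must be finer than the channel width."""
    return filter_um < channel_um

for generation, width in CHANNEL_WIDTHS_UM.items():
    verdict = "OK" if filter_ok(width, FILTER_RATING_UM) else "too coarse"
    print(f"{generation:>11}: {width}um channels, "
          f"{FILTER_RATING_UM}um filter -> {verdict}")
```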

This requirement throws up a number of other challenges:

What fluid should be used and how do you ensure it’s clean and free from contaminants?

The main contenders are purified water, PG25, or EG25 (water with 25 percent propylene or ethylene glycol). All have pros and cons across anti-corrosion properties, testing, initial cost, replacement cost, storage, and handling.
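Fluid choice also affects how much coolant must be pumped for a given heat load, since mass flow is the heat load divided by the product of specific heat and temperature rise. The specific-heat figures in this rough sketch are approximate room-temperature values for illustration only; supplier data sheets should be consulted before sizing a real loop:

```python
# Rough flow-rate comparison: m_dot = Q / (cp * dT). Specific heats are
# approximate room-temperature values for illustration; consult supplier
# data sheets before sizing a real loop.

RACK_HEAT_KW = 80.0   # assumed rack heat load
DELTA_T_K = 10.0      # assumed supply/return temperature difference

SPECIFIC_HEAT_J_PER_KG_K = {
    "purified water": 4186.0,
    "PG25 (25% propylene glycol)": 3930.0,  # approximate
    "EG25 (25% ethylene glycol)": 3800.0,   # approximate
}

for fluid, cp in SPECIFIC_HEAT_J_PER_KG_K.items():
    m_dot_kg_s = (RACK_HEAT_KW * 1000.0) / (cp * DELTA_T_K)
    print(f"{fluid:>28}: {m_dot_kg_s:.2f} kg/s "
          f"for {RACK_HEAT_KW:.0f} kW at dT = {DELTA_T_K:.0f} K")
```

The glycol mixtures carry less heat per kilogram than water, so they need slightly higher flow rates for the same load - one of the trade-offs behind the pros and cons listed above.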

What materials should the secondary fluid network be made from?

The debate between metal and plastic is very much the hot topic in this space. Stainless steel is the leading contender. It offers many benefits over plastic but is costly and requires specialist manufacturing. Plastic is lower cost and easier to work with, but there are concerns over its strength, particularly when there are rapid changes in flow.

How do you prevent contamination from entering the secondary fluid network, from day-one equipment installation through regular maintenance?

This is where the industry will have to adapt the most. In an air-cooled data center, installing a server or other equipment is very simple. Once the initial setup has been completed, usually in a test area, it's just a case of finding a space, connecting the power and network, and you are done. This is virtually the same procedure whether you are installing one server or 1,000.

For direct-to-chip servers, although the procedure looks similar - just one additional connection - the process as a whole is more complicated.

The test area needs to offer a fluid connection to servers before they are installed in the main data center. When installing a single liquid-cooled server, care needs to be taken to ensure there is enough top-up fluid in the coolant distribution units (CDUs) and that air is bled off correctly.

When installing multiple items of equipment, especially for new systems, additional fluid and air bleeding will be needed. For larger installations, multiple flushes and regular filter cleaning must also be undertaken to clear out debris and contamination left over from construction.

How do you perform integrated system testing?

Today, there are no large-scale load banks available to test direct-to-chip installations. Many CPU and GPU vendors use a thermal test vehicle (TTV) to simulate the power and heat load of a chip, but these can be expensive.

Exploring the future of direct-to-chip liquid cooling

One area of particular interest is the development of advanced coolant formulations tailored to the specific requirements of modern computing hardware.

These next-generation coolants offer enhanced thermal conductivity, improved corrosion resistance, and reduced environmental impact, paving the way for even greater efficiency and sustainability in data center operations.

The integration of artificial intelligence and machine learning algorithms has the potential to improve the management and optimization of high-density cooling systems.

By leveraging real-time data analytics and predictive modeling techniques, data center operators can dynamically adjust cooling parameters and optimize performance based on workload demand and environmental conditions, further enhancing efficiency and reliability.
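As a hedged illustration of what such closed-loop adjustment might look like at its simplest, the sketch below nudges pump speed toward a chip-temperature setpoint with a basic proportional controller. Real CDU firmware and management integrations are far more sophisticated, and every name and constant here is hypothetical:

```python
# Minimal sketch of closed-loop cooling adjustment: a proportional
# controller nudging pump speed toward a chip-temperature setpoint.
# All names and constants are hypothetical, for illustration only.

SETPOINT_C = 65.0   # target chip temperature (assumed)
KP = 2.0            # proportional gain: % pump speed per degree C
MIN_SPEED, MAX_SPEED = 30.0, 100.0  # pump speed limits (%)

def next_pump_speed(current_speed: float, chip_temp_c: float) -> float:
    """Proportional step: run the pump faster when above setpoint."""
    error = chip_temp_c - SETPOINT_C
    return max(MIN_SPEED, min(MAX_SPEED, current_speed + KP * error))

# Example telemetry stream (hypothetical readings, degrees C).
speed = 50.0
for reading in [62.0, 66.5, 70.2, 68.0, 64.5]:
    speed = next_pump_speed(speed, reading)
    print(f"chip at {reading:.1f} C -> pump speed {speed:.0f}%")
```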

In addition to technical advancements, the adoption of direct-to-chip liquid cooling is driving innovation in data center design and architecture. Modular cooling solutions, scalable infrastructure designs, and flexible deployment options enable data center operators to adapt quickly to evolving workload requirements and business needs, facilitating rapid deployment and expansion of high-density cooling capabilities.

The trajectory of direct-to-chip liquid cooling represents a pivotal shift in data center thermal management, offering unprecedented levels of efficiency, performance, and scalability.