When, in December 2022, chemical giant 3M announced it would stop making per- and polyfluoroalkyl substances (PFAS) by 2025, it marked the beginning of the end for Novec dielectric fluids - and the many other products commonly used in liquid cooling applications. Since then, 3M has accelerated plans to shut down production of these “forever chemicals” in response to mounting legal liabilities associated with environmental and public health risks.
In the grand scheme of things, IT cooling is a trivial issue compared with the criticality of PFAS in other industrial applications: for example, they are essential in the fabrication of semiconductors. But 3M’s decision had major implications for those in the data center sector who had bitten the bullet to make two-phase immersion cooling work. Now, with runaway silicon thermal power, two-phase cooling has garnered substantial interest again, but in a different form: direct-to-chip liquid cooling.
Boiling down two-phase immersion
Arguably, two-phase immersion is the embodiment of a perfect cooling system:
- It captures virtually all the heat load because the IT equipment is completely immersed in a coolant.
- It offers high cooling performance, due to the rapid evaporation of a boiling dielectric fluid (benefiting from the latent heat of vaporization).
- It does not require pumping or any other form of forced convection. This sets it apart from both single-phase immersion (where dielectric fluid does not evaporate) and water cold plates that rely on pumps for circulation.
Better still, the cooling infrastructure built around two-phase liquid cooling systems is lean because compressor-free heat rejection is possible in virtually any climate (when a liquid with the right boiling point is selected). This promotes energy performance, lower capital costs, and better reliability - in theory.
Yet, the technique has not reached mass adoption in data centers and is trailing behind single-phase immersion both in interest and investment. In practice, two-phase immersion was saddled with complications, even before 3M’s decision to quit making low-boiling point dielectric fluids:
- Costly fluids. Fluids for two-phase liquid cooling can cost up to 10 times as much as dielectric fluids for single-phase immersion cooling, often more than $50 per liter, with large variations depending on supplier and volume discounts. This means filling up a full-sized two-phase immersion tank (40 rack unit equivalent space) costs upwards of $20,000.
- Vapor loss. Volatility means a substantial loss of fluid through vapor leaks, one of the intractable issues when operating open (non-sealed) baths. Cost is not the only issue; topping up the tanks regularly is an additional maintenance task for the operations staff.
- Serviceability of IT. Accessing IT hardware submerged in boiling dielectric means an entire tank needs to either power down or go into a deeply throttled performance state to stop the boiling and minimize vapor loss during the servicing.
- Restrictive IT hardware compatibility. The single biggest hurdle that has prevented wider adoption of two-phase immersion is not direct cost or operational issues, but material compatibility. Without a stringent control of IT hardware materials in the supply chain, hardware component failure rates will likely be abnormally high. Two-phase immersion has had some success with cryptocurrency mining because it uses a narrow set of specialty hardware.
- Lower floor space performance density. Cold-plate systems, particularly when using very tall racks, can achieve much higher performance density for a given floor space compared with immersion cooling. Stacking immersion tanks could increase the performance density of immersion, but this would require novel facility designs to handle the structural loading of several tanks, each weighing up to three metric tons when fully loaded.
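The fluid cost figures in the first bullet above can be cross-checked with simple arithmetic. The sketch below uses only the two numbers quoted there ($50 per liter and $20,000 per tank). The function names are hypothetical, and the single-phase comparison assumes the same fill volume and the full 10x price ratio, neither of which is stated in the text.

```python
def implied_fill_volume_liters(tank_fill_cost_usd: float, price_per_liter_usd: float) -> float:
    """Fill volume implied by a total fill cost and a per-liter price."""
    if price_per_liter_usd <= 0:
        raise ValueError("price per liter must be positive")
    return tank_fill_cost_usd / price_per_liter_usd


def fill_cost_usd(volume_liters: float, price_per_liter_usd: float) -> float:
    """Cost of filling a tank of the given volume."""
    return volume_liters * price_per_liter_usd


# Figures quoted in the text: $50 per liter, $20,000 per full-sized tank.
two_phase_price = 50.0
volume = implied_fill_volume_liters(20_000.0, two_phase_price)

# Assumption: "up to 10 times" taken at the full 10x ratio, same volume.
single_phase_price = two_phase_price / 10.0

print(f"Implied fill volume: {volume:.0f} liters")
print(f"Single-phase fill at same volume: ${fill_cost_usd(volume, single_phase_price):,.0f}")
```

At the quoted prices, the $20,000 figure corresponds to a fill of roughly 400 liters. Actual tank volumes and fluid prices vary by supplier and design.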
The decision by 3M to pull the plug on its two-phase immersion fluids was by no means the only or, arguably, even the primary reason two-phase immersion development projects are now either running several years behind or being shut down.
Making hundreds (and sometimes thousands) of boiling dielectric tanks viable in a data center was never going to be easy, which made it all the more attractive to operators keen to gain a competitive edge in infrastructure performance, a prime example being Microsoft.
For at least the next few years, two-phase immersion for mainstream IT applications remains a lab project. At the recent IEEE ITherm 2024 conference, there was no discussion of two-phase immersion in production deployment, with only one presentation on an in-development two-phase fluid by chemical company Chemours.
The next phase: chip evaporators
Direct-to-chip water cold plates remain the most common form of liquid cooling for IT. They represent the most tried and tested approach with a wide range of applications. The adoption of single-phase immersion is gathering speed too, promising to help displace air cooling for a wide range of IT infrastructure, as well as complementing cold plates. The future of data center cooling would be simple enough, if only silicon thermals would comply.
In 2023, the IT industry accelerated its pursuit of the higher compute performance required to train ever larger generative AI models. Suddenly, high-end chip thermal design power (TDP) leapt from 400W to 700W, and the next generation of Nvidia accelerators is expected to reach 1,200W in 2025.
This creates an opening for two-phase cold plates or, more accurately, vaporizers. While water cold plates and single-phase immersion are still able to keep pace with the TDP escalation, they require dramatically higher flow rates. This means additional pumping power, lower coolant supply temperatures, or both. Lower supply temperatures can make heat rejection considerably more expensive compared with a design optimized for free cooling, and can make the prospect of heat reuse unattractive.
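The link between TDP and flow rate follows from the sensible heat balance, Q = ṁ × cp × ΔT. The sketch below is a rough illustration only: it assumes pure water (real loops often use a water-glycol mix), treats all of the chip's TDP as heat absorbed by the coolant, and uses hypothetical temperature rises of 10 K and 5 K across the cold plate. The function name is illustrative.

```python
WATER_CP_J_PER_KG_K = 4186.0   # approximate specific heat of liquid water
WATER_DENSITY_KG_PER_L = 1.0   # approximate density of liquid water


def water_flow_lpm(heat_w: float, delta_t_k: float) -> float:
    """Water flow (liters per minute) needed to absorb heat_w watts
    with a coolant temperature rise of delta_t_k kelvin."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0


# TDP values quoted in the text; the temperature rises are assumptions.
for tdp_w in (400, 700, 1200):
    at_10k = water_flow_lpm(tdp_w, 10.0)
    at_5k = water_flow_lpm(tdp_w, 5.0)
    print(f"{tdp_w:>5} W: {at_10k:.2f} L/min at 10 K rise, {at_5k:.2f} L/min at 5 K rise")
```

For a fixed temperature rise, the required flow scales linearly with chip power. Halving the allowed temperature rise doubles the flow again, which is where the extra pumping power comes from.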
What really boosted the credibility of the two-phase approach is public endorsement from Nvidia. Silicon thermal engineers and senior data center technologists at the chip design firm consider two-phase direct-to-chip cooling the preferred way to address the thermal challenges of the next generation of data center accelerators and maximize performance. From comments made by Nvidia (and other vendors), it is clear this is not the end of silicon power escalation, as complementary metal oxide semiconductor (CMOS) technology is pushed to its limit.
When circulating a low boiling point liquid to the chip in a closed loop, the problems that fetter two-phase immersion either do not exist or are more manageable. The amount of liquid used is typically a small fraction of that needed for immersion (as little as one-tenth, at a couple of liters per chip), there is little vapor loss, servicing IT systems is easier, and material compatibility issues are relatively easy to manage.
Cooling performance is retained, enhanced even, where it matters most: high-performance computing chips. By taking advantage of the latent heat of vaporization with low boiling point liquids (typically with boiling points between 18°C and 60°C, or 65°F and 140°F, at 1 standard atmosphere), the cooling performance for a given flow rate is improved compared with single-phase convection cooling.
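The benefit of latent heat can be sketched by comparing the mass flow needed to remove the same heat load with and without boiling. The fluid properties below are placeholder values of a plausible order of magnitude, not data for any specific product, and the exit vapor quality (the fraction of fluid that boils off in the evaporator) is likewise an assumption. The sketch also ignores the sensible heating needed to bring the liquid up to its boiling point.

```python
def single_phase_mass_flow(heat_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) to absorb heat_w by sensible heating only: Q = m * cp * dT."""
    return heat_w / (cp_j_per_kg_k * delta_t_k)


def two_phase_mass_flow(heat_w: float, latent_heat_j_per_kg: float, exit_quality: float) -> float:
    """Mass flow (kg/s) to absorb heat_w by evaporation only: Q = m * h_fg * x."""
    if not 0.0 < exit_quality <= 1.0:
        raise ValueError("exit quality must be in (0, 1]")
    return heat_w / (latent_heat_j_per_kg * exit_quality)


HEAT_W = 1200.0                  # chip power from the text
DIELECTRIC_CP = 1200.0           # J/(kg*K), assumed placeholder
DIELECTRIC_LATENT = 100_000.0    # J/kg, assumed placeholder
DELTA_T_K = 10.0                 # assumed single-phase temperature rise
EXIT_QUALITY = 0.7               # assumed fraction of fluid vaporized

m_single = single_phase_mass_flow(HEAT_W, DIELECTRIC_CP, DELTA_T_K)
m_two = two_phase_mass_flow(HEAT_W, DIELECTRIC_LATENT, EXIT_QUALITY)

print(f"Single-phase dielectric: {m_single * 1000:.1f} g/s")
print(f"Two-phase dielectric:    {m_two * 1000:.1f} g/s")
print(f"Flow reduction factor:   {m_single / m_two:.1f}x")
```

Under these assumptions, boiling the fluid removes the same heat with several times less mass flow than heating it without a phase change. The actual ratio depends on the liquid chosen and the evaporator design.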
The size of the improvement in heat flux performance will depend on whether pump-assisted “flow boiling” (combining convective and nucleate cooling) or “pool boiling” (dominantly nucleate cooling) is used, as well as the thermodynamic properties of the liquid, chiefly boiling point and latent heat of vaporization. An inherent strength of two-phase cooling is how it handles hot spots on a chip (areas of extreme transistor activity): nucleation speeds up in locations of higher heat flux.
An additional operational benefit of engineered fluids compared with water in a cold-plate system is easier maintenance of coolant quality: scaling, corrosion or microbial growth are not normally possible, although dissolved substances can still appear.
What also makes two-phase direct-to-chip cooling a credible option is that there are now multiple vendors actively commercializing two-phase on-chip evaporators for IT applications. Key examples are start-ups ZutaCore (formed in 2016) and Accelsius (spun off from Nokia Bell Labs in 2022), and electronics manufacturer Celestica. These vendors all have different approaches to solving the thermal challenges of high heat flux chips using various two-phase liquids and system designs.
Future Uptime Intelligence reports will explore the main approaches to two-phase direct-to-chip cooling and analyze the trade-offs involved, including their respective availability through IT system vendors and other commercial channels.