In this data-centric era, where an immense volume of data is generated and processed every second, data centers play a pivotal role in shaping our society.
As highlighted in a McKinsey report, the US data center market is forecast to grow at a robust annual rate of 10 percent in the coming years. This surge is propelled by factors such as the expanding adoption of cloud computing, the rising prevalence of edge computing, and the increasing emphasis on data-driven decision-making across business and technology, especially given the emergence of AI and generative AI.
Adoption of liquid cooling in data centers is an inevitable trend
Processing that data generates significant heat at the chip and server level, making advanced cooling technology essential for seamless data center operation. Today, roughly 80 percent of data centers rely on air cooling, and cooling accounts for about 40 percent of a facility's total energy consumption.
Beyond high power usage, water consumption is another operational concern. In most air-cooled data centers, hot air from IT racks is rejected through evaporative cooling at central cooling towers. An average Google data center, for instance, consumes 450,000 gallons of water daily, roughly equivalent to 0.7 Olympic-sized swimming pools. Other major tech companies report water usage on a comparable scale: Meta withdrew about 5 million cubic meters of water in 2021, the equivalent of roughly 2,000 Olympic pools.
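As a rough sanity check on those swimming-pool equivalences, the short sketch below assumes an Olympic-sized pool holds about 2,500 cubic meters (roughly 660,000 US gallons); the daily and annual water figures are the ones reported above.

```python
# Back-of-envelope check of the swimming-pool equivalences above.
# Assumption: an Olympic-sized pool holds ~2,500 m^3 (~660,000 US gallons).
GALLONS_PER_M3 = 264.172            # US gallons in one cubic meter
POOL_M3 = 2_500                     # assumed Olympic pool volume, m^3
POOL_GALLONS = POOL_M3 * GALLONS_PER_M3

google_daily_gallons = 450_000      # reported average daily use per data center
meta_2021_m3 = 5_000_000            # reported 2021 water withdrawal

print(f"Google daily use ~ {google_daily_gallons / POOL_GALLONS:.1f} pools")   # ~0.7
print(f"Meta 2021 withdrawal ~ {meta_2021_m3 / POOL_M3:,.0f} pools")           # ~2,000
```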
In addition to environmental considerations, the broader adoption of AI amplifies the urgency to move away from air cooling. Meeting the demands of parallel processing requires higher computing power and rack density.
Data centers today operate at a power density of around 8-10 kW per rack, but this is expected to soar to 40-100 kW for AI-ready racks equipped with power-hungry GPUs. The thermal design power (TDP) of processors has increased fivefold over the past decade, and this trend is expected to continue. As rack power density and heat generation rise, traditional air cooling becomes impractical, with an upper limit of effectiveness at approximately 20 kW per rack. Beyond this threshold, liquid cooling, whether direct-to-chip or immersion, becomes imperative for efficient heat dissipation and offers a more energy- and water-efficient solution.
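To see why air cooling tops out around this density, the sketch below estimates the airflow a rack would need from the simple energy balance Q = ρ · V̇ · c_p · ΔT. The rack power levels and the 15 °C inlet-to-outlet temperature rise are illustrative assumptions, not measured figures.

```python
# Illustrative estimate: airflow required to carry away a rack's heat load.
# Assumptions: standard air properties and a 15 K temperature rise across the rack.
RHO_AIR = 1.2            # kg/m^3, air density at ~20 C
CP_AIR = 1005.0          # J/(kg*K), specific heat of air
DELTA_T = 15.0           # K, assumed air temperature rise across the rack
CFM_PER_M3S = 2118.88    # cubic feet per minute in one m^3/s

for rack_kw in (10, 20, 40, 100):
    q_watts = rack_kw * 1_000
    flow_m3s = q_watts / (RHO_AIR * CP_AIR * DELTA_T)   # Q = rho * V_dot * cp * dT
    print(f"{rack_kw:>3} kW rack -> {flow_m3s:5.2f} m^3/s (~{flow_m3s * CFM_PER_M3S:,.0f} CFM)")
```

Under these assumptions, a 100 kW rack would need several times the airflow of a 20 kW rack, more than typical rack fans and raised-floor delivery can realistically provide, which is where liquid's far higher volumetric heat capacity takes over.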
Following the 2023 boom in generative AI, major players in the data center space have all announced liquid cooling adoption, and all have chosen direct-to-chip cooling, often in conjunction with their AI chips and AI-centric data center development.
Data center liquid cooling technologies
Direct-to-chip and immersion cooling are the two primary liquid cooling approaches for removing heat from IT infrastructure in data centers. The direct-to-chip cooling solution relies on circulating a fluid through a cold-plate heat exchanger situated directly on the computer chip. The heat emitted from the chip is absorbed into the coolant loop. This method does not significantly alter the form factor of servers and racks, allowing for easy retrofitting into existing air-cooled data centers.
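As a rough illustration of how little liquid a cold plate needs, the sketch below sizes the coolant flow for a single high-TDP accelerator using the same energy balance, Q = ṁ · c_p · ΔT. The 700 W chip power and 10 °C coolant temperature rise are assumptions chosen for illustration, not specs for any particular product.

```python
# Illustrative cold-plate sizing: coolant flow needed to absorb one chip's heat.
# Assumptions: a 700 W accelerator, water-based coolant, 10 K coolant temperature rise.
CHIP_TDP_W = 700.0       # assumed chip thermal design power, W
CP_WATER = 4186.0        # J/(kg*K), specific heat of water
RHO_WATER = 997.0        # kg/m^3, density of water
DELTA_T = 10.0           # K, assumed coolant temperature rise across the cold plate

mass_flow = CHIP_TDP_W / (CP_WATER * DELTA_T)            # kg/s
volume_flow_lpm = mass_flow / RHO_WATER * 1000 * 60      # liters per minute

print(f"Required coolant flow ~ {mass_flow*1000:.0f} g/s (~{volume_flow_lpm:.1f} L/min)")
```

Roughly a liter per minute per chip is part of why direct-to-chip loops can be plumbed into existing racks without major changes to the server form factor.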
On the other hand, immersion cooling entails submerging servers in a dielectric, electrically non-conductive fluid. Immersion can occur either at the IT chassis level or at the rack level. When performed at the rack level, the rack is typically configured horizontally as an immersion tank rather than the conventional vertical setup. Due to this distinct rack configuration, immersion cooling is more suitable for greenfield projects and typically necessitates the implementation of new handling protocols and equipment.
Both direct-to-chip and immersion cooling systems can be subcategorized into 1-phase and 2-phase configurations. In a 1-phase system, the coolant remains liquid throughout the cooling process: heat is absorbed by the liquid coolant, which then carries it away from the source. In a 2-phase system, by contrast, the coolant vaporizes at the heat source and condenses back to a liquid in the condenser; the latent heat absorbed during this phase change enhances cooling efficiency.
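A quick comparison of latent versus sensible heat shows where the 2-phase advantage comes from. The fluid properties below are illustrative values for a generic engineered dielectric coolant, not a specific product datasheet.

```python
# Illustrative comparison: heat carried per kilogram of coolant, 1-phase vs 2-phase.
# Assumed properties for a generic dielectric coolant (not a specific product):
CP_LIQUID = 1100.0        # J/(kg*K), sensible specific heat (assumed)
LATENT_HEAT = 100_000.0   # J/kg, heat of vaporization (assumed)
DELTA_T = 10.0            # K, allowable liquid temperature rise in a 1-phase loop

sensible_per_kg = CP_LIQUID * DELTA_T   # 1-phase: heat absorbed as a temperature rise
latent_per_kg = LATENT_HEAT             # 2-phase: heat absorbed by boiling at the source

print(f"1-phase: {sensible_per_kg/1000:.0f} kJ/kg   2-phase: {latent_per_kg/1000:.0f} kJ/kg")
print(f"Phase change carries ~{latent_per_kg / sensible_per_kg:.0f}x more heat per kg of coolant")
```

Because each kilogram of coolant carries roughly an order of magnitude more heat once boiling is involved, 2-phase loops can generally run at lower flow rates and pumping power for the same heat load.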
In addition to the standard configurations of direct-to-chip and immersion cooling, cooling solution developers and chip designers are also working to shrink cooling solutions and integrate them directly into chip packages.
As part of ARPA-E's COOLERCHIPS program, Nvidia and HP are jointly working on advancing embedded microfluidic cooling technology at the silicon wafer level. Successful validation of this approach could pave the way for seamlessly embedding the cooling solution within the chip architecture and intrinsically improving the chip's cooling performance.
Potential roadblocks to data center cooling startups
Despite the burgeoning demand for data center cooling solutions, startups in this field face potential roadblocks to scaling up.
Firstly, the fragmented value chain within data centers makes it difficult for startups to identify the key decision-maker for broader adoption of cooling solutions. Additionally, the bespoke nature of data center designs can impede the seamless scaling of a cooling solution across facilities.
Another challenge arises from competition with established industry players. Large chip manufacturers, in particular, are actively pushing the boundaries of cooling solutions. For instance, Nvidia is developing chassis-level immersion cooling utilizing phase-change refrigerants. Meanwhile, Intel is exploring 3D vapor chamber cavities integrated into coral-shaped immersion cooling heat sinks and fluid jets that can be directly integrated into chip packages.
These industry giants wield superior capital and resources, allowing them to advance technology swiftly and, crucially, scale up production rapidly to meet the escalating demand, creating a strong competitive barrier for emerging startups.
What does the King of the Hill in data center cooling look like?
- Exceptional cooling performance, showcased through technical metrics such as the maximum TDP supported, heat flux, and rack density enabled.
- High affordability, evident in the total cost of ownership of the product, encompassing both the CAPEX and OPEX of the solution as well as the cost savings achieved by improving power usage effectiveness (PUE) and water usage effectiveness (WUE); see the sketch after this list.
- High serviceability, indicated by the product's ease of installation, integration, and maintenance.
- High Manufacturing Readiness Level (MRL), demonstrated by a company's manufacturing capabilities and the supply chain readiness of components and coolant. A recent example is the phasing out of 3M Novec coolant due to PFAS regulations, which has created a significant supply chain challenge for many developers of 2-phase cooling solutions.
- Robust IP defensibility is crucial for differentiating and safeguarding the technology design, particularly in a space with a low barrier to entry and intense competition from both startups and incumbent players.
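As referenced in the affordability criterion above, PUE and WUE are simple ratios: PUE is total facility energy divided by IT equipment energy, and WUE is annual site water use divided by IT equipment energy (liters per kWh). The sketch below compares a hypothetical air-cooled facility with an assumed liquid-cooled alternative; the PUE, WUE, load, and electricity-price figures are illustrative assumptions, not data from any specific operator.

```python
# Illustrative PUE/WUE comparison for a hypothetical 10 MW IT load.
# All figures below are assumptions for illustration, not measured facility data.
IT_LOAD_MW = 10.0
HOURS_PER_YEAR = 8760
ELECTRICITY_USD_PER_MWH = 80.0        # assumed blended electricity price

air_pue, liquid_pue = 1.5, 1.15       # assumed facility PUE before/after liquid cooling
air_wue, liquid_wue = 1.8, 0.2        # assumed WUE, liters per IT kWh

it_energy_mwh = IT_LOAD_MW * HOURS_PER_YEAR                  # annual IT energy, MWh
energy_saved_mwh = it_energy_mwh * (air_pue - liquid_pue)    # PUE = facility energy / IT energy
cost_saved_usd = energy_saved_mwh * ELECTRICITY_USD_PER_MWH

water_saved_liters = (air_wue - liquid_wue) * it_energy_mwh * 1_000   # L/kWh * kWh
water_saved_m3 = water_saved_liters / 1_000

print(f"Energy saved ~ {energy_saved_mwh:,.0f} MWh/yr (~${cost_saved_usd/1e6:.1f}M/yr)")
print(f"Water saved  ~ {water_saved_m3:,.0f} m^3/yr")
```

Under these assumed figures, the improvement works out to roughly 30,000 MWh and 140,000 cubic meters of water saved per year for a mid-sized facility, which is the cost lever the affordability criterion points to.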
At TDK Ventures, we believe that the evolution of AI, particularly the rise of generative AI, will catalyze exponential growth in the compute power required, necessitating the broader adoption of liquid cooling.
In this dynamic landscape, both incumbents and startups are actively driving advancements in liquid cooling, ranging from system configuration to the development of novel coolant materials.
We are actively looking for cooling solution developers who meet our 'King of the Hill' criteria, demonstrating not only excellence in technology but also the agility to scale rapidly in response to the escalating demand, positioning themselves strategically for potential exits in the coming years.