Data center operators are increasingly looking to new cooling solutions to ensure their servers function effectively and efficiently.
High-performance computing (HPC) or artificial intelligence (AI) workloads can take days of high-utilization runtime to complete a single set of complex simulations. Inadequate cooling for the hardware running these workloads can cause unreliability and failures, and reduce the overall performance of the server system, which in turn adds runtime to the workload. The result is a higher cost to run that workload, and a higher total cost to run the server over its lifespan.
Another example of when non-traditional cooling is required is overclocking. Overclocking a processor increases its clock speed, allowing the CPU to execute more instructions per second. Overclocking has traditionally been associated with gaming systems but is now also used extensively where fast, low-latency computing is required, such as electronic trading.
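As a rough illustration of that cost effect, here is a back-of-envelope sketch in Python. All of the figures in it (server power draw, runtime, throttling penalty, and electricity price) are hypothetical, chosen only to show how extra runtime caused by thermal throttling translates directly into extra energy cost.

```python
# Back-of-envelope sketch: how extra runtime caused by thermal throttling
# inflates the energy cost of a single HPC/AI workload.
# All figures are illustrative assumptions, not measured data.

server_power_kw = 10.0   # assumed average draw of a dense GPU server
baseline_hours = 72.0    # assumed runtime of the simulation set
throttle_penalty = 0.20  # assume throttling adds 20% to the runtime
price_per_kwh = 0.15     # assumed electricity price (USD per kWh)

baseline_cost = server_power_kw * baseline_hours * price_per_kwh
throttled_cost = baseline_cost * (1 + throttle_penalty)

print(f"Baseline energy cost:  ${baseline_cost:,.2f}")
print(f"Throttled energy cost: ${throttled_cost:,.2f}")
print(f"Extra cost per run:    ${throttled_cost - baseline_cost:,.2f}")
```

The same multiplier applies to every run of the workload over the server's lifespan, which is why inadequate cooling shows up as a total-cost-of-ownership problem rather than a one-off inconvenience.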
Overclocking increases the performance of computer components but at the expense of additional power and thus additional waste heat. Without the right cooling, the components simply can’t cope with the additional heat and fail, possibly rendering the increased performance useless.
To solve these challenges, liquid cooling is often the answer.
What are the different types of data center liquid cooling?
The three most common types of liquid cooling are immersion cooling, rack-level liquid cooling, and self-contained liquid cooling.
Immersion cooling involves placing all computer components inside a specialist non-conductive (dielectric) liquid, often with an oily consistency. There are two types of immersion cooling: single-phase and two-phase.
- Single-phase works by actively pumping the liquid over the heat source, absorbing the heat, and then circulating the liquid to a heat exchanger to be cooled again.
- Two-phase works by using a liquid with a low boiling point. Heat from the components boils the liquid on contact, and this phase change carries the heat away from the component being cooled (a rough per-kilogram comparison follows after this list). The vapor bubbles rise to the top of the tank, where they are condensed back into liquid form; the condensing process removes the heat from the coolant.
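To see why the phase change matters, here is a small sketch comparing how much heat one kilogram of coolant can absorb: air warmed by 10 °C versus a two-phase dielectric fluid that boils. The latent heat value used is an illustrative ballpark for engineered dielectric fluids, not a figure from any specific product's datasheet.

```python
# Sketch: heat absorbed per kilogram of coolant, sensible heating of air
# vs. boiling a two-phase dielectric fluid.
# The latent heat is an illustrative ballpark, not a datasheet value.

air_cp_kj_per_kg_k = 1.005     # specific heat of air, kJ/(kg*K)
delta_t_k = 10.0               # assumed allowed air temperature rise
latent_heat_kj_per_kg = 100.0  # assumed latent heat of vaporization

air_heat_per_kg = air_cp_kj_per_kg_k * delta_t_k
print(f"Air (10 K rise):        {air_heat_per_kg:6.1f} kJ per kg")
print(f"Two-phase fluid (boil): {latent_heat_kj_per_kg:6.1f} kJ per kg")
print(f"Ratio: roughly {latent_heat_kj_per_kg / air_heat_per_kg:.0f}x per kilogram")
```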
Immersion cooling requires custom hardware, non-typical rack space, and data center customization. It also requires the full hardware stack to be defined up front, with minimal expectation of later maintenance, because adding a card or disk drive after the system is deployed takes significant work. The cost of deploying immersion cooling is usually significant compared to the other methods.
However, the benefit of immersion cooling is that it removes heat far more efficiently than traditional fan-based systems. Liquid coolants transfer heat much more effectively than air and require less energy input to circulate.
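Putting a rough number on that efficiency gap: the sketch below estimates the volumetric flow needed to carry away 1 kW of heat with a 10 °C coolant temperature rise, using approximate textbook property values for air and for a generic mineral-oil-style dielectric coolant (illustrative figures, not the specification of any particular immersion fluid).

```python
# Rough comparison: volumetric flow needed to remove 1 kW of heat
# with a 10 degC coolant temperature rise, air vs. a dielectric liquid.
# Property values are approximate textbook figures, not product specs.

heat_load_w = 1000.0  # heat to remove (W)
delta_t_k = 10.0      # allowed coolant temperature rise (K)

fluids = {
    # name: (density kg/m^3, specific heat J/(kg*K))
    "air": (1.2, 1005.0),
    "mineral-oil coolant": (850.0, 1900.0),  # generic dielectric liquid
}

for name, (density, cp) in fluids.items():
    # Q = rho * flow * c_p * dT  ->  flow = Q / (rho * c_p * dT)
    flow_m3_s = heat_load_w / (density * cp * delta_t_k)
    print(f"{name:>20}: {flow_m3_s * 1000:8.3f} L/s "
          f"({flow_m3_s * 2118.88:7.1f} CFM)")
```

The liquid needs orders of magnitude less flow to move the same heat, which is why pumping a coolant can cost far less energy than pushing the equivalent volume of air with fans.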
Rack-level liquid cooling turns the entire data center rack into an extensive liquid cooling loop. A large portion of the rack is dedicated to pumping and cooling the liquid coolant. The remainder houses server systems whose main heat-generating components (CPU, RAM) are fitted with water blocks or cold plates that pipe coolant over them. These systems then route the pipes to the rear of the chassis, where they terminate in quick-disconnect valves.
This allows the server's internal cooling loop to be connected to and disconnected from the main rack-level cooling loop. Rack-level liquid cooling enables very high-density computing: you can stack lots of hardware together because the cooling infrastructure is somewhat abstracted away. However, it requires an entire rack-level solution and, depending on the complexity of the deployment, can increase costs significantly over typical server deployments.
Lastly, self-contained liquid cooling incorporates all the necessary liquid cooling hardware entirely within the individual server chassis. Specific components, such as the processor, have liquid coolant pumped over them via a water block or cold plate to transfer the heat. The coolant is piped over key components or heat sources and then returned to a radiator cooled by internal fans, much like a traditional server configuration.
Self-contained liquid cooling means there are no additional hardware or infrastructure requirements for the data center. Traditional racks can be used, and the server is essentially plug-and-play like a typical air-cooled server, meaning it offers similar maintenance capabilities to standard servers. The cost of deployment for self-contained liquid cooling tends to be closer to that of typical servers than to that of immersion-cooled systems.
Liquid cooling isn’t just about making servers more efficient
Data center owners are realizing that they can increase efficiency by capturing and recycling waste energy from their existing infrastructure. A new ISO standard for Energy Reuse Factor (ERF) is being implemented to help data centers measure their performance on energy reuse and increase sustainability.
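Broadly, ERF expresses reused energy as a fraction of the total energy the facility consumes, so a higher value means more of the data center's waste heat is being put to work. The sketch below computes the ratio from hypothetical annual figures; the numbers are made up purely to illustrate the calculation.

```python
# Energy Reuse Factor (ERF) sketch: energy reused outside the data center
# as a fraction of the total energy the facility consumes.
# Annual figures below are hypothetical, purely to illustrate the ratio.

total_energy_mwh = 50_000.0   # assumed annual facility energy consumption
reused_energy_mwh = 12_500.0  # assumed energy exported, e.g. to district heating

erf = reused_energy_mwh / total_energy_mwh
print(f"ERF = {erf:.2f}  (0 = no reuse, 1 = all consumed energy reused)")
```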
Microsoft and Google have both begun work on heat reuse projects in Finland. Microsoft, working with Fortum, says that “the waste heat produced in the datacenters will be converted to district heating, serving Finland’s second largest city Espoo and neighboring Kauniainen, and the municipality of Kirkkonummi, in what to date will be the world’s largest scheme to recycle waste heat from data centers.”
Meanwhile, Google is working with Haminan Energia to reuse the heat from their existing data center, which will “represent 80 percent of the annual heat demand of the local district heating network.”
The UK is also trialing similar schemes, with the energy supplier Octopus recently investing £200m in Deep Green to heat nearby swimming pools with waste data center heat.
The key constraints on data centers
In industries such as finance in particular, there has been a trend away from the cloud (a move originally made largely on cost grounds) and back to co-located data centers closer to, or hosted by, trading exchanges. This shift has been driven by both performance and control considerations. As with any technology, it's about the 'right tool for the right job.' Cloud and remote data centers work well for certain sectors and projects, but there will always be a need for high-performance hardware in close physical proximity to specific locations.
How AI, regulatory pressures, and workloads will impact the pace of liquid-cooling adoption
AI and other HPC sectors continue to drive up the power density of rack-mount server systems. This increased compute means increased power draw, which leads to increased heat generation. Removing that heat from the server systems in turn requires more power for high-CFM (cubic feet per minute) fans.
Liquid cooling technologies, including rack-level cooling and immersion, can improve the efficiency of heat removal from server systems, requiring less powerful fans. In turn, this can reduce the overall power budget of a rack of servers.
When extrapolating this across large sections of a data center footprint, the savings add up significantly. Considering that some of the latest Nvidia rack offerings require 40 kW or more, you can see how power requirements are shifting to the extreme. For reference, it's not uncommon for electronic trading co-locations to offer only 6-12 kW racks, which are sometimes operated half-empty because the servers draw more power than the rack can supply.
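To make that constraint concrete, here is a quick sketch of how many servers fit inside those power budgets. The per-server draw figures are hypothetical, chosen only to illustrate why a 6-12 kW rack fills up electrically long before it fills up physically.

```python
# Sketch: how rack power budgets limit server density.
# Per-server power draws are hypothetical, chosen only for illustration.

rack_budgets_kw = {
    "typical co-lo rack": 12.0,
    "high-density AI rack": 40.0,
}
cpu_server_kw = 1.5   # assumed draw of a dual-socket trading/HPC server
gpu_server_kw = 10.0  # assumed draw of a dense GPU server

for name, budget in rack_budgets_kw.items():
    cpu_servers = int(budget // cpu_server_kw)
    gpu_servers = int(budget // gpu_server_kw)
    print(f"{name}: up to {cpu_servers} CPU servers "
          f"or {gpu_servers} GPU servers within {budget:.0f} kW")
```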
These trends are going to force data centers to adopt any technology that can reduce the power burden on not only their own infrastructure but also the local infrastructure that supplies them.
Additionally, any method to increase efficiency, whether by reducing overall load or reusing waste heat, will be critical to maintaining operational efficiency while scaling to meet increased computing demands. Many may start to look at building new bespoke HPC or AI-focused data centers, with a ground-up focus on these new requirements.
Location will also continue to be a big factor in new data center construction, as access to green energy and a favorable climate become key considerations.