It is widely assumed that data center failures usually have electrical causes, most often (and perhaps ironically) in the uninterruptible power supply (UPS). The batteries in a UPS are a common source of danger and can lead to more serious incidents, including fires.
Blazing lithium-ion battery rooms feature prominently in data center operators’ nightmares, and rightly so. OVHcloud has yet to share details of the cause of its disastrous 2021 fire, but published reports make it clear that batteries and inverters were at the heart of it.
A lithium-ion battery fire appears to have been what took out South Korea’s widely used KakaoTalk messaging app earlier this year, disrupting services across much of the country.
But recent months have brought what may be a worrying new trend: fires and failures in data center cooling systems.
When cooling doesn’t
In July, London had its hottest day on record, and some data centers were reported to be turning water hoses on their cooling equipment to keep it running. Even so, Google and Oracle both suffered outages when their cooling systems failed and their data centers overheated.
In October this year, a cooling system was itself the source of a fire in China. A backup cooling tower caught fire at the building that houses the data center of the Suzhou Supercomputing Center, in the Suzhou Industrial Park, one of the country’s flagship technology and business zones.
Cooling failures are rare, and these particular incidents caused no serious damage or danger. But the sector should be aware that similar incidents are becoming more likely.
Data centers are expected to support higher power densities as workloads grow. Almost all the electrical power drawn by IT equipment ends up as heat, so denser racks mean more heat to remove, and cooling systems are being pushed harder than before.
At the same time, climate change is adding to the load: the hotter it is outside, the harder cooling systems have to work.
Meanwhile, as with every aspect of data center operations, cooling systems are under pressure to use less energy. That reduces their environmental impact, and it also cuts costs at a time of rising energy prices.
So operators have several incentives to let operating temperatures rise, and that could make overheating more likely.
Cooling systems choice
Data centers in temperate countries have been moving away from energy-intensive air conditioning towards free-air cooling and water-based evaporative cooling.
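They have been able to do this because of existing weather patterns. But free-air cooling cannot bring supply air below the outside temperature, and evaporative cooling becomes less effective as outdoor humidity rises. As temperatures climb, many data centers may turn out to be poorly suited to their local climate and vulnerable to cooling failure in future.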
The long-term answer is generally held to be liquid cooling in its various forms, up to and including immersion cooling, which removes heat more efficiently. But adopting it will require large-scale changes in how data centers are built.
In the meantime, expect cooling failures to climb the rankings of causes of outages.