Fire in data centers is catastrophic, as OVHcloud found out earlier this year. It's also quite rare: when Dennis Cronin of the DCIRN checked recent news coverage, he found only 31 data center fires reported in the last 18 years.
That may be 31 fires too many - but the figure is that low because this industry has established solid good practice around preventing, suppressing and mitigating any fire that breaks out or looks likely to. And it's all about having multiple lines of defense, not just against the fire itself, but against the systems you've put in to deal with it.
Hold the sprinklers!
Your sprinkler system can put the fire out, but at the cost of your IT kit. Once you have sprayed water on the systems, they aren't going to be much good for anything. So you want to avoid the sprinklers going off, if at all possible.
You do that with fire suppression systems, but even they can be intrusive. Some will flood the data center with non-flammable gas and others will actually reduce the oxygen levels. That can be a problem, because when the oxygen goes, the staff have to get out.
When an air handler malfunctioned in AWS's Frankfurt data center, equipment overheated. Before engineers could fix it, a fire suppression system went off and reduced oxygen levels, so they had to evacuate. The availability zone had an actual outage - and there was no fire in the first place.
Such incidents seem to be about as common as fires. In 2017, Microsoft had an Azure outage in Europe when a fire suppression system went off and shut down air handlers. That discharge happened during maintenance, not in response to a fire.
Don't panic though. These are all failures, and have fed into a learning process which should make repeats less likely. Fire suppression suppliers are continually improving their products - and now have nozzles designed to avoid the acoustic shockwaves that a gas discharge can produce. A properly set up fire procedure will work in tiers, with an appropriate level of fire suppression response to each level of alert.
Tiers of defense
Cronin puts it like this in a recent DCD webinar: "You put a fire detection system in to prevent suppression from being activated. You put the suppression system in to avoid a major incident. And you have sprinklers to protect the building. Because if your sprinkler system goes off, due to melting of a sprinkler head, you already have a major incident. You might as well protect the building, so you can do restoration faster."
It starts with the fire detection system, says Adam Pool of Xtralis - the company behind the VESDA aspirating smoke detector system. That has to be as sensitive as possible, to give maximum warning. With a sensitivity of 0.005 percent obscuration per meter, these smoke detectors will activate before fires are visible, and before serious action needs to be taken.
That means you have time to act, but you also need a sensible procedure.
"The first level of alarm is generally very high sensitivity, but not doing anything much more than just letting people know that there's a problem," says Pool. The trouble is, your detection system will pick up levels of smoke that people cannot notice - particularly as the detector may be finding the smoke inside an air circulation system which is designed to move air out of the building.
"We do hear stories of people not reacting to an alarm," says Pool. "In one case, site security ignored the VESDA at first, because every time they went to look for a fire, they couldn't see anything. It transpired that one long power rack inside an equipment cabinet was overloading and creating smoke - but very low levels of smoke. And if it hadn't been found, it would potentially have led to a fire."
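For readers who think in code, the tiered procedure Cronin and Pool describe can be sketched as a simple escalation rule. This is only an illustration, not anything a vendor ships: the names `Response` and `choose_response` are hypothetical, and so is the suppression threshold. Only the 0.005 percent per meter figure comes from the article, and real alarm levels are set per site.

```python
from enum import IntEnum


class Response(IntEnum):
    NONE = 0
    ALERT_STAFF = 1   # detection tier: tell people, discharge nothing
    SUPPRESSION = 2   # gas or oxygen-reduction tier: avoid a major incident
    SPRINKLERS = 3    # last resort: protect the building


# Thresholds in percent obscuration per meter.
# 0.005 is the detector sensitivity quoted in the article;
# 1.0 is a made-up placeholder for illustration only.
ALERT_THRESHOLD = 0.005
SUPPRESSION_THRESHOLD = 1.0


def choose_response(obscuration: float, sprinkler_head_melted: bool) -> Response:
    """Pick the lowest tier of response that matches the current alert level."""
    if sprinkler_head_melted:
        # By this point there is already a major incident.
        return Response.SPRINKLERS
    if obscuration >= SUPPRESSION_THRESHOLD:
        return Response.SUPPRESSION
    if obscuration >= ALERT_THRESHOLD:
        return Response.ALERT_STAFF
    return Response.NONE
```

The point of the ordering is that each tier exists to stop the next one firing: a faint reading such as `choose_response(0.01, False)` only alerts staff, which is exactly the kind of alarm the security team in Pool's story learned not to ignore.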
So we now have sensitive detectors, and proportionate suppression systems - but don't be complacent. Data centers are running hotter now than they used to, and Cronin tentatively suggests that we may be getting more fires and potential fires: "A decade ago, once every two years, there would be a major fire reported. Now, it's more like two a year."
Fire detection systems are the first part of a complete immune system against fire. If the system is healthy and operating smoothly, then your data center can shrug off that potential disaster.