An unexpected release of inert fire suppression gas during routine maintenance at one of Microsoft’s European data centers set off a series of unfortunate events, causing a seven-hour outage.
On its Azure report page, Microsoft explained that the gas release caused the Air Handler Units to shut down automatically. That in turn allowed the ambient temperature to rise, which then led some systems to shut down automatically.
The outage meant that some North European Azure customers had issues connecting to, or managing, their cloud resources between 13:27 and 20:15 UTC on 29 September.
Letting the gas out
“During a routine periodic fire suppression system maintenance, an unexpected release of inert fire suppression agent occurred. When suppression was triggered, it initiated the automatic shutdown of Air Handler Units (AHU) as designed for containment and safety. While conditions in the data center were being reaffirmed and AHUs were being restarted, the ambient temperature in isolated areas of the impacted suppression zone rose above normal operational parameters,” the company reported.
“Some systems in the impacted zone performed auto shutdowns or reboots triggered by internal thermal health monitoring to prevent overheating of those systems. The triggering of inert fire suppression was immediately known, and in the following 35 minutes, all AHUs were recovered and ambient temperatures had returned to normal operational levels.”
Microsoft continued: “Due to the nature of the above event and variance in thermal conditions in isolated areas of the impacted suppression zone, some servers and storage resources did not shutdown in a controlled manner. As a result, additional time was required to troubleshoot and recover the impacted resources.”
The company apologized to those affected and said it was taking steps to prevent similar incidents, including an analysis of its suppression system maintenance to find out why the gas was released in the first place.
This is not the first time fire suppression systems have caused issues in a data center: last year, the noise and vibration of gas being released damaged hard drives at an ING facility [pro tip: baffle your nozzles], bringing it offline for 10 hours.
A similar incident appears to have occurred in Glasgow, where a “powerful blast” of gas damaged IT systems.
DCD has contacted Microsoft for further details and will update this story accordingly.