It has been 15 months since the morning of March 10, 2021, when a fire broke out in OVHcloud's SBG2 data center in Strasbourg and destroyed it utterly.
We still don't know the exact cause of the fire. But we now have reports from the Bas-Rhin firefighters who attended the blaze and, this week, from the French industrial accident investigators, BEA-RI. Both make uncomfortable reading for OVHcloud. The firefighters said the building had wooden floors and lacked both an extinguishing system and a full emergency power cut-out. The accident investigators revealed that water had been detected in the power room where the blaze began, minutes before the fire broke out. The report shows that moment in a pair of searing video images.
The reports come as a law firm, Ziegler & Associes, has gathered complaints from more than 140 aggrieved OVHcloud customers for a class action.
Silence from the cloud
OVHcloud was initially very communicative about the fire, with founder Octave Klaba describing the company's heroic efforts to replace servers and get customers back online. Klaba even promised to set up a research lab to help data centers avoid such disasters. Then, around a year ago, the company went abruptly quiet on the issue, saying that it could not speak until official reports had been issued.
At times since then, OVHcloud sources have told us how frustrated they are to be unable to speak, and we have been prepared to believe them. But with the publication of these reports, we now think it is time the company gave its version of the story.
The story with the class action claim is similar. Ziegler gathered details of OVHcloud customers' complaints, and said last year that the French cloud provider had failed those customers by running a data center which was at risk of such a disaster, by giving customers a false impression that their data was backed up, and by giving them inadequate compensation for the loss of business they had suffered as a result of the fire.
OVHcloud has consistently said that it won't comment on or respond to those claims until it has received formal letters from the lawyers. We now know that the first batches of letters have reached OVHcloud, so it is time for the company to respond to them.
A measured analysis
The accident report is measured and careful not to jump to conclusions. It is one of the first major incident reports from the Bureau d’enquêtes et d’analyses sur les Risques Industriels (BEA-RI), an organization created in April 2021 and modeled on the BEA, which investigates aviation accidents.
The Bas-Rhin firefighters concentrated on the issues that hindered their operation: the difficulty of getting the power fully cut off, and the way wooden floors and air circulation made the whole building go up like a bonfire.
For its part, BEA-RI gives credit where it is due: OVHcloud had safe lead-acid batteries which didn't contribute to the fire, and the building's fire detection system operated perfectly, allowing all the staff to be evacuated safely in minutes. However, that fire detection system also revealed that smoke from the fire was circulating through the building within minutes of the fire starting on the ground floor.
BEA-RI looks more closely at the moment the fire began, simultaneously in two power rooms. It draws no final conclusions, but notes some facts that need further explanation.
Firstly, the equipment which failed had needed repeated maintenance for unexplained faults.
Secondly, in the hour before the fire began, sensors on one of the inverters showed anomalous moisture readings.
"The presence of liquid or moisture in an electrical device can cause an internal short circuit likely to cause the observed damage," says BEA-RI. However, the investigators say it is beyond their brief to determine a final cause. "These elements alone do not... determine the cause of the failure," they say, noting that they were not able to "establish whether it was a measurement error or a humidity peak linked, for example, to the presence of liquid."
BEA-RI says: "It is not possible, at this stage, to establish the cause of the failure at the level of the UPS, which could be explained by different hypotheses (presence of liquid or humidity linked to the presence of the nearby, malfunction linked to the maintenance operation carried out on the same morning, operation of the inverter outside normal operating ranges, etc.)"
We need a full answer
The investigators are clear that they aren't data center experts. The report includes lengthy explanations of fundamental data center design issues, such as Uptime Institute Tiers and fire prevention systems.
However, there are plenty of other people who are data center experts. It's time they had full access to the details of what happened, so that we can get to the bottom of this.
As others have commented after previous incidents, any attempt to delay or obscure the cause is deeply counterproductive, because it could allow the same thing to happen elsewhere. An organization to report on data center accidents and emergencies has been proposed before, by groups including DCIRN. We're not sure whether Klaba's promise of a fire prevention laboratory was ever serious or merely a public relations exercise, but such a lab could actually be useful.
If nothing else, a final disclosure of the cause of the fire will be necessary for any settlement of the class action, whether it goes to court or not.