To help the industry learn from incidents like the recent BA IT crash, which grounded thousands of planes over a weekend, an industry group is being set up to share data about data center failures.
If one of BA’s aircraft had fallen from the sky, a neutral inquiry would have been set up immediately to find out what went wrong. Data centers can deliver critical services, but their failures or potential disasters are normally covered up, or subjected to an internal inquiry by the company involved and eventually covered by non-disclosure agreements. As a result, the same failures are repeated over and over again.
The Data Center Incident Reporting Network aims to change that, by setting up a neutral and anonymous forum to share data about what caused serious IT failures or near-failures. It will launch on August 3 at the UK Data Center Interest Group’s London meeting.
Learning from failures
“The important thing to understand is so many of the failures are recurring failures,” the Network’s founder, Ed Ansett of i3 Solutions, told DCD. “I came to the conclusion some time ago that people were not learning from experience.”
Ansett cataloged many IT failures which were repeated, he told DCD: “There are a lot of failures, like the failure of the Singapore Stock Exchange two years ago, which we have seen before many times. We need to show the root cause, and how to avoid it.”
The idea of sharing these causes first surfaced at a presentation Ansett made to DCD’s Singapore event in 2015, and has developed from there. The Network will be a charitable trust: “It’s not to make money, it’s for ordinary data center folk to learn.”
Although companies are normally reluctant to share details of their failures, Ansett believes they will be willing to share “the principles, not the gory details” of crashes, to educate people: “My feeling is there are a whole lot of people who are bursting to share - particularly things from a couple of years ago.”
The group will start small, but at some point it will need sponsorship to fund a secretariat which will scrutinize submitted information. Any funding will have to be neutral, not from single equipment vendors, for instance, said Ansett. The data center industry does not have a body equivalent to the Civil Aviation Authority, which funds crash investigations.
At present the group has an advisory board. It will start out concentrating on power and cooling, but plans to branch into higher levels including networking, servers, storage and applications.
“It’s only a matter of time before a data center failure will be associated with human fatalities,” said Simon Allen, of the UK DCIG, in a LinkedIn post. “We need to act now - there’s no reason why this archaic secrecy should prevail.”
Pointing to the airline industry’s record of sharing accident information, he said: “The same is not the case in the Data Centre industry where it is common practice to cover up failures or potential disasters in a misguided attempt to protect reputations.
“Root cause investigation findings are normally secret and bound by NDA which has resulted in The Data Centre industry being at a disadvantage in learning from failures.”
The group’s third trustee is mission critical facilities expert Peter Gross.