Data center failures are happening too often, according to a presentation at DCD SE Asia in Singapore, and the reason is a lack of information sharing.
Recent data center failures include one at the Singapore Stock Exchange (SGX), which had a major outage in 2014 due to a combination of factors, but this event was an exception, according to Ed Ansett of i3 Solutions, one of the experts called in to advise the organization after it happened. Where SGX shared its experience, other failures have been shrouded in secrecy because of the competitive nature of the industry.
Non-disclosure agreements prevent learning
“Data centers are bespoke complex homo-technical systems. The human side and the technical side cannot be separated. An understanding of both is required to reveal what went wrong - and all too often the failure is one that has been seen before,” Ansett told the conference.
Other industries such as aviation have the same problem, but achieve much higher levels of reliability because of regulations requiring accident investigation, said Ansett: “We are a younger industry, and we are unregulated. There is no authority that looks at data centers. The closest we come are safety regulations, or mandates laid down by financial services authorities such as the Monetary Authority of Singapore (MAS).” One reason for this is that data center failures do not cause loss of life, unlike plane crashes.
A number of failures could be avoided if, for instance, data center operators refused to use residual current devices (RCDs) in racked servers, said Ansett. These are designed to protect people from electrocution by devices such as lawnmowers, so they are inappropriate in a data center. They can also trip unpredictably at a level lower than their recommended 30 mA setting, causing a server failure that can cascade into something worse.
Because failure data is not shared, knowledge like this is not available to all, and data centers achieve far lower continuous uptime than they could, said Ansett.
He expects that this may change in future. Data centers are becoming more and more important to an increasingly instrumented world, and in his view a data center failure will eventually happen that kills people. At that point, the industry will have to accept regulation, which will enforce sharing of failure data.