DCD SE Asia: Many data center failures are due to secrecy

If a data center kills someone, regulation is inevitable

Data center failures are happening too often, according to a presentation at DCD SE Asia in Singapore, and the reason is a lack of information sharing.

Recent data center failures have included the Singapore Stock Exchange (SGX), which suffered a major outage in 2014 due to a combination of factors. According to Ed Ansett of i3 Solutions, one of the experts called in to advise the organization after the outage, that event was an exception: whereas SGX shared its experience, other failures have been shrouded in secrecy because of the competitive nature of the industry.

Non-disclosure agreements prevent learning


Ed Ansett, chairman, i3 Solutions

"Data centers are bespoke, complex homo-technical systems. The human side and the technical side cannot be separated. An understanding of both is required to reveal what went wrong, and all too often the failure is one that has been seen before," Ansett told the conference.

Other industries, such as aviation, face the same problem but achieve much higher levels of reliability because regulations require accidents to be investigated, said Ansett. "We are a younger industry, and we are unregulated. There is no authority that looks at data centers. The closest we come are safety regulations, or mandates laid down by financial services authorities such as the Monetary Authority of Singapore (MAS)." One reason for this lack of oversight is that, unlike plane crashes, data center failures do not cause loss of life.

A number of failures could be avoided if, for instance, data center operators refused to use residual current devices (RCDs) in server racks, said Ansett. RCDs are designed to protect people from electrocution by equipment such as lawnmowers, which makes them inappropriate in a data center. They can also trip unpredictably at currents below their recommended 30mA setting, causing a server failure that can cascade into something worse.

Because failure data is not shared, knowledge like this is not widely available, and data centers achieve far lower continuous uptime than they could, said Ansett.

He expects this to change in the future. Data centers are becoming ever more important to an increasingly instrumented world, and in his view a data center failure will eventually kill people. At that point, the industry will have to accept regulation, which will enforce the sharing of failure data.

Readers' comments (2)

  • Ed
    Great article. Failures happen no matter the degree of engineering and facility integrity.

    "Exception" is a wonderful word: it equals surprise. I expect the visibility of potential failures is becoming obscured by business risk. It would be interesting to revisit the failure sites and understand the business appetite for allowing failure scenario testing as a regular regime at component and system level. I strongly suspect that, in practice, organisations prevent such test regimes through a misconception of the risk management associated with operational change management.


  • Good article on data centre failures, which matter all the more given their increasing number and the resulting loss of business.

    The author highlighted important factors contributing to these incidents. First, the industry is not regulated; second, the industry does not share information, citing NDAs, which are irrelevant when it comes to power failures.

    No one can diagnose data centre failures with inadequate information. If NDAs restrict data centres from sharing information, then they should fix their own failures, because no one else can help them.

    More importantly, DC operators and owners do not have the necessary skills or expertise to prevent these incidents in future without sharing information.

    There is business risk both in sharing information and in suffering failures, and the industry currently has no idea which is more dangerous, as it is relatively new and acting immaturely in an unregulated business.

    Perhaps operators will not share failure information until they become helpless and suffer huge financial losses, and by then it might be too late.

    Currently, Singapore claims to be the preferred destination for the data centre industry in ASEAN, and this claim will take a beating if there is another DC outage, for whatever reason!

    This is the price of not sharing information and of being an unregulated industry.
