The Uptime Institute’s famous Tier certifications might prove less useful than a different option from the same source
The Uptime Institute’s Tier certification scheme for reliability is well established, but the Institute offers other certificates that may prove more important in the long run.
The options are the Institute’s Management & Operations (M&O) Stamp of Approval and, for those on a Tier track, the Tier Certification of Operational Sustainability (TCOS). These may become, or may already be, more important than the Uptime Institute’s flagship Tier Certification of Constructed Facility (TCCF). Before you say it’s nonsense, let’s look at why that might be.
Outages worse than hot coals
A report on outages from the Ponemon Institute in 2013 (sponsored by Emerson Network Power) stated: “Unplanned data center outages present a difficult and costly challenge for organizations. In fact, most of the respondents in this study — from senior level to rank-and-file — say they would rather walk barefoot over hot coals than have their data center go down.”
The report contains a graph showing the root causes of unplanned outages experienced by the survey participants during a two-year period. Forty-eight percent selected accidental EPO (Emergency Power Off) and human error as the top cause. Even more telling was the comment: “Fifty-two percent believe all or most of the unplanned outages could have been prevented.”
When the researchers asked how organizations aimed to correct the root causes and prevent unplanned outages in the future, the response was somewhat unexpected. Instead of trying to reduce human error, the most prevalent response was: improve or purchase new equipment.
Using cost data from the Ponemon report, Emerson calculated that a data center outage costs slightly more than $7,900 per minute – a 41 percent increase from the $5,600 it cost in 2010. Total data center outages averaged a recovery time of 119 minutes, equating to about $901,500 in total costs.
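The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the rounded numbers quoted above, and the function names are illustrative rather than anything from the report. It reproduces the 41 percent increase; the simple per-minute product comes to $940,100, somewhat above the quoted $901,500, which suggests the report's total was derived from its own cost data rather than from this multiplication.

```python
# Rounded figures as quoted in the article (Ponemon/Emerson, 2013).
COST_PER_MINUTE_2013 = 7_900  # USD per minute of outage
COST_PER_MINUTE_2010 = 5_600  # USD per minute of outage
TOTAL_OUTAGE_MINUTES = 119    # average recovery time for a total outage


def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


def outage_cost(minutes: int, cost_per_minute: int) -> int:
    """Naive estimate: outage duration times the per-minute cost."""
    return minutes * cost_per_minute


increase = percent_increase(COST_PER_MINUTE_2010, COST_PER_MINUTE_2013)
estimate = outage_cost(TOTAL_OUTAGE_MINUTES, COST_PER_MINUTE_2013)

print(f"Increase since 2010: {increase:.0f}%")
print(f"Naive cost of a 119-minute outage: ${estimate:,}")
```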
Around the same time, the Uptime Institute introduced the TCOS designation for Tier-rated facilities (2010) and the M&O Stamp of Approval (2012) as a way to combat human error in data centers. Both programs were industry-driven, with CBRE, Equinix, Fortune, Interxion, Morgan Stanley and Progressive contributing to their development.
Looking for errors
With practical knowledge gleaned from the companies, and Uptime Institute’s validation experience, the consortium developed a way to assess a facility’s operation and uncover practices that would likely introduce errors in the following functions:
- Staffing and Organization: Verify that employee job responsibilities are defined, approved by management, and focused on achieving the desired uptime objective.
- Maintenance: Determine if preventive maintenance programs and associated procedures are in place, adequate and followed.
- Training: Scrutinize in-house and third-party vendor training programs to ensure employees and visitors are aware of site-specific policies and procedures.
- Planning, Coordination and Management: Check the adequacy of site policies, financial-management guidelines, and infrastructure libraries, including current as-built drawings of the data center, and question personnel about their understanding of the policies.
- Operating Conditions: Ensure consistent and documented management of power and cooling capacity. Specific to power, the guide says: “Load management decisions need to be established, documented, and practiced based on electrical capacity components to ensure maximum loads are not exceeded and capacity is reserved for switching between components.”
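The load-management rule quoted in the last item can be illustrated with a small capacity check. This is a minimal sketch under stated assumptions, not the Uptime Institute's method: it assumes a set of power components (for example UPS modules) sharing one load, and it interprets "capacity reserved for switching" as being able to take the largest component out of service without overloading the rest. The function name and the example ratings are hypothetical.

```python
def has_switching_reserve(ratings_kw: list[float], total_load_kw: float) -> bool:
    """Check that the load still fits if the largest component is offline.

    Assumption (not from the guide): the remaining components can share
    the full load between them up to the sum of their ratings.
    """
    if len(ratings_kw) < 2:
        # A single component leaves nothing to switch over to.
        return False
    usable_kw = sum(ratings_kw) - max(ratings_kw)
    return total_load_kw <= usable_kw


# Hypothetical site: three 500 kW modules.
modules_kw = [500.0, 500.0, 500.0]
print(has_switching_reserve(modules_kw, 900.0))   # load fits on two modules
print(has_switching_reserve(modules_kw, 1100.0))  # load needs all three modules
```

A documented check of this kind, run before load is added, is one way to make such decisions "established, documented, and practiced" rather than left to judgment on the day.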
It’s hard to deny that training and having everyone pulling in the same direction will result in immediate cost savings, a safer operation and happier clients.
However, Lee Kirby, CTO at the Uptime Institute, says there is more to the story. As the TCOS and M&O Stamp of Approval matured, people found ancillary benefits, which might have as much impact as cost reductions.
Examples include reduced insurance costs, standardized data center operations across multiple facilities, and the use of the M&O Stamp of Approval as a marketing tool.
Joel Stone, vice president of global data center operations at CenturyLink, told the 2015 Uptime Symposium: “As owner-operators in the wholesale market, we want to differentiate ourselves.”
CenturyLink is serious about this, agreeing to have all 57 of its data centers obtain the M&O Stamp of Approval or, where a data center already has TCCF, the TCOS.
AIG doesn’t want to miss out, says Herb Alvarez, director of global engineering and critical facilities: “We are making it [M&O] a requirement. What CenturyLink is doing is influencing us.”
For companies such as AIG, which owns data centers but does not run them, the Stamp of Approval has another benefit. Both AIG and Morgan Stanley, which contract out their sites to technical facilities management (TFM) companies such as Inviron and CBRE, use Uptime Institute’s M&O in contracts with their partners, and extend this to the colocation providers they lease from to ensure global consistency.
The M&O Stamp of Approval is also useful for strengthening the SLAs and guarantees at legacy and smaller data center operations that cannot afford to physically meet Tier-rating specifications.
Human failures are repeated
“Data centers are bespoke, complex homo-technical systems,” said Ed Ansett, chairman of i3 Solutions at DCD’s SE Asia Converged event in Singapore in September. “The human side and the technical side cannot be separated. An understanding of both is required to reveal what went wrong. Too often the failure is one that has been seen before.”
Secrecy is a problem, and Uptime’s bid to break this down is the Uptime Institute Network, a coalition offering peer-to-peer interaction and a confidential forum for knowledge transfer. It is also a way to source best practices and validate them before including them in company products and certifications, Kirby says.
Many say that ratings such as the M&O will become more important than Tier-rating certifications, and the reason can be summed up as follows: There are two commercial data centers in town, and their infrastructures are indistinguishable. However, only one of them has the M&O Stamp of Approval. Which one would you choose?
- Tier Certification of Constructed Facility (TCCF): The well-established Tier I to IV Certification based on the design and implementation of your physical data center.
- Management and Operations Stamp of Approval (M&O): Human error is the leading cause of data center failures, so the M&O Stamp of Approval examines your data center’s operating policies and practices.
- Tier Certification of Operational Sustainability (TCOS): An examination of operational practices within the Tier Certification process.