According to the Uptime Institute, most incidents of data center downtime are attributable to holes in the management processes used by the facility’s operators. The Institute’s existing four-Tier rating system for infrastructure redundancy is commonly used – especially in PR and marketing literature – to describe a facility’s reliability. The practice, which many argue is a misuse, has attracted criticism of the Tier system itself. The Institute responded by developing a new standard that aims to augment the Tier classification and create a more “holistic” way to forecast uptime.
The new standard will grade the extent to which management procedures and policies, building characteristics and site characteristics ensure a data center’s reliability. The three levels of the Tier Standard: Operational Sustainability – Gold, Silver and Bronze – are meant to be used in tandem with the facility’s Tier rating to define the likelihood of downtime.
A ‘HOLE’ TO BE FILLED
Although it enjoys widespread use within the data center industry, the Tier system is limited to rating the level of redundancy of the electrical infrastructure. Critics have pointed out that a data center’s uptime depends on more than that, and the Institute has not disputed the point.
“We fully acknowledge that the Tier classification system addresses select issues in the data center,” Uptime Institute VP Julian Kudritzki said. “It is just the configuration of the site infrastructure equipment. It has never claimed to be more than that. (The OS standard) will address the holistic element of data center operation.”

The new standard will address risk factors that affect data center performance beyond those addressed by the Tier system. A combined rating would look like “Tier III Gold,” or “Tier IV Silver,” etc.
“That’s been a hole for quite a while for the Uptime Institute so it’s good to see that they’re filling that hole,” said Dan McNary, VP of commercial construction at gkkworks. McNary came to gkkworks from Syska Hennessy Group in May, when the group sold its construction arm Syska Hennessy Group Construction to the Irvine, Calif.-based architecture and construction firm.
Matt Stamper, VP of business for the San Diego, Calif., data center provider Castle Access, agreed: “In the absence of understanding the operational characteristics of a data center, or the building itself, or the location … you only look at part of a data center.” Stamper is a certified information-systems auditor.

“There are different domains (within the data center industry) that intersect. IT infrastructure is impacted by facility management and operations, but unfortunately, in many cases, the two worlds speak different languages.”
KEY ELEMENTS OF OPERATIONAL SUSTAINABILITY
Three main considerations will go into determining a facility’s OS level: management processes, building characteristics and site location. Management processes will carry by far the most weight, because management was reported to be at fault in about 70 percent of all instances of irregular performance or downtime, according to a database of “abnormal incident” reports maintained by the Institute. Kudritzki says that over a data center’s lifetime, the OS level will be even more important in determining its uptime than the Tier level is.
The Institute’s database includes reports on about 4,500 abnormal incidents – 450 of them full downtime events – that took place in more than 130 data centers. The reports were collected over the past 13 years from members of the Site Uptime Network – a network of companies that share information with the goal of resolving issues that affect infrastructure availability.
McNary said the 70 percent estimate was correct. “We found the same thing,” he said, referring to research conducted by Syska Hennessy. That research also found that human error was at fault in about 70 percent of incidents of performance irregularities or full downtime over the past 10 years in data centers whose operators have shared information with the company. The collective sample covers between 5 million and 10 million square feet of raised floor.
The management portion of the OS assessment will look at everything from operations management to staff qualifications and staffing plans to vendor management. The building characteristics examined will include the age of the structure, the number of floors, the presence or absence of other tenants, the type of entrances and other design elements that are beyond the target Tier. The final element, location, will mostly evaluate the likelihood of a man-made or natural disaster.
Each of the three overall elements will be broken down into categories, and each category will be further broken down into behaviors. The standard’s developers chose behaviors because, unlike with the Tier rating, it was impossible to predict a “pass/fail situation,” Kudritzki explained.
“Each operator is going to have a proprietary process,” he said. “We are going to look at how effective it is. We’re looking at effectiveness of the program, not telling you how to do it.”
The Institute plans to publish the new standard on its Web site on July 1. It will be available to the public at no cost.
The Uptime Institute was founded in 1993 to create and facilitate end-user knowledge communities that focus on improving reliability of data center facilities. It also has a commercial professional-services business, which provides data center engineering and management consulting. In October of 2009, the Institute was acquired by the IT market analyst firm 451 Group, which also owns Tier1 Research.