In mathematics, some functions have a boundary line that their curve approaches ever more closely but never actually reaches. These boundaries are called asymptotes, and functions that behave this way are said to be asymptotic.
Applied to our industry, approaching the asymptote means approaching 100% uptime. Because human error can never be eliminated completely, we will never get to 100%, but as new tools are designed that integrate the human element with facility operations and management, we can approach 100% uptime and steadily shrink the part that human error plays in the equation. We have consistently sought to merge infrastructure awareness and facility manageability into a harmonious, highly robust solution. Over the next decade, if we focus our new tools on achieving this end, we can position our industry to approach the unreachable standard of 100% uptime, or unity.
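To make the asymptote concrete, the short Python sketch below converts availability levels into the downtime they still allow each year. It assumes the common "number of nines" convention (two nines is 99%, three nines is 99.9%, and so on); the function names are illustrative and do not come from any DCIM product.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability_percent(nines: int) -> float:
    """Availability for a given number of nines, e.g. 3 -> 99.9 (percent)."""
    return 100.0 * (1.0 - 10.0 ** -nines)


def annual_downtime_minutes(nines: int) -> float:
    """Minutes of downtime per year still allowed at that availability."""
    return MINUTES_PER_YEAR * 10.0 ** -nines


# Each extra nine cuts the allowed downtime by a factor of ten,
# but the allowance never reaches zero.
for n in range(2, 7):
    print(f"{availability_percent(n):.4f}% uptime -> "
          f"{annual_downtime_minutes(n):.2f} minutes of downtime per year")
```

Each additional nine brings uptime closer to 100%, yet some downtime always remains, which is the sense in which 100% uptime behaves like an asymptote.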
The recent emergence of advanced, automated Data Center Infrastructure Management (DCIM) tools has allowed data center operators to aggregate, analyze and integrate the massive volumes of infrastructure data collected from the many disparate server, cooling and power monitoring platforms. These DCIM systems provide system-level management that includes resource capacity, allocation and utilization. DCIM tools are producing increasingly large amounts of useful data each day, making it easier to manage our very dynamic critical environments. The outcome is a significant improvement in operational efficiency, with predictive maintenance and proactive management as byproducts. As we fine-tune our focus on which critical environment data we mine, we must also transform this data into actionable predictions and a well-vetted execution process that will enable the enterprise to approach 100% uptime.
Collectively, DCIM tools increase situational awareness, allowing the operator to make informed strategic operational decisions. They also provide useful information that can feed into future capital and operating budgets to allow for local monitoring and global planning. DCIM also maps IT resources and the facility assets that support them. IT Service Management (ITSM) data does the same for the relationships between applications and the IT resources supporting them. When real-time operating data is analyzed and integrated with ITSM data, and presented in meaningful ways, real-time data center optimization becomes a reality.
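As a minimal sketch of how the two mappings described above might be combined, the Python below joins a DCIM-style view (facility asset to IT resources) with an ITSM-style view (application to IT resources) to list the applications exposed to a problem with a single facility asset. All asset, server and application names are hypothetical, and real DCIM and ITSM products expose this data through their own interfaces rather than plain dictionaries.

```python
from typing import Dict, List, Set

# Hypothetical DCIM view: facility asset -> IT resources it supports
dcim_supports: Dict[str, List[str]] = {
    "PDU-A1": ["server-01", "server-02"],
    "CRAC-3": ["server-02", "server-03"],
}

# Hypothetical ITSM view: application -> IT resources it runs on
itsm_runs_on: Dict[str, List[str]] = {
    "billing": ["server-01"],
    "email": ["server-02", "server-03"],
    "intranet": ["server-04"],
}


def applications_at_risk(asset: str,
                         dcim: Dict[str, List[str]],
                         itsm: Dict[str, List[str]]) -> Set[str]:
    """Return the applications that depend on any IT resource fed by `asset`."""
    affected_resources = set(dcim.get(asset, []))
    return {app for app, resources in itsm.items()
            if affected_resources.intersection(resources)}


print(sorted(applications_at_risk("PDU-A1", dcim_supports, itsm_runs_on)))
# prints ['billing', 'email']
```

An alarm on a power distribution unit can then be presented to the operator in terms of the business applications it threatens, rather than as an isolated equipment status.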
Human factor
While DCIM tools are efficient data aggregators that allow us to manage our critical facilities more effectively, it’s the human expertise and decision-making process that ultimately keeps the core business applications running, driving profits and keeping people safe. DCIM tools are undeniable catalysts for uptime, but we should not think of them as an industry end-all. For example, in the case of an equipment failure, these tools will alert the operator to the abnormal status of the system, but may not necessarily offer any insight into the root cause of the problem, or suggest the best way to correct it. It’s still up to a building or IT engineer to make sense of the data, troubleshoot and ultimately solve the problem.
While DCIM tools provide almost unlimited informational value, they are not a replacement for human experience. The adaptability of human intellect, hands-on control of critical infrastructure, scenario training and experience all go into the construction, maintenance and management of modern data centers. These are key factors that cannot be outsourced to software.
The human element has its pitfalls, proven by the fact that about 65% of all downtime can be traced back to human error. While DCIM tools can provide faster and more reliable ways to aggregate data, analyze trends, identify discrepancies and initiate actions based on previously understood anomalies, the task of managing, maintaining and operating equipment, including DCIM tools themselves, relies on personnel and critical thinking.
Figure 1: Progressive DCIM tools: moving towards being self-aware. Image by Power Management Concepts
DCIM products are typically focused on infrastructure and are disconnected from the human element. Going forward, the most effective solutions will integrate DCIM tools with the human element, thereby giving key personnel the resources needed to run the facility based on their expertise.
Imagine a self-aware DCIM solution that uses its wealth of data to enhance the capability of the operators.
The Progressive DCIM Tools in Figure 1 suggest some possibilities for achieving that goal. This kind of integration will reduce the impact of human error, rather than widen the gap between automated systems and operator knowledge.
Technical complexity
With technology as advanced as it is today, where everything happens behind the scenes, it is easy to forget the technical details. We take for granted the ease with which systems operate when working properly. However, when the unexpected occurs, the complexity of these systems becomes painfully obvious, and returning them to an operational state is difficult without up-to-date and accurate procedures, documentation and scenario training in place.
The adoption of DCIM tools has added significant value to the running of critical infrastructure. There is no doubt they have given facility managers, IT managers and executives vast quantities of data that are crucial to managing their critical infrastructures. This information, provided in real time, includes usage of power, space, cooling and networking resources.
Figure 2: Information gathered through monitoring and management tools. Courtesy Power Management Concepts
However, rather than providing a simple solution to the managers’ already demanding job of overseeing the center, this ‘data overload’ puts even more strain on the task.
For example, the Data Center Monitoring Points in Figure 2 illustrate the flood of infrastructure information that we have to both monitor and manage, and it is easy for the critical information to be lost in it. This raises the question: “Why are we really collecting this data?”
The data collection isn’t merely to have a large repository of facility statuses and raw information, but rather to have the data trended and analyzed to provide predictive and proactive maintenance, drive capital and operational budgets, handle critical infrastructure events and present actionable content for all high-level decision-making.
The precious data isn’t native to the tools but is intrinsic to the facility itself, much like tribal knowledge. The immediate value that DCIM tools bring to operating personnel is real-time access to all of it. The constant polling of all the systems is excellent in terms of convenience alone. When it comes to bringing together data from the various pieces of equipment, DCIM tools are incredibly helpful and absolutely necessary for even moderately sized facilities.
But at any given time, the majority of that information is not pertinent to the issues at hand, especially when we view the problem in the context of time. Is it really a requirement to continuously review every bit of information from every piece of equipment? The answer is not so much about what data we should pull and when, but rather about what we do with the data once we collect it. How can we transform all this data into useful information?
Even the visual trending of the data across time adds little value unless it can be leveraged to produce an actionable conclusion. Therefore, DCIM tools must not only be efficient in data collection, trending and simple analysis, but must also enhance the user’s ability to make actionable decisions, thereby preventing downtime, a loss of revenue or perhaps even the loss of a human life. They must focus on giving the various decision-makers and operators of the data center’s infrastructure the knowledge, resources and even direction required to manage their facilities effectively.
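One way to picture the step from a trend to an actionable conclusion is the Python sketch below. It fits a straight line to recent readings and estimates how many hours remain before the trend reaches an alarm threshold. This is an illustrative assumption, not a description of how any particular DCIM product works: the function name, the sample inlet temperatures and the 27.0 threshold are all hypothetical, and the sketch only handles readings that rise towards an upper limit.

```python
from typing import Optional, Sequence, Tuple


def hours_until_threshold(samples: Sequence[Tuple[float, float]],
                          threshold: float) -> Optional[float]:
    """Estimate hours from the last sample until a rising trend hits `threshold`.

    `samples` is a sequence of (hour, value) pairs. Returns 0.0 if the fitted
    trend is already at or above the threshold, and None if the trend is flat,
    falling, or cannot be computed.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, so no trend exists
    # Ordinary least-squares slope and intercept
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    fitted_now = slope * samples[-1][0] + intercept
    if fitted_now >= threshold:
        return 0.0
    if slope <= 0:
        return None  # trend is not moving towards the threshold
    return (threshold - fitted_now) / slope


# Hypothetical hourly inlet temperatures in degrees Celsius
readings = [(0, 22.0), (1, 22.5), (2, 23.0), (3, 23.5)]
print(hours_until_threshold(readings, threshold=27.0))  # prints 7.0
```

Presenting "about seven hours until the inlet temperature limit is reached" gives the operator something to act on, where a trend chart alone leaves the conclusion to be worked out by hand.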
Human decision making
The ultimate goal is to create an enhanced set of tools for the mission-critical industry that brings the human element into the spotlight by presenting information tailored towards increasing the effectiveness and accuracy of the human decision-making process. DCIM tools are already excellent aggregators of data, but where they will continue to shine is in applying that data, combined with human knowledge, to produce the information that drives intelligent action, thereby positioning the mission-critical industry on the road towards 100% uptime, or unity. The results will be readily applicable to life safety systems and the other critical infrastructure that makes up our modern digital society.