Certain events should not happen. They cannot be blamed on the weather, unscheduled maintenance or even a “power surge.” Instead, they come down to poor planning. Major businesses, from airlines to Internet giants, have felt the effects of preventable data center outages. Unfortunately, this happens more often than anyone in the industry is comfortable admitting.

An Eaton-commissioned survey of IT and data center managers across Europe found that 27 percent of respondents had suffered a prolonged outage leading to a disruptive level of downtime in the previous three months. The vast majority of respondents (82 percent) agree that most critical business processes are dependent on IT and 74 percent say the health of the data center directly affects the quality of IT services. Essentially, business depends on IT and IT depends on the data center to function. The fact that more than one in four data centers recently suffered a prolonged outage tells us that something is wrong at an industry level.

Prior planning prevents poor power performance

Just as critical business processes depend on IT, the data center itself must provide resilience to keep the business running. It is a core asset in any business’ risk management strategy.

Employee error, a backup generator failing to kick in or a panicked decision: all of these can be prevented by proper processes and power system design. Yet businesses often fail to follow the golden rule of data center power management: actions have consequences and consequences require action.

Organizations need a disaster recovery process in place that clearly defines which steps should be taken when re-energizing the data center. In a full outage situation where people are in a state of panic and under pressure to resume normal services, staggering the re-energization of systems in the data center may seem counterintuitive. After all, the goal is to get back online as quickly as possible. However, switching everything on at once can overload power infrastructure that has only just been restored, because equipment typically draws more current while starting up than it does once running. Bringing systems back in stages helps to avoid extending the outage.
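To make the idea of staggered re-energization concrete, here is a minimal sketch of how a recovery plan might group systems into power-up stages. It is an illustration only, not a description of any vendor's tooling: the `Load` type, the `plan_stages` function, the load names, the kilowatt figures, the startup ("inrush") multipliers and the priority order are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    name: str
    steady_kw: float      # draw once running (made-up figure)
    inrush_factor: float  # startup draw as a multiple of steady_kw (assumed)
    priority: int         # lower number = restore earlier (assumed ordering)


def plan_stages(loads, capacity_kw):
    """Group loads into power-up stages that stay within capacity_kw.

    Loads in the same stage are assumed to start at the same moment, so they
    count at their startup draw; loads from earlier stages count at their
    steady draw.
    """
    stages = []
    running_kw = 0.0          # steady draw of all completed stages
    current = []              # names of loads in the stage being built
    current_startup_kw = 0.0
    current_steady_kw = 0.0

    for load in sorted(loads, key=lambda item: item.priority):
        startup_kw = load.steady_kw * load.inrush_factor

        # Close the current stage if adding this load would exceed capacity.
        if current and running_kw + current_startup_kw + startup_kw > capacity_kw:
            stages.append(current)
            running_kw += current_steady_kw
            current, current_startup_kw, current_steady_kw = [], 0.0, 0.0

        # Even alone in a fresh stage, the load must fit under the limit.
        if running_kw + startup_kw > capacity_kw:
            raise ValueError(f"{load.name} cannot be started within capacity")

        current.append(load.name)
        current_startup_kw += startup_kw
        current_steady_kw += load.steady_kw

    if current:
        stages.append(current)
    return stages


# Hypothetical loads for a small room fed by 100 kW of restored capacity.
loads = [
    Load("cooling", steady_kw=30.0, inrush_factor=2.0, priority=0),
    Load("core network", steady_kw=5.0, inrush_factor=1.5, priority=1),
    Load("storage", steady_kw=20.0, inrush_factor=2.0, priority=2),
    Load("compute", steady_kw=30.0, inrush_factor=1.5, priority=3),
]

stages = plan_stages(loads, capacity_kw=100.0)
for number, names in enumerate(stages, start=1):
    print(f"Stage {number}: {', '.join(names)}")
```

With these made-up figures the plan comes out as three stages rather than one simultaneous switch-on. A real procedure would also wait for each stage to stabilize and be checked before starting the next, which this sketch does not model.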

UPSkilling staff

In reality, a lack of power awareness and understanding is a common problem. Two-thirds of the data center professionals who took part in the research were not fully confident in their understanding of power. Until organizations get to grips with power management, from UPS maintenance to battery inspection, we can expect to see more power-related outages.

However, there is a profound concern around skills availability. Many organizations find it hard to acquire and retain relevant expertise or talent, whether it is designing for energy efficiency, managing consumption on an ongoing basis, or dealing with power-related failures quickly and effectively to avoid and mitigate outages.

Building a stronger tomorrow

Alongside skills and power processes, the facilities infrastructure itself often needs upgrading to meet today’s efficiency, reliability and flexibility expectations. Around half of Eaton’s survey respondents report that their core IT infrastructure needs strengthening, and this increases to almost two-thirds when it comes to facilities such as power and cooling.

Power management is increasingly becoming a software-defined activity. Given the skills gap, software can play an important role in bridging the divide between IT and power. It does so by presenting power management options in dashboard styles that are familiar to an IT audience, making it easier to understand and even automate power infrastructure management. This technology can eliminate extended outages, such as the British Airways May bank holiday outage this year, because the automated processes bring systems back online in a controlled and monitored fashion.

We’ve moved towards more virtualized environments in data centers. IT and data center professionals are very familiar with using virtualization to maintain hardware. So why not use the same principles in power? All power distribution designs, and associated resiliency software tools, must be compatible with major virtualization vendors to future-proof the infrastructure. This approach will enable data center professionals to constantly maintain systems, thereby mitigating the risks associated with out-of-date infrastructure.

Better preparation and disaster recovery processes could have prevented many of the outages which have hit the headlines this year. The data center industry must learn lessons from these high-profile outages and take them on board. Effective power management is a ‘must have’, not a ‘nice to have’.

Dennis O’Sullivan is EMEA data center segment manager at Eaton