Data center power outages have caused some of the most catastrophic company blackouts in recent memory. From the Delta data center outage that cost the airline $150 million, to the blanket of darkness that fell over Super Bowl XLVII halting a sixth title for Niners Nation – power outages can affect anyone at any time.
The difficulty for organizations, however, is identifying the root cause of a power outage, because outages can stem from any number of sources. Was a server overloaded with power, and did it fry the system? Did the local power supplier have an off day? Why was the intern allowed into that room?!
Here are the five most important questions organizations should ask to make sure they do not fall foul of a data center power outage.
Can I move with my rapidly evolving power system?
Power systems are constantly evolving and adapting to the changing demands inside the data center.
Each new server or switch brought in can have a significant impact on power demand. It is therefore vital to be able to analyze data center performance over a long period of time, so that trends and patterns can be pinned down for easier long-term forecasting.
This allows you to plan for change and fluctuations, balance load, predict future capacity needs, plan workflow, and schedule service.
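As a rough illustration of that kind of trend analysis, the sketch below fits a straight line to monthly peak power readings and estimates how long it will take the trend to reach a capacity limit. It is a minimal example, not a forecasting method any particular tool uses: the function names, the sample readings and the 130 kW capacity figure are all hypothetical, and real load growth is rarely this linear.

```python
from statistics import mean


def fit_trend(readings):
    """Least-squares line through (month index, kW) points.

    Returns (slope, intercept). Needs at least two readings.
    """
    xs = list(range(len(readings)))
    x_bar, y_bar = mean(xs), mean(readings)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, readings))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


def months_until_capacity(readings, capacity_kw):
    """Months after the latest reading until the trend line reaches capacity.

    Returns None when the load is flat or falling.
    """
    slope, intercept = fit_trend(readings)
    if slope <= 0:
        return None
    month_at_capacity = (capacity_kw - intercept) / slope
    return max(0.0, month_at_capacity - (len(readings) - 1))


# Hypothetical monthly peak draw in kW and a hypothetical 130 kW capacity.
monthly_peak_kw = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
print(months_until_capacity(monthly_peak_kw, capacity_kw=130.0))  # prints 10.0
```

With load growing by 2 kW a month from 110 kW, the trend line reaches 130 kW ten months after the last reading, which is the sort of lead time that makes it possible to schedule new capacity rather than scramble for it.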
Is my power chain under threat?
Traditionally, security has been the focus of the IT department. Increasingly, however, it is also a concern for facility and infrastructure managers, who face threats both visible and invisible.
A growing number of data centers are connected to networks beyond what’s contained in their racks, with terminals and points of access everywhere. These avenues can become routes for sabotage for any would-be cyber criminal looking to wreak havoc.
Moreover, a breach doesn’t even have to go through wires and cables. A savvy criminal could trick their way into a physical space and sabotage the power from the inside. But it’s not just nefarious individuals that data center managers have to be wary of. Someone with little knowledge could interact with interfaces and cause immeasurable damage.
To prevent this, documentation and control are critical. More hardware is not the answer to preventing catastrophic power outages. In fact, adding hardware actually makes the situation harder to control.
What is my disaster recovery plan and is it documented?
The ability to perform power failure simulations, by virtually switching devices off without affecting the production environment, allows you to plan for the worst and implement recovery services.
There’s always one data center operator who assumes their power chain and back-up systems are foolproof, without ever putting them to the test. And how do you think that typically works out?
Power failure simulation enables you to locate where redundancy is lacking and uncover single points of failure. But with this comes the need for documentation.
Build and document your recovery plan in advance of a catastrophic power failure.
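To make the idea of a power failure simulation concrete, here is a minimal sketch that "switches off" one device at a time in a model of the power chain and reports which loads go dark. Everything in it is an assumption for illustration: the device names, the topology and the rule that a device stays up as long as at least one of its feeds is powered. It works on a description of the chain only, so nothing in production is touched.

```python
# Hypothetical power chain: each device maps to the devices that feed it.
# Assumption: a device stays powered if at least one of its feeds is powered.
FEEDS = {
    "utility": [],
    "generator": [],
    "ups_a": ["utility", "generator"],
    "ups_b": ["utility", "generator"],
    "pdu_a": ["ups_a"],
    "pdu_b": ["ups_b"],
    "rack_1": ["pdu_a", "pdu_b"],  # dual-corded
    "rack_2": ["pdu_a"],           # single-corded
}


def powered_devices(feeds, failed):
    """Return the set of devices that still have power when `failed` are off."""
    # Sources (devices with no feeds) are powered unless they have failed.
    powered = {d for d, srcs in feeds.items() if not srcs and d not in failed}
    changed = True
    while changed:
        changed = False
        for device, srcs in feeds.items():
            if device in powered or device in failed:
                continue
            if any(src in powered for src in srcs):
                powered.add(device)
                changed = True
    return powered


def single_points_of_failure(feeds, loads):
    """Map each upstream device whose failure alone darkens a load to those loads."""
    result = {}
    for device in feeds:
        if device in loads:
            continue  # only test the chain, not the loads themselves
        alive = powered_devices(feeds, {device})
        lost = sorted(load for load in loads if load not in alive)
        if lost:
            result[device] = lost
    return result


print(single_points_of_failure(FEEDS, ["rack_1", "rack_2"]))
# prints {'ups_a': ['rack_2'], 'pdu_a': ['rack_2']}
```

In this made-up layout the dual-corded rack survives any single failure, while the single-corded rack depends entirely on one UPS and one PDU. That is exactly the kind of gap in redundancy a recovery plan should record.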
Can I monitor my operations in real time?
You must know at any given time what energy is being used, where, and by which devices. This can often be difficult with data centers that are constantly adding to their infrastructure, as each addition can have a huge impact on how much power is needed and where it is distributed.
The only effective way to keep an eye on all the moving parts is to have a single-pane-of-glass view. This holistic view brings real-time monitoring and alarming that enables you to mitigate risks and make changes to avoid disaster.
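The core of monitoring and alarming can be sketched in a few lines: collect the latest reading from each device, total the draw by location, and flag anything running close to its rated limit. The sketch below is illustrative only. The `Reading` structure, the device names and the 80 percent and 95 percent thresholds are assumptions, not values taken from any product or standard.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    device: str
    location: str
    watts: float        # current draw
    rated_watts: float  # rated capacity of the device


def check_alarms(readings, warn_ratio=0.8, critical_ratio=0.95):
    """Return (device, severity) pairs for devices near their rated capacity."""
    alarms = []
    for r in readings:
        ratio = r.watts / r.rated_watts
        if ratio >= critical_ratio:
            alarms.append((r.device, "CRITICAL"))
        elif ratio >= warn_ratio:
            alarms.append((r.device, "WARNING"))
    return alarms


def usage_by_location(readings):
    """Total current draw in watts for each location."""
    totals = {}
    for r in readings:
        totals[r.location] = totals.get(r.location, 0.0) + r.watts
    return totals


# Hypothetical snapshot of three PDUs.
snapshot = [
    Reading("pdu_a", "row_1", 4200.0, 5000.0),
    Reading("pdu_b", "row_1", 4900.0, 5000.0),
    Reading("pdu_c", "row_2", 2000.0, 5000.0),
]
print(check_alarms(snapshot))
print(usage_by_location(snapshot))
```

In practice the snapshot would be refreshed continuously from the devices themselves, but the principle is the same: one view that answers what is being used, where, and by which devices.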
Do I know everything about all inter-connected devices and systems?
It is absolutely vital that your power chain is documented all the way through, from where the power enters the building, through the UPSs and PDUs, to all rack-mounted equipment. This means you need to know what is connected to what, as well as the devices’ respective interdependencies. This knowledge allows you to understand the potential impact should a certain piece of equipment fail or be taken offline for maintenance. You should also know the status of each power chain device.
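One simple way to picture that documentation is as an inventory in which every device records what feeds it and its current status. The sketch below, with entirely hypothetical device names and status values, shows how such a record answers two practical questions: how power reaches a given device, and what is affected if a device is taken offline for maintenance.

```python
# Hypothetical inventory: device -> (fed_by, status).
POWER_CHAIN = {
    "building_feed": (None, "ok"),
    "ups_1": ("building_feed", "ok"),
    "pdu_1": ("ups_1", "ok"),
    "pdu_2": ("ups_1", "maintenance_due"),
    "server_a": ("pdu_1", "ok"),
    "server_b": ("pdu_2", "ok"),
    "switch_a": ("pdu_2", "ok"),
}


def path_to_source(chain, device):
    """Trace the chain from a device back to where power enters the building."""
    path = [device]
    while chain[path[-1]][0] is not None:
        path.append(chain[path[-1]][0])
    return path


def downstream(chain, device):
    """List every device that depends, directly or indirectly, on `device`."""
    affected = []
    frontier = [device]
    while frontier:
        current = frontier.pop()
        children = [d for d, (fed_by, _) in chain.items() if fed_by == current]
        affected.extend(children)
        frontier.extend(children)
    return sorted(affected)


print(path_to_source(POWER_CHAIN, "server_b"))
# prints ['server_b', 'pdu_2', 'ups_1', 'building_feed']
print(downstream(POWER_CHAIN, "pdu_2"))
# prints ['server_b', 'switch_a']
```

This example assumes each device has a single upstream feed. A real inventory would also need to capture redundant feeds, which is where single points of failure become visible.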
A proven approach to power management is a data center infrastructure management (DCIM) solution. DCIM enables you to run your data center at peak efficiency, while allowing all involved to improve overall operations and identify vulnerabilities to keep the power chain safe.
A deployed DCIM solution can also give you a holistic view of your estate, eliminating the communication silos between IT and facilities by sharing real-time data and easy-to-understand charts and graphs.
With so many causes of data center power outages, it can be difficult to keep a data center’s lights on.
Trying to keep up with all the changes in one’s infrastructure using manual methods and spreadsheets is laborious and introduces unforeseen risks.
However, with an automated solution that tracks, monitors and alarms, a data center never has to be in the dark again.
Mark Gaydos is chief marketing officer for Nlyte Software