According to the Ponemon Institute’s 2013 report, Cost of Data Center Outages, an unplanned outage costs more than US$7,900 per minute. That is a 41% increase over the $5,600 per minute reported in 2010, when the institute issued its first report. These findings suggest many companies don’t have the necessary practices in place to reduce or respond to outages. In the report, 71% of respondents said their company’s business model is dependent upon the data center to generate revenue and conduct e-commerce. While eliminating downtime altogether is a challenging undertaking, data center managers can start with the most frequent cause of unplanned outages – uninterruptible power supply (UPS) battery failure.
Batteries may be the most ‘low-tech’ components supporting today’s mission-critical facilities, but battery-related failures account for more than one-third of all UPS system failures over the life of the equipment. This failure rate includes batteries that simply ran down ‘early’. The continuity of critical systems during a power outage typically depends on a data center’s power equipment, comprising UPS units and their respective battery backups. While the vast majority of outages last less than ten seconds, a single bad cell can cripple a data center’s entire backup system, particularly if adequate UPS redundancy hasn’t been implemented.
Batteries have a limited service life, dictated by the frequency of battery discharge and recharge. There are a number of factors that can adversely impact the aging process and shorten a battery’s useful life, including high or inconsistent ambient temperatures, frequent discharge cycles, overcharging, loose connections and strained battery terminals.
To safeguard backup power systems against unplanned or premature battery failures, IT managers should take steps to ensure battery maintenance best practices are observed. In addition to standards outlined by the Institute of Electrical and Electronics Engineers, manufacturer schedules for maintenance checks should be followed.
A recent analysis by Emerson Network Power Liebert Services on real-world results of Valve-Regulated Lead-Acid (VRLA) batteries in the UPS environment, along with the company’s field experience with more than 40,000 battery strings and 600,000 inspection or preventive maintenance visits, has revealed the following three battery performance realities:
Life cycles vary far too much to rely solely on manufacturers’ baseline data and life estimates
For accurate resistance baseline measurements of the thousands of VRLA batteries in the study, stationary resistance instruments were chosen. Battery voltage-type monitors are a popular alternative, as are float-current and temperature-only monitors that may be installed for alarming purposes. These alarm-oriented systems, however, do not provide an indication of battery health in UPS, high-voltage, high-unit-count implementations.
For the report, the alarm-based systems were considered unsuitable for increasing availability, while resistance methods using an initial field baseline provided an indicator of battery state of health data – particularly when data was compared over the full lifetime of a given unit and against the initial baseline. Stationary instruments were a necessity due to the high degree of repeatability they afforded. It was important to capture very accurate unit-specific, situation-specific baseline resistance and periodic resistance readings.
The analysis found that initial baseline consideration should begin 90 days after installation. When a battery is replaced due to premature failure or another cause, the new unit’s resistance will often drift downward at first and then remain constant. Figure 1 details an example of a battery change-out occurring after 600 days of life. (This battery’s initial baseline resistance will likely settle near 5250 µOhms, instead of the 5650 µOhms seen just after installation.) To capture the true baseline of the replacement battery, a stationary instrument is required. This result was seen with many battery units and has caused many to doubt manufacturer-provided baselines. The study found that when a specific unit settles to its running baseline, the initial variance from the manufacturers’ baselines could be as much as 25%. It’s recommended that data center managers closely monitor batteries once resistance reaches 40% over the initial, true baseline. A battery is considered to be failing at 50% over true baseline.
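The two thresholds above can be expressed as a simple check on each unit’s resistance trend. The following Python sketch is illustrative only: the function and class names are hypothetical, the readings in the usage note are invented, and it assumes that "at 40% over" and "at 50% over" mean "at or above" those levels. The 5250 µOhm baseline is the settled value quoted in the example above.

```python
from enum import Enum


class BatteryState(Enum):
    NORMAL = "normal"
    WATCH = "monitor closely"
    FAILING = "failing"


# Thresholds taken from the study figures quoted in the article.
WATCH_RATIO = 1.40    # 40% over the true (settled) baseline
FAILING_RATIO = 1.50  # 50% over the true (settled) baseline


def classify_resistance(baseline_uohm: float, reading_uohm: float) -> BatteryState:
    """Classify one unit by its resistance relative to its own field baseline.

    Assumption: a reading exactly at a threshold counts as having reached it.
    """
    if baseline_uohm <= 0:
        raise ValueError("baseline must be a positive resistance")
    ratio = reading_uohm / baseline_uohm
    if ratio >= FAILING_RATIO:
        return BatteryState.FAILING
    if ratio >= WATCH_RATIO:
        return BatteryState.WATCH
    return BatteryState.NORMAL


if __name__ == "__main__":
    baseline = 5250.0  # settled baseline from the example, in micro-ohms
    for reading in (6000.0, 7500.0, 8000.0):  # hypothetical later readings
        state = classify_resistance(baseline, reading)
        print(f"{reading:.0f} uOhm -> {state.value}")
```

Because the comparison is made against each unit’s own settled baseline rather than the manufacturer’s figure, the same function applies to a replacement battery once its new baseline has been captured.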
Batteries lose capacity in as early as three years
UPS battery manufacturers may market their batteries with a ten-year design life or life span but the reality is the actual service life of the battery will be much shorter due to external factors that cause degradation. Several issues can shorten the life of a battery string:
- Incoming power faults resulting in UPS engagement
- High or improper room temperatures
- High or low charge voltage
- Excessive charge current
- Manufacturing defects
- Overcharging and over cycling
- Loose connections
- Strained battery terminals
- Poor and improper maintenance
The IEEE states that the ‘useful life’ of a UPS battery ends when it can no longer supply 80% of its rated capacity in ampere-hours. It is at this point that a battery should be replaced because the aging process accelerates (see Figure 2). Many factors can affect the useful life of a UPS battery, so as soon as a battery is placed into service it should be maintained under a program that identifies system anomalies and provides information that trends toward end of life. Batteries beginning to fail cause an imbalance that adversely affects the life of other batteries in the string, and they should be removed.
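The 80% criterion is a straightforward ratio of measured to rated capacity. A minimal Python sketch follows, assuming capacity test results are available in ampere-hours; the function names and the sample figures are hypothetical and are not drawn from the IEEE text or the study.

```python
END_OF_LIFE_FRACTION = 0.80  # criterion quoted in the article


def capacity_fraction(measured_ah: float, rated_ah: float) -> float:
    """Return measured capacity as a fraction of rated capacity."""
    if rated_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return measured_ah / rated_ah


def past_useful_life(measured_ah: float, rated_ah: float) -> bool:
    """True when the battery can no longer supply 80% of its rated capacity."""
    return capacity_fraction(measured_ah, rated_ah) < END_OF_LIFE_FRACTION


if __name__ == "__main__":
    # Hypothetical capacity test results for a unit rated at 100 Ah.
    for measured in (92.0, 85.0, 75.0):
        flag = "replace" if past_useful_life(measured, 100.0) else "in service"
        print(f"{measured:.0f} Ah of 100 Ah -> {flag}")
```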
Placing unmatched or new batteries into a string of aged batteries diminishes new characteristics
Current takes the path of least resistance, so placing a factory-new battery in a string of aged batteries (with varying levels of internal resistance) causes the factory-new battery to be overcharged. This could shorten the life span of the entire string. The ideal replacement scenario is to have fully-charged, ready-to-install batteries on site that match the type and condition of in-service batteries. This can be accomplished with a battery spares cabinet equipped with an on-board charger. This supply of batteries supports a fast, first-time fix and eliminates problems involved with mixing new and old batteries in a string.
Monitoring is key
According to a 2011 Emerson Network Power white paper, data centers with battery monitoring systems installed on site had a reduced rate of outages due to bad batteries. While outages did still occur, the incidents were isolated to human error where customers were either not watching the system or they did not know how to properly analyze the data provided by the monitor. This brought to light the need for experts to correctly monitor the alarm data and properly maintain the systems.
A popular option for today’s busy IT manager is the use of stationary battery monitors with remote professional analyst services (preferably with remote monitoring technology embedded into the power protection infrastructure). This technology should include comprehensive data collection in order to provide early warning of alarm or out-of-tolerance conditions. With robust remote monitoring, IT and other teams that aren’t experts in the varied technologies present in their complex data centers can effectively augment their staff. Embedded, remote monitoring gives data center managers the ability to directly impact two key measures of availability: Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR). Improving MTTR is possible with the right remote monitoring technology because continuous connectivity allows infrastructure experts working from a knowledge base to deliver high levels of support. They can continually collect and analyze data from key parameters and turn that information into an actionable plan. This remote diagnosis allows service technicians to be armed with the correct parts needed to fix a problem before they are dispatched to a customer’s site.
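To see why MTTR matters, it helps to recall the standard steady-state availability relation, availability = MTBF / (MTBF + MTTR). The Python sketch below applies it to two repair times; the MTBF and MTTR values are invented for illustration and do not come from the white paper or any vendor data.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    mtbf = 50_000.0  # hypothetical hours between failures
    # Hypothetical repair times: slow, sequential response vs. remote diagnosis.
    for label, mttr in (("sequential response", 12.0), ("remote diagnosis", 3.0)):
        a = availability(mtbf, mttr)
        print(f"{label}: MTTR {mttr:.0f} h -> availability {a:.6f}")
```

With MTBF held constant, any reduction in MTTR raises availability, which is the effect remote diagnosis is intended to deliver.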
With remote UPS and battery monitoring, the time needed to restore a UPS is much less than with a sequential, time-based approach in which the simple awareness of an event can exceed eight hours. Having the ability to detect potential problems early and rapidly respond to defects or degradation maximizes the reliability of UPS battery systems and gives IT managers the adaptability they need.
This article first appeared in FOCUS issue 34.