Working on live power systems can kill. Is 100 percent uptime worth it?
Chris Crosby is passionate about this industry. His company, Compass Data Centers, will write a check for $100,000 to any client that doesn’t get a fully operational data center within six months. So when Crosby tells us that preventing arc-flash accidents is a moral imperative, we should listen. Arc-flash, also known as flashover, can cause injury and death.
What is an arc-flash event?
Summer lightning is a dramatic arc-flash event, but Crosby is talking about discharge in the data center.
“Arc-flash events occur when electrical systems do not do what you want them to,” he explains. “A lightning bolt, basically, comes out. It comes out with heat approaching 35,000 F and often creates a pressure blast of up to 2,100 pounds per square inch. That’s enough to kill someone without getting electrocuted.”
An arc-flash will release energy rapidly, due to unexpected arcing between two phase busbars, or a phase busbar and a neutral or ground. Mike Holt, NEC expert and noted author, says it is a self-sustaining process like that used in electric-arc welding.
“The fault has to be started by something creating the path of conduction or a failure such as a breakdown in insulation,” says Holt. But it continues after the physical fault is removed.
“The cause of the short normally burns away during the initial flash and the arc fault is then sustained by the establishment of a highly-conductive plasma. The plasma will conduct as much energy as is available and is only limited by the impedance of the arc. This massive energy discharge burns the bus bars, vaporizing the copper, and thus causing an explosive volumetric increase; the arc blast, conservatively estimated, has an expansion of 40,000 to 1.”
Shouldn’t the circuit breaker trip?
One would think that circuit protection would prevent this from happening. We asked Paul Estilow, principal engineer at DLB Associates and a thirty-year veteran of the electrical power industry, why circuit breakers do not always prevent arc-flashes. “The problem is that the fault current during an arc-flash event may be less than the rating of the circuit breaker,” Estilow told us in a phone conversation. “Because there is high resistance in the arc, the current level remains relatively low, while the amount of energy builds, leading to an explosion.”
Estilow added that there are circuit-protection devices designed specifically to mitigate arc-flash, but they are expensive and, because arc-flash is so unpredictable, may not always work as expected.
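Estilow's point can be put in rough numbers. The sketch below is only an illustration under simplifying assumptions: it treats the arc as a steady load, so energy is voltage times current times duration, and it models the breaker as a single trip threshold. The function names and every figure in it are hypothetical, chosen for the example rather than taken from Estilow or from any breaker datasheet, and it is no substitute for a proper incident-energy study.

```python
def arc_energy_joules(arc_voltage_v: float, arc_current_a: float, duration_s: float) -> float:
    """Energy released by a steady arc: E = V * I * t (simplifying assumption)."""
    return arc_voltage_v * arc_current_a * duration_s


def trips_immediately(fault_current_a: float, trip_threshold_a: float) -> bool:
    """Simplified breaker model: it opens at once only above its threshold."""
    return fault_current_a >= trip_threshold_a


# Hypothetical values, for illustration only.
TRIP_THRESHOLD_A = 10_000.0  # assumed breaker trip threshold
ARC_CURRENT_A = 4_000.0      # arc resistance keeps the current below the threshold
ARC_VOLTAGE_V = 200.0        # assumed voltage across the arc

if __name__ == "__main__":
    print("Breaker trips immediately:", trips_immediately(ARC_CURRENT_A, TRIP_THRESHOLD_A))
    for duration_s in (0.1, 0.5, 2.0):
        energy = arc_energy_joules(ARC_VOLTAGE_V, ARC_CURRENT_A, duration_s)
        print(f"Arc lasting {duration_s} s releases about {energy:,.0f} J")
```

With these assumed values the breaker never sees enough current to open right away, yet the energy released keeps growing for as long as the arc burns.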
How serious is this?
Arc-flash events, or more correctly accidents, are not new; they have been around as long as man-made electricity. What may surprise most people is the prevalence of arc-flash events and the resulting injuries. Richard B. Campbell and David A. Dini, authors of a report for the Fire Protection Research Foundation (which is associated with the NFPA), write, “A common estimate of arc flash occurrence is that there are 5 to 10 arc flash explosions in electrical equipment every day in the US, but the origins of this estimate are unclear.”
Campbell and Dini found the estimate seems to be borne out by official documentation. “Literature on electrical injury has tended to focus on shock and electrocution, while devoting comparatively little attention to injuries resulting from arc flash or arc blast,” conclude the authors. “Research on electrical burns nevertheless shows that burns from electric flash are responsible for many of the work-related burns treated at burn centers.”
- A Michigan burn center found that 34 percent of patients injured on the job received flash injuries.
- Arc flash injuries represented 55 percent of the electrical work-related burn injuries in the Ontario research.
- A study of electrical injuries over a 20-year period at a Texas burn center found that 40 percent of burns were electrical arc injuries.
Why is this happening?
As to why this is happening, some experts say it is the nature of the beast: accidents will happen. Estilow provided a “for instance.” A technician could innocently switch on a circuit breaker and trigger an arc-flash event simply because a nest built by a field mouse in the electrical box controlled by that breaker shorted out.
Accidentally initiating an arc-flash event is possible anywhere electricity is used, and that will remain true until scientists and engineers figure out a way to prevent it. However, Crosby’s concern is more focused. Being in the data-center business, he is alarmed by what he considers to be an inordinate number of arc-flash events in data centers.
A well-publicized example is Siobhan Gorman’s 2013 Wall Street Journal article “Meltdowns Hobble NSA Data Center.” Gorman says electrical problems caused ten meltdowns in a 13-month period, primarily due to arc-fault failures in the back-up generators. She adds that an official described one occurrence as a flash of lightning inside a two-foot box that led to a fiery explosion.
Crosby feels that an excessive focus on availability adds to the problem. When data-center operators guarantee 100 percent uptime, staff end up working on energized equipment in order to keep the customer-facing equipment up and running.
According to Crosby and Estilow, work on energized equipment is a major reason arc-flash events happen in data centers, and there are only three legitimate reasons to do it:
- Interrupting the electricity would endanger human life
- Shutting down power would make a life-threatening situation worse
- Certain electrical tests can only be performed with the equipment energized
Crosby contends that in many instances work is being done on energized equipment in data centers for reasons other than those stated above, and that those reasons should not exist. If a data center is offering 100 percent uptime, the redundancy required to meet that guarantee needs to be in place, and that same redundancy should allow any work on electrical systems to be accomplished with the systems in a de-energized state.
The Uptime Institute has provisions for that in its tier-rating system, according to a Green Server Room paper that defines site infrastructure in terms of the tiers:
- Tier III Concurrently Maintainable Site Infrastructure: A concurrently maintainable data center has redundant capacity components and multiple independent distribution paths serving the computer equipment. Typically, only one distribution path serves the computer equipment at any time.
- Tier IV Fault Tolerant Site Infrastructure: A Fault Tolerant data center has multiple, independent, physically isolated systems that each have redundant capacity components and multiple, independent, diverse, active distribution paths simultaneously serving the computer equipment.
DLB Associates’ Estilow explained the difference. Think of Tier III as having two independent power systems, A and B. If system A is down for repair or maintenance, system B is powering the data center. However, if system A is down for maintenance and an outage then occurs on system B, a Tier III data center could go down.
Tier IV provides two redundant power systems and is fault tolerant, meaning that upon failure of any individual equipment the system will switch to the redundant system without affecting the IT load.
In any case, both Tier III and Tier IV allow technicians to work on electrical equipment when it is powered off.
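The redundancy argument can be sketched as a toy model. The code below is a hedged illustration, not the Uptime Institute's definition: the `PowerPath` class and both helper functions are hypothetical names invented for this example, and a real facility has far more components than two named paths. It checks whether a path can be switched off for maintenance while another path carries the load, and then replays the Tier III scenario Estilow describes.

```python
from dataclasses import dataclass


@dataclass
class PowerPath:
    """One independent power distribution path (hypothetical model)."""
    name: str
    energized: bool = True
    failed: bool = False

    @property
    def available(self) -> bool:
        # A path can carry load only if it is switched on and has not failed.
        return self.energized and not self.failed


def load_is_powered(paths: list[PowerPath]) -> bool:
    """The IT load stays up while at least one path is available."""
    return any(p.available for p in paths)


def can_deenergize(paths: list[PowerPath], target: str) -> bool:
    """True if another path could carry the load while `target` is switched off."""
    return any(p.available for p in paths if p.name != target)


if __name__ == "__main__":
    a, b = PowerPath("A"), PowerPath("B")
    paths = [a, b]

    # With both paths healthy, A can be worked on de-energized.
    print("Safe to de-energize A:", can_deenergize(paths, "A"))

    a.energized = False  # A is taken down for maintenance
    print("Load powered during maintenance:", load_is_powered(paths))

    b.failed = True      # an outage hits B while A is still down
    print("Load powered after B fails:", load_is_powered(paths))
```

In this model, the check that matters for worker safety is `can_deenergize`: whenever it returns true, the work can be done with the equipment powered off.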
This is a complicated issue, with data center management having to weigh risks versus consequences. To that end, Crosby shared a personal experience:
“Spreading the word about the danger of arc flash is important to me personally because I have seen how devastating it is when there is a fatality on a project. At a company I worked at previously, a worker tragically was killed during construction of a data center when a pressurized safety system exploded.
“I don’t want anyone in our industry to be exposed to unnecessary risks, and the risk of arc flash is dramatically minimized when the right procedures are in place. Data center employees deserve to be safe and their families deserve to know that safety is a top priority in our industry.
“Arc flash is deadly, and companies in our industry should be doing much more to create safer working conditions that minimize the chance of it happening. It’s the law, but it’s also our responsibility as employers who have the lives of our workers in our hands.”