When a record winter storm hit Texas earlier this year, chaos reigned. A poorly maintained grid was unable to cope with freezing temperatures, which led to rolling blackouts and closed roads, bringing the state to its knees for an extended period in February.
As residents huddled at home and industries closed, data centers were mostly able to persevere despite the sudden storm. How they managed to stay online proved to be a lesson in the value of preparation - and in luck.
"We were watching what local weather forecasts were saying about a storm coming in, and that there would be some cold temperatures," Digital Realty's Dallas/Fort Worth technical operations manager Benny Furtick remembers.
"Our guys have been through severe weather before, so they know what that looks like, whether it be a hurricane or ice and such, but I think what surprised us was how widespread it was."
This feature appeared on the cover of the April 2021 issue of DCD Magazine.
The big one
When the storm hit on a cold Sunday evening in February, it blanketed the entire state and crept into neighboring ones.
"As it started to hit, we understood pretty quickly how big this was," Furtick said. "Our teams rose to the occasion, and got through it with hardly any issues."
Other businesses were less fortunate, with around 80 percent of the state's chemical production brought offline, along with chip manufacturers and other factories.
Data centers, however, are built with a preternatural distrust of the grid, as even brief moments without power can bring things crashing to a halt. Each provider DCD spoke to said they had at least 48 hours of diesel fuel on site.
"And we wanted to make sure that we kept the fuel coming," Furtick said. "Our agreement with [our supplier] is that they will have a truck at whichever site we need within a 24 hour period. And then after that they just keep the trucks coming."
This worked for Digital, which experienced grid power outages in bursts of 10-14 hours at some sites in Houston, and 6-12 hours at others in Dallas. "And then we have properties in Dallas and Austin that never even saw a flicker."
It was not a stress-free experience, however. With power out across Texas, diesel suppliers could not pump their fuel into trucks. Instead, deliveries had to come in from out of state.
"[But the] interstates were shut down on the south part of Texas coming in from Louisiana to Houston," Furtick explained. "We didn't get to a point where we're running out of fuel, but the 24 hour period was getting pretty close to time's up by the time the trucks arrived."
Equally, when the company realized it had more than enough supply in Houston, it switched deliveries to Dallas. "That night, another coating of ice came in, and the interstate between Houston and Dallas became treacherous." The trucks drove at around five miles an hour, slowly crawling to their destination.
While the trucks ultimately arrived, it's not hard to imagine a slightly worse storm that might shut down more roads and make supply harder - if not impossible. "There's a version of this scenario where that did happen," Akamai's Americas VP of network infrastructure Todd Lawrence said.
"And that's where you start to get worried about when your next delivery is."
Netrality's COO Josh Maes agreed: "These sorts of winter storms have more risk than I think we all originally anticipated. We didn't need to bring fuel in, but anecdotally I think we learned that fuel could be an issue."
The company was fortunate to have a large on-site supply, which "is a really big value add that we didn't appreciate the full extent of," Maes said. Its 1301 Fannin facility has 65,000 gallons of fuel on-site which can support the building for 7-10 days.
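Those two figures imply an average generator burn rate. A minimal sketch of the arithmetic, assuming the load stays constant and that 7 and 10 days mark the heavy and light ends of the range; the function and variable names are illustrative, not anything Netrality described:

```python
# Figures quoted above for Netrality's 1301 Fannin facility.
FUEL_ON_SITE_GALLONS = 65_000
RUNTIME_DAYS_SHORT = 7   # heavier load: fuel lasts about a week
RUNTIME_DAYS_LONG = 10   # lighter load: fuel lasts about ten days


def implied_burn_rate_gph(gallons: float, runtime_days: float) -> float:
    """Average gallons per hour implied by a tank size and a stated runtime."""
    return gallons / (runtime_days * 24)


def hours_remaining(gallons_left: float, burn_rate_gph: float) -> float:
    """Hours of runtime left, assuming a constant burn rate (a simplification)."""
    return gallons_left / burn_rate_gph


heavy_gph = implied_burn_rate_gph(FUEL_ON_SITE_GALLONS, RUNTIME_DAYS_SHORT)
light_gph = implied_burn_rate_gph(FUEL_ON_SITE_GALLONS, RUNTIME_DAYS_LONG)

print(f"Implied burn rate: {light_gph:.0f} to {heavy_gph:.0f} gallons per hour")
print(f"Full tank at the heavier rate: "
      f"{hours_remaining(FUEL_ON_SITE_GALLONS, heavy_gph):.0f} hours")
```

On those assumptions the building would burn roughly 271 to 387 gallons an hour. Real consumption varies with load and generator efficiency, so the result is only a rough guide.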
Counting down to the fuel resupply
It's not clear how much supply every provider had, and how quickly they were running out. "Nobody wants to say 'we're this close,'" FiberTown's VP and business unit manager Anthony Froelich said.
"Someone with a deployment at Equinix in Dallas was telling us they were down to the last eight hours of fuel."
"We had heard that Equinix was getting down to about 12 to 16 hours of fuel, and not providing good information," Akamai's Lawrence said. "I think there was a miscommunication there, but we started to get very nervous and started to figure out how to de-risk by moving applications, moving traffic around."
The facility stayed online, like most of the industry, but it highlighted a crucial flaw in our fragile network ecosystem. "The stark reality is that on the Internet, particularly in the US, there's a few facilities that have a lion's share of the interconnectivity," Lawrence said.
"And so if we have a critical failure, it won't matter where my servers are. If all the interconnectivity is in a few hands and there's a disaster, then the Internet is fundamentally screwed. A truck bomb in [the right] nine facilities in nine cities going off simultaneously would cripple the US economy."
Akamai is working on taking control of its own backbone, and its own interconnection, but it - like everyone else - is reliant on the interconnected nature of the Internet. Unlike data center operators, it is also reliant on the companies that host its servers.
As the years have progressed, and 'once in 100 years' disasters have become a steady drumbeat, Lawrence said that the company's focus on robust fuel preparations has steadily grown.
"We have eliminated people due to poor responses or lack of infrastructure related to not only how they store the fuel, but how old their equipment is, how often they maintain it, and how strong their fuel supply contracts are."
Froelich concurred, saying that some customers now care more about data center resiliency plans, and want details on the age of equipment and how operators plan to keep it running.
His company, which operates a data center in Bryan/College Station and leases a Digital Realty data center in Houston, was also fortunate with the grid.
"In Bryan, we were never asked to go off of utility power and run [on generators] because we had 911 operations and other critical functions happening out of the site," he said.
"Their call center was down so we have an emergency one for them below the data center."
Another customer was an energy supplier, which relied on the facility to help get its natural gas to producers.
"That's one thing that I've tried to bring back to our network operations teams - the emergency operations team that came and were sleeping overnight in the little bunk below the data center, and in their call center, coordinating rescues of people that were trapped in homes or elderly that had no heat before they froze to death.
"I want the team to realize that if our data center wasn't up and running, none of [the emergency workers] would have been able to do what they needed to do, which was saving lives."
The Texas Advanced Computing Center also wanted to help. Classed as critical infrastructure, the center had its power prioritized, but it decided that it was more ethical to cut its load and ease the strain on the grid.
"We never actually had our circuits turned off, but we do draw an enormous amount of power," TACC's executive director Dan Stanzione explained. "And we knew hundreds of thousands of residents didn't have power at home, so we started shedding load on a Monday morning, and then eventually turning stuff off as it went idle over 24 hours, and we stayed that way until Friday morning."
The supercomputer usually consumes about 6MW (9MW at peak), but was brought down to less than 1MW. A supercomputer center with a different view on uptime than commercial operators, TACC does not have diesel generators.
Commercial facilities mostly did not go off the grid without being forced, with the exception of Evoque, which confirmed it voluntarily ran on a generator for 15 hours at its Allen data center. "Our clients saw no interruption in uptime during that time," Drew Leonard, VP/Strategy at Evoque, said.
While it sought to shut down as much as possible, TACC could not risk fully switching off. "We do chilled water storage, about a million gallons," Stanzione said.
That volume is hard to freeze, but some of it sits in the pipes that span the data center. The water is safe "as long as we can just keep it circulating," Stanzione said. The site has three chillers and numerous redundant pumps. "We just left one pump running a little bit and kept the water moving through the pipes," he said.
"Had we actually turned off the chilling plant completely…" he trailed off. "Our facility didn't suffer burst pipes but another University of Texas at Austin building across the street from us did."
Each operator found that the data centers provided much-needed shelter to staff and customers. "From a preparedness standpoint, we had folks who weren't planning to be at the data center, but they got stuck there," FiberTown's Froelich said.
"So we didn't want them leaving, because you're taking your life in your own hands at that point. In future we would prepare for more people, to ensure greater comfort."
Digital Realty was asked by the city of Lewisville if it had spare space that was warm. "We happened to have an office-type facility that a customer moved out of, and we made it available," Furtick said, although ultimately the city never used the space.
TACC also saw an influx of people looking for warmth, Stanzione said. "We had more people than normal in the building because it had power and water and a lot of people's houses did not. We have a couple of showers on site, and I think somebody came in to do dishes at one point."
Others weren't so fortunate. Nearly 70 percent of those served by the state's main power grid, ERCOT, went without power at some point during the subfreezing temperatures of Storm Uri, while almost half had a water outage, a University of Houston study found. Outages lasted on average 42 hours.
For those sheltering at home, such outages were sometimes fatal, with at least 194 people thought to have died.
"There's critical infrastructure in the state that is not built to withstand the extremes of weather that, frankly, should be expected," Stanzione said. "This wasn't a natural disaster, this was a few days that were cold," he added, pointing to cities like Chicago that easily handle similar events. "This was self-inflicted."
Studies, lawsuits, and eulogies will unpick exactly what went wrong that week in February, but at a high level an unregulated grid failed to enact basic winterization or redundancy measures - despite warnings - leaving it open to collapse.
"Utilities act in response to the incentives that are there," Stanzione said. "If there are no incentives to invest in common good infrastructure, then we're going to have issues.
"This isn't a mother nature story, this is a lack of preparation story."