This winter, everyone will be monitoring their energy consumption as prices soar. Data center operators will wonder whether they can still turn a profit as electricity prices climb, and at least two data center firms, in the UK and Ireland, have closed, blaming the energy crisis.
But what about the bigger question? What impact does the entire data center sector have on world energy consumption? It’s an important question for policymakers - but it seems the answers we have are not reliable.
Most data center professionals will shrug, and say facilities use “about two percent of the world’s electricity.” They’ll say the figure came from a newspaper article, an analyst firm, or from their own marketing department’s slide deck. They may also say they have heard data center energy use is plateauing, thanks to virtualization and the cloud.
Others will give a higher figure, saying that data centers use seven percent of electricity or more in some countries, and are on track to use 51 percent of the world's electricity by 2030. Ask them where the figure came from, and they will quote a different set of newspaper pieces, analysts, and marketing literature.
That’s not a good basis for discussion, says entrepreneur and academic David Mytton, who has tracked these estimates to their sources, to see where the discrepancies come from.
It’s important work, because bad data can lead to poor decisions.
Bad data, bad actions
“This large variance… serves to confuse the general public who want to help tackle environmental issues,” write Mytton and colleague Masao Ashtine in their paper, Sources of data center energy estimates: A comprehensive review, published in the scientific journal Joule. It can lead to misguided efforts to save energy, for instance by deleting old social media photos.
More importantly, the unreliable figures mean data center operators don’t treat the issue with the right seriousness, energy grids can’t plan for their demands, it’s impossible to get a true picture of data centers’ role in global warming, and we see a series of heated but inconclusive arguments.
“Unexpected demand places stress on electricity transmission and local distribution capacity, which has a long lead time for upgrades and can have knock-on effects on other users of the electricity grid,” says Mytton.
Three boroughs in West London have approved new data centers which will use electricity equivalent to tens of thousands of homes. As a result, new housing developments can't get grid connections.
In Ireland, data center power demand increased 144 percent in five years (from 1.2 TWh in 2015 to 3.0 TWh in 2020). Data centers are projected to account for 27 percent of all Irish electricity demand by 2029, and several operators have delayed or canceled projects in the country.
Amsterdam paused data center construction in 2020 because of concern over the sector’s energy and land demands.
Meanwhile, 15 percent of Danish electricity will be used by data centers by 2030, according to projections.
In all these cases, better predictions might have avoided headline-grabbing pauses or cancellations.
When electricity is put in the context of total energy use, things get complex. The majority of electricity is still generated from fossil fuels, so data centers’ electricity use contributes maybe two or three percent of global CO2 emissions (a figure based on guesswork).
Even if data centers can opt for green energy, it may not help. Countries have a limited amount of electricity from renewable sources - if data centers use that, there’s less available for heating and transport, which are sectors that urgently need to decarbonize.
Bad data makes this a minefield for any policy makers, leaving them at the mercy of special pleading.
The data center industry can rightly claim that digitization can help decarbonization (the so-called "carbon handprint") - for instance, when Zoom meetings replace business travel. But others will say that no sector should get a free pass for unlimited increases in energy use.
Without accurate data, lawmakers cannot balance these two arguments.
“The lack of accurate information about data center energy consumption and how that will grow is already having an impact,” warn Mytton and Ashtine.
For their analysis, Mytton and Ashtine gathered the reports on data center energy use published over the last 16 years, a period dating from the 2006 launch of Amazon Web Services, which kicked off the era of the cloud.
They want to help the industry produce better figures on energy use: “We do not aim to criticize individual publications or suggest that a particular estimate is more accurate than another. Our goal is the broad analysis of common methodological problems within this research field so that future readers can have more confidence in the reliability of estimates.”
Any report is only as good as its data: “We focus on source provenance and data inputs because they are the foundational component that determines scientific reliability.”
Mytton and Ashtine track citations back from one paper to previous ones, using “Sankey diagrams” (Fig 2). Many of them refer back to the first major data center energy report, commissioned by the United States Congress and produced in 2007 by researchers at Lawrence Berkeley National Laboratory (LBNL), including Jon Koomey and Arman Shehabi. There is also a 2008 paper by Koomey, Worldwide electricity used in data centers.
The pair checked any publication in English which attempted to calculate the energy consumption of data centers, either globally or within a region such as the US or Europe.
They came up with a list of 46 publications. This might seem a surprisingly small output from 16 years of research on such a hot topic, but there's a reason: many publications simply quote or refer to others.
Following these links back, Mytton and Ashtine found 676 individual “data provenance traces,” or original sources of data. Many of these are no longer available, either because links are broken (so-called “link rot”) or because there is no trace of the original document. Some were commercial information seen only by the original researcher, and some give no methodology for how the figures were arrived at.
The problem with missing data is that information may have been available on publication, but is not now: “Web links are not permanent and the web pages used as references are no longer available (a particular problem when Cisco is cited). The problem is compounded by how there are few available sources of market data, which are generally only available from private/commercial reports or databases.”
The reports are often based on secondary data. For instance, the amount of energy used by servers in a given year is often estimated by taking the number of servers that have been shipped, using that to estimate the number which were in use in that year, and from that, the likely amount of energy those servers used.
Some sources got quoted more than others. For instance, despite the difficulty of seeing their data firsthand, analyst firm IDC and Cisco were quoted in 43 percent and 30 percent of the publications, respectively. The actual dependence on these figures is higher, because some papers cite earlier ones that rely on Cisco or IDC data without explicitly referencing it.
And there are questions over reliability, as only a third of sources were from peer-reviewed publications. A further 38 percent were simply from “Reports” which could mean industry publications or self-published articles. Some data points made it into published papers despite having no year of publication.
The papers handled data in different ways - some citing carbon emissions, others direct energy use. To make comparison possible, all were converted to the same units: terawatt-hours (TWh) per year.
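Converting an emissions figure back into energy requires assuming a grid carbon intensity, which is itself an estimate. A minimal sketch of that conversion, using an illustrative (not sourced) intensity value:

```python
# Hypothetical conversion of a carbon-emissions figure back to energy.
# The grid intensity below is an assumed round number for illustration;
# real studies use region- and year-specific values.

GRID_INTENSITY_KG_PER_KWH = 0.475  # assumed average, kg CO2 per kWh

def mtco2_to_twh(mt_co2: float,
                 intensity: float = GRID_INTENSITY_KG_PER_KWH) -> float:
    """Convert megatonnes of CO2 into TWh, given an assumed intensity."""
    kwh = (mt_co2 * 1e9) / intensity   # Mt -> kg, then kg / (kg per kWh)
    return kwh / 1e9                   # kWh -> TWh

print(f"{mtco2_to_twh(95):.0f} TWh")   # a 95 MtCO2 estimate -> 200 TWh
```

The point of the sketch is that the answer moves in lockstep with the assumed intensity - halve the intensity and the implied energy doubles - which is one reason converted figures diverge.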
Estimation and extrapolation
It’s worth emphasizing that all these quoted energy use figures are estimates. There is no single energy authority classifying and totaling up all the energy consumers in the world, or even within individual regions.
The reports use different methods to bring together their sources and calculate an estimate of data center energy use, then use other data points and assumptions to extrapolate that to give a likely figure for future energy use.
They also take such different approaches, that comparing them can be a nightmare, says Mytton: “Koomey excludes storage and networking components, Somavat et al take the US total from Brown et al, then double it on the assumption that the United States represents half the global total, Andrae and Edler exclude internal data center networks, instead counting them as part of global networking as a whole, and Masanet et al exclude Bitcoin whereas Montevecchi et al include it.”
To take all this and produce figures, there are essentially three approaches: bottom-up, top-down, and extrapolation.
Bottom-up modeling combines figures such as the specified power draw of servers with estimates of the installed base, then multiplies by the average power usage effectiveness (PUE) of data centers to get a figure for how much energy is used in the facility.
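The bottom-up arithmetic itself is straightforward; the uncertainty lies in the inputs. A minimal sketch, with made-up illustrative numbers rather than real survey data:

```python
# Hypothetical bottom-up estimate. Every constant below is an assumed,
# illustrative figure - not a real installed base or measured power draw.

SERVERS_INSTALLED = 50_000_000   # assumed global installed base
AVG_SERVER_POWER_W = 250         # assumed average draw per server, watts
AVG_PUE = 1.6                    # assumed facility energy / IT energy
HOURS_PER_YEAR = 8760

def bottom_up_twh(servers: int, watts: float, pue: float) -> float:
    """IT energy over a year, scaled by PUE, converted to TWh."""
    it_energy_twh = servers * watts * HOURS_PER_YEAR / 1e12
    return it_energy_twh * pue

print(f"{bottom_up_twh(SERVERS_INSTALLED, AVG_SERVER_POWER_W, AVG_PUE):.0f} TWh/year")
```

Each input is itself an estimate - shipments proxy for installed base, benchmarks proxy for power draw - so errors multiply through the chain.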
That’s fine, but published figures may not give the whole picture. For instance, some reports base energy usage on SPECpower benchmark data, but this can be skewed. A report from Van Heddeghem and colleagues found that the SPECpower database is biased toward more energy-efficient servers, while less efficient volume servers are the largest group by power consumption.
The problem gets worse when the model has to project energy use forward from today’s figures and observed trends. This is tricky, because equipment can change: hardware may become more efficient, or more power-hungry systems may arrive as workloads shift to more demanding tasks.
“The further out, the wider the range of estimates due to the difficulty of accounting for energy efficiency improvements and changing trends in equipment,” says Mytton.
The 2007 LBNL paper provides a sterling example of the dangers of extrapolation. The study found that data center energy use in the US had grown by 90 percent between 2000 and 2005, and warned that this would be unsustainable in the long term.
Then a follow-up report from Jon Koomey in 2011 noted that growth in the US had actually slowed. After that, a 2016 LBNL report saw energy use actually plateauing (Fig 3).
The reason for this was that cloud applications had grown rapidly, but were delivered more efficiently than the same services provided within the in-house data centers which the cloud was beginning to replace.
But what’s happened since 2016? It’s possible that the efficiencies provided by the cloud may be reaching their limits, or that the hyperscale data centers providing them may be driving a large expansion in consumer services.
In 2007, the LBNL report discounted hyperscale data centers, regarding them as insignificant. That report’s 2016 sequel led by Arman Shehabi reckoned that hyperscale capacity would make up more than 40 percent of the entire 2020 server installed base.
It seems that there are many more hyperscale data centers in the US (about 400 of the world's fleet of 700). “This US focus has been suggested as a reason why data center energy consumption continues to rise in regions outside of the United States, because the United States has benefited from the improved efficiencies of these facilities,” says Mytton.
Cryptocurrency is another big unknown that energy estimates have trouble digesting. It emerged entirely within the study period - the original Bitcoin whitepaper was published in 2008 - and energy use by cryptocurrencies is now hotly debated, reckoned to be as large as that of a small country.
In July this year, Digiconomist estimated 132.05 TWh per year of Bitcoin energy consumption (roughly the electricity consumption of Sweden). Other estimates put the figure as low as 80 TWh (the electricity consumption of Belgium). While there are other blockchain applications, it’s reckoned that Bitcoin makes up two thirds of the total cryptocurrency energy demand.
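Scaling those Bitcoin figures to cryptocurrency as a whole is simple arithmetic, using the two-thirds share quoted above (itself an estimate):

```python
# Scale Bitcoin's estimated consumption to total cryptocurrency demand,
# using the two-thirds share quoted above. All inputs are estimates.

BTC_SHARE = 2 / 3  # Bitcoin's assumed share of crypto energy demand

def total_crypto_twh(btc_twh: float, btc_share: float = BTC_SHARE) -> float:
    """Implied total crypto energy, given a Bitcoin estimate and share."""
    return btc_twh / btc_share

# High and low Bitcoin estimates imply a wide range for crypto overall
print(f"{total_crypto_twh(132.05):.0f} TWh")  # high estimate -> 198 TWh
print(f"{total_crypto_twh(80):.0f} TWh")      # low estimate -> 120 TWh
```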
It’s worth pointing out that this unknown figure is not just “noise.” These figures are hugely significant compared to the estimates of the world’s total data center energy. They are more than half the size of the lower scenarios in most of the papers.
Data center energy researchers may have trouble getting real data out of the cryptocurrency market, but they ignore it at their peril.
Even new technologies in data centers are difficult to predict. Liquid cooling might replace the energy-hungry air-conditioning units used in data centers today, but Mytton warns: “there is a general expectation that direct liquid cooling of data center equipment will become more widely deployed within the next seven years, but few operators currently have high-density racks that would justify it.”
Top-down modeling might be more reliable on current statistics, because it is based on “actual data” in the form of regional totals provided by government statistics. However, these studies are very rare, because of the difficulty of getting hold of that kind of data: Mytton and Ashtine only found one top-down study, by Jens Malmodin, which is highly regarded, but only covered Sweden.
There are small signs this may be changing. In January 2022 the Irish Central Statistics Office released data center electricity consumption figures based on actual meter readings collected by the Electricity Supply Board (ESB Networks). It’s going to be updated annually, so future policy in one of the most controversial data center markets could be based on good data.
However, top-down models don’t have any magic to make them better at predicting future trends.
Extrapolation models take a baseline from one of the other models, and then assume there’s a correlation between demand and consumption to apply a growth factor.
“Most extrapolation calculations are based on energy intensity per unit of data transmitted, with assumptions about energy efficiency improvements for future projections,” says Mytton.
This can produce differences. For instance, when Anders Andrae and Peter Corcoran took Koomey’s bottom-up estimates, they applied a bigger growth rate, arguing that new consumer cloud services would boost growth and increase energy demand, even if the services themselves became more efficient.
Papers by Andrae tend to calculate the average energy used per CPU instruction, and then extrapolate the number of CPU instructions the world will use in a given year.
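The sensitivity of this kind of extrapolation is easy to demonstrate. A sketch with assumed, illustrative parameters (not figures from any of the papers above) shows how a growth rate and an efficiency assumption compound over time:

```python
# Hypothetical intensity-based extrapolation: consumption is assumed to
# scale with traffic growth, offset by an assumed annual efficiency gain.
# All parameters are illustrative, not drawn from any published model.

BASE_TWH = 200.0          # assumed baseline energy use, TWh/year
TRAFFIC_GROWTH = 0.25     # assumed 25% annual growth in demand
EFFICIENCY_GAIN = 0.15    # assumed 15% annual fall in energy per unit

def extrapolate(base_twh: float, years: int,
                traffic_growth: float = TRAFFIC_GROWTH,
                efficiency_gain: float = EFFICIENCY_GAIN) -> float:
    """Compound growth net of efficiency improvements over N years."""
    factor = ((1 + traffic_growth) * (1 - efficiency_gain)) ** years
    return base_twh * factor

for year in (5, 10):
    print(f"+{year} years: {extrapolate(BASE_TWH, year):.0f} TWh")
```

Nudge either parameter by a few percentage points and the 10-year figure swings dramatically - which is exactly why projections that agree today can be miles apart by 2030.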
How much energy?
Given this diversity of data, it’s not surprising that Mytton and Ashtine don’t present a solid figure for data center energy use. In any case, that wasn’t their aim.
In total, the paper finds 258 estimates of data center energy consumption, including 179 for the whole world, 24 for the USA, and 19 for Europe. Those “Europe” figures cover another level of diversity, as they span a variety of groupings depending on whether EU, EEA, or other countries are included. There are also single-country estimates for Germany, Sweden, and China.
Taking the global estimates, there’s good agreement for how much energy data centers used back in 2010, but estimates diverge in 2020 and, by 2030, they are miles apart (Fig 1).
“The further into the future, the wider the ranges,” says the paper. “This is to be expected given that past estimates can be calculated from actual data, whereas future estimates must make assumptions about key parameters such as energy efficiency and server shipments.”
Mytton and Ashtine report an order of magnitude difference between the smallest and largest predictions for total data center energy use in 2030 - from 146 TWh to 1,929 TWh per year.
The actual spread is even wider, because they excluded five outlying estimates that predicted data center energy use would leap to as much as 8,253 TWh per year.
Most of these variations are due to the impossibility of predicting technology changes. “On one hand, proof-of-work blockchain mining requires a significant amount of energy, but on the other hand, many IT workloads have moved from inefficient enterprise data centers to more efficient hyperscale cloud systems,” says Mytton. “The smartphone has become an important computing device with more energy-efficient processors compared with desktop computers, but questions remain about the power profile of new 5G cellular networks.”
The big problem with predictions is that extrapolation will increase and expand existing weaknesses in the data. “This snowballed bias is a problem where publications rely on earlier estimates without critically assessing their assumptions and sources,” warns Mytton.
For instance, he’s quite critical of work by Anders Andrae, an analyst employed by Huawei, who published three projection papers in 2019 based on an assumption that energy use would be correlated with network traffic. The assumption was adopted by the French think tank The Shift Project in its paper Lean ICT - Towards Digital Sobriety.
“Despite the unavailability of most of the sources supporting the estimates published by The Shift Project, this report has been cited by a large number of mainstream media outlets,” say Mytton and Ashtine.
That assumption - a direct link between network traffic and energy consumption - has been recycled unexamined in research since 2013, but was refuted by Jens Malmodin and Dag Lunden in at least two papers.
In particular, there’s direct evidence from the last couple of years: the shift to home working during the pandemic sharply increased network traffic, but no corresponding increase in energy use was reported.
It’s worth mentioning that although the Shift Project report gets regularly quoted, it stands out in Mytton’s Sankey diagrams, for quite the wrong reason. All but two of its major sources are no longer available.
What we need now
We need better data on which to base future strategies - and this means that the private companies who run the cloud and data centers need to be more transparent.
Mytton acknowledges that Google and Microsoft have led the way, with both publishing top-level statistics about their energy consumption, renewable energy purchases and PUE figures.
“Other major data center owners are not as transparent,” says Mytton. “Amazon only reports a single number for carbon emissions that aggregates all their operations and so makes it difficult to break out data centers from e-commerce logistics.”
All three of the largest hyperscale cloud providers give their customers a calculator to show the carbon footprint of their cloud workloads.
“This transparency is important because migrating IT workloads to the cloud outsources the operational emissions of running that infrastructure to the cloud provider,” says Mytton. It’s also good marketing of course, as the cloud resources will usually be less energy-hungry than the equivalent resources running in-house.
Colocation providers and data center operators like Digital Realty and Equinix also provide some figures, but research from Uptime Institute suggests that data center owners are far more likely to report their energy efficiency (which has an impact on costs) than carbon emissions and environmental footprints.
Actual energy figures for cryptocurrency are vital for governments who want to make energy available for regular use. “Simple bans on cryptocurrency mining activities have been shown to cause displacement to more carbon-intensive regions,” warns Mytton.
Fundamentally, data centers can be built (and are being built) quicker than power capacity. This leads to the (literal) power struggles in Amsterdam, London, Ireland, and elsewhere.
Better predictions could lead to better planning, and maybe avoid negative consequences.
“The solution to cap demand introduced by Amsterdam is able to provide certainty so that the grid operator can deliver appropriate infrastructure upgrades, but it also places constraints on the ability for IT providers to grow their services within the region,” says Mytton.
“When demand outpaces supply, prices will inevitably rise, potentially having an impact on the ability of people in lower income brackets to benefit from access to digital services.”