Machine learning (ML) is growing rapidly, but as techniques improve, its carbon footprint will level off and even shrink, according to new research from Google.
Energy use can be reduced by choosing efficient ML models and running them on optimized processors, preferably in the cloud, according to a paper from Google researchers due to be published in IEEE Computer. Together, these measures can cut ML's energy use 100-fold and its carbon footprint 1,000-fold, the paper claims.
That's still a lot of energy: the paper admits that even with these reductions, ML is a big proportion - 10 to 15 percent - of Google’s total energy use.
Thinking hard
"We identified four best practices that reduce energy and carbon emissions significantly, all of which are being used at Google today and are available to anyone using Google Cloud services," says Brian Patterson of Google Research's Brain Team, in a blog post about the paper, The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink.
The paper looks at the energy cost of running ML hardware, including data center overheads, used in training of natural language processing (NLP) models. It lists a set of benefits that can be gained by using the right hardware and consumption models, as well as benefits from Google's particular way of distributing cloud workloads.
If users choose "sparse" ML models, this can reduce computation by a factor of three to ten, the report says. Using specialized hardware optimized for ML will increase energy efficiency by a further two to five times. Patterson's group also gives itself credit for nearly doubling efficiency by using resources in the cloud instead of in on-premises data centers.
Finally, the group claims to reduce carbon footprint up to tenfold by picking the location with the cleanest energy, an option allowed within Google's cloud.
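The headline figures appear to come from multiplying these individual gains together. The short Python sketch below is our own back-of-envelope check, not a calculation taken from the paper: it assumes the factors are independent and simply multiply, rounds "nearly doubling" up to 2, and treats the location gain as ranging from no benefit (1) to the quoted tenfold.

```python
from math import prod

# Improvement ranges (low, high) as quoted in the article.
# Assumption: the gains are independent and multiply together.
ENERGY_FACTORS = {
    "sparse model": (3, 10),
    "ML-optimized hardware": (2, 5),
    "cloud instead of on-premises": (2, 2),  # "nearly doubling", rounded to 2
}
# "Up to tenfold"; the lower bound of 1 (no gain) is our assumption.
LOCATION_FACTOR = (1, 10)


def combined_range(factors):
    """Multiply the low ends together and the high ends together."""
    lows, highs = zip(*factors)
    return prod(lows), prod(highs)


energy_low, energy_high = combined_range(ENERGY_FACTORS.values())
carbon_low, carbon_high = combined_range(
    [*ENERGY_FACTORS.values(), LOCATION_FACTOR]
)

print(f"Energy reduction: {energy_low}x to {energy_high}x")
print(f"Carbon reduction: {carbon_low}x to {carbon_high}x")
```

Under those assumptions, the best case works out to the 100-fold energy saving and 1,000-fold carbon saving quoted in the paper, while the worst case is closer to 12-fold for both.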
Some questions arise from this - the benefits of specialized hardware and cloud operations are well known, so the group appears to be comparing its ML approach to a rather poorly-optimized scenario. Choosing to place workloads in green locations also sounds good, but could "hog" the green resources, and leave other workloads running on fossil electricity.
Patterson responds to this potential criticism: "While one might worry that map optimization could lead to the greenest locations quickly reaching maximum capacity, user demand for efficient data centers will result in continued advancement in green data center design and deployment." In any case, Google promises that by 2030 all its data centers will operate on 100 percent locally generated carbon-free energy, 24 hours a day.
In part, the paper is a shoutout to Google's own work, claiming it has improved ML hardware and software efficiency by a factor of 57. This is partly from better ML models, with Google's recently published Primer ML model cutting computation needs four-fold, without losing accuracy, compared with Google's previous Transformer model, introduced in 2017. Meanwhile, Google's recently announced TPUv4 ML chip is claimed to be fourteen times more efficient than the Nvidia P100, which is not that big a surprise given the P100 is around five years old.
Essentially, the paper says that a modern ML implementation running up-to-date models on Google hardware in a Google data center will be much better than an older model running on Nvidia hardware in an on-premises data center in 2017.
Overall, Google's use of ML has increased, but the paper says the increased efficiency has "largely compensated for this increased load," keeping ML training and inference between 10 and 15 percent of Google's energy use.
Patterson's blog also discusses the energy cost of running Neural Architecture Search (NAS), an automated process that searches for and adopts better ML models, so that efficiency keeps improving over time. "As the optimized model found by NAS is often more efficient, the one-time cost of NAS is typically more than offset by emission reductions from subsequent use," says Patterson.
Research has looked at the energy use of NAS, but some studies have misread the data: "Unfortunately, some subsequent papers misinterpreted the NAS estimate as the training cost for the model it discovered," he says, warning this creates an error "nearly as large as if one overestimated the CO2e to manufacture a car by 100x and then used that number as the CO2e for driving a car."
Overall, Patterson says that the measures used by Google could easily be adopted elsewhere, producing "a virtuous circle that will bend the curve so that the global carbon footprint of ML training is actually shrinking, not increasing."
Patterson thanks a large team of Google researchers. Other authors on the paper include Jeff Dean, Joseph Gonzalez, Urs Hölzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, and Maud Texier.