After close to a week amongst the flashing lights and constant background noise of fruit machines and clinking chips that make up Las Vegas, there is finally time to reflect on the AWS re:Invent 2024 conference.
Unsurprisingly, the overarching theme of re:Invent was artificial intelligence (AI) - a constant at tech conferences over the last couple of years.
Despite this, there was a sense of something a bit different. Something new was in the air, or perhaps just a shift in sentiment. It feels like AI has moved beyond being a bubble of excitement and into a practical, tangible reality.
What is interesting is how Amazon Web Services is dealing with that.
Re:Invent saw plenty of technology releases that had the keynote audience "ooh-ing" and "aah-ing" - for example, the launch of the new Nova foundation models, or the expansion of Amazon Bedrock Marketplace to more than 100 AI models, or the new Amazon Q offerings, or... The list goes on.
But what gets us at DCD excited, and presumably also you, the reader, is the hardware and infrastructure that enables all of this.
Earlier this year, DCD delved into the question: Can AWS hit its sustainability targets and win the AI race? It became abundantly clear at re:Invent that this was a question not just on the minds of the wider population and the DCD Editorial team, but also those of AWS executives.
The cloud company revealed its Power Usage Effectiveness (PUE) numbers for the first time during the conference, boasting an average of 1.15 across its global data center portfolio, with its best-performing site registering a PUE of 1.04.
For comparison, Google claims its facilities average a PUE of around 1.1, with its best site offering 1.06. Meta's facilities, on average, offer a PUE of around 1.08. Microsoft's newest facilities achieve a PUE of around 1.12, with a global facility average of 1.18 across the portfolio. Oracle has said its data centers offer a PUE "as low as" 1.15.
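For context, PUE is simply the ratio of total facility energy to the energy delivered to IT equipment, so a score of 1.0 would mean zero overhead for cooling and power distribution. A minimal sketch, with invented figures, shows how a headline number like 1.15 is derived:

```python
# PUE = total facility energy / IT equipment energy.
# The figures below are illustrative, not AWS data.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness over a given period."""
    return total_facility_kwh / it_equipment_kwh

# Hypothetical site: 115 GWh drawn from the grid, 100 GWh reaching IT gear.
print(pue(115_000_000, 100_000_000))  # 1.15 - AWS's reported fleet average
```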
It's unclear why AWS took so long to share this data, and the company was unwilling to offer further detail, such as the PUE of its worst-performing data center. The timing, though, was likely in part to demonstrate the impact of its newly launched data center components.
According to AWS, these data center components - also debuted during re:Invent - will lower the average PUE to 1.08, while also making its facilities better suited to handling AI workloads.
The components include, but are not limited to: a "novel" liquid cooling solution that enables seamless integration of direct-to-chip cooling and air cooling; a simplified power distribution design that AWS claims will reduce the number of electrical conversions in the data center by 20 percent; and "engineering innovations" that will enable the cloud giant to support a 6x increase in rack power density over the next two years, with a further 3x increase beyond that.
AWS has also used data and generative AI to find the most efficient way to position racks in its data centers, reducing stranded power and increasing compute capacity per site by 12 percent.
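AWS hasn't published the method behind that 12 percent figure, but the underlying idea of stranded power can be illustrated with a toy packing exercise - the lineup capacity and rack draws below are invented, and first-fit packing is just a stand-in for whatever optimization AWS actually uses:

```python
# Toy illustration of stranded power: provisioned capacity that no rack
# can use because of how loads are packed into power lineups.
# All numbers are invented; AWS has not disclosed its actual method.

LINEUP_KW = 10.0  # hypothetical power capacity per lineup

def pack(racks_kw, capacity):
    """First-fit packing of rack power draws into lineups."""
    lineups = []
    for rack in racks_kw:
        for lineup in lineups:
            if sum(lineup) + rack <= capacity:
                lineup.append(rack)
                break
        else:
            lineups.append([rack])
    return lineups

def stranded_kw(lineups, capacity):
    """Provisioned capacity left unusable across all lineups."""
    return sum(capacity - sum(lineup) for lineup in lineups)

# The same racks, placed in a different order, strand different amounts
# of power - which is what layout optimization exploits.
print(stranded_kw(pack([4.0, 4.0, 6.0, 6.0], LINEUP_KW), LINEUP_KW))  # 10.0
print(stranded_kw(pack([6.0, 6.0, 4.0, 4.0], LINEUP_KW), LINEUP_KW))  # 0.0
```

Packing racks so that less capacity sits stranded means more servers per megawatt of provisioned power, which is where an uplift like the claimed 12 percent would come from.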
"It's an interesting data point," Margaret O'Toole, worldwide tech leader of sustainability at AWS told DCD. "When we look at a decarbonization journey, we're looking to decouple business growth from environmental impact, so stats like this are progress in that direction. Being able to fit more customer workloads per data center means we need data centers at a slower rate."
Earlier this year, Amazon as a whole boasted that, as of 2023, it had been powering its operations, including its data centers, with 100 percent renewable energy. According to O'Toole, the rise of AI and its associated power demands has led Amazon to include nuclear in its overall energy portfolio.
While Amazon reports meeting this 100 percent renewable energy goal, pressure group Amazon Employees for Climate Justice (AECJ) published research arguing that just 22 percent of the company's data centers were actually powered by renewables.
“Amazon is distorting the truth with this announcement that hides the fact that its energy-hungry data centers operate in the heart of coal country, and the company’s expansion is driving up demand for more oil and gas,” a statement from AECJ said.
"There are two ways you can calculate this, the location-based method, and the market-based method," said O'Toole when asked about this. "There are no grid systems in the world that are 100 percent renewable for the location-based method. In order to do that, every single energy source in the system would need to be renewable or carbon-free, but grid systems are a shared resource and renewable energy is intermittent.
"The idea of a market-based method is that you know that no grid system is going to be carbon-free overnight and it is a shared responsibility, so let's create mechanisms to drive investment."
This is the logic behind power purchase agreements (PPAs), under which companies purchase renewable energy credits or fund the development of renewable projects to offset the energy use of their operations. PPAs are themselves criticized for one of the very reasons O'Toole pointed out: renewable generation fluctuates, and doesn't necessarily cover the energy actually consumed. Because of this, many companies are opting for hourly-matched PPAs instead to account for those fluctuations. O'Toole confirmed that AWS is not currently doing this, but is exploring ways to further improve its PPAs - for example, by investing in renewable energy projects on 'dirty' grids, where they will benefit the grid more than in, say, the Nordic countries, where clean power sources already provide the bulk of energy.
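To make the distinction between the two accounting methods concrete, here is a minimal sketch of how they can diverge for the same facility - all figures are invented, not AWS data:

```python
# Location-based accounting charges consumption at the local grid's
# average carbon intensity; market-based accounting first subtracts
# renewable generation contracted through PPAs/RECs.
# Every number below is an illustrative assumption.

consumption_mwh = 100_000   # annual data center consumption
grid_tco2_per_mwh = 0.4     # average carbon intensity of the local grid
ppa_mwh = 100_000           # renewables contracted via PPAs over the year

location_based = consumption_mwh * grid_tco2_per_mwh
market_based = max(consumption_mwh - ppa_mwh, 0) * grid_tco2_per_mwh

print(location_based)  # 40000.0 tCO2 - what the local grid mix implies
print(market_based)    # 0.0 tCO2 - "100 percent renewable" on paper
```

Hourly matching tightens this by applying the same subtraction hour by hour rather than over a whole year, so surplus solar at noon can no longer offset fossil-fired consumption at midnight.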
Other efforts to improve the sustainability of AWS' operations include extending its server lifetime from five to six years, and using Amazon's re:Cycle Reverse Logistics hubs to repair, recirculate, or sell equipment for reuse.
Beyond the efforts to make AWS data centers more sustainable and efficient enough to handle AI, re:Invent also brought major updates to the company's EC2 offerings specifically dedicated to AI workloads.
Among the new instances introduced by CEO Matt Garman were the P6 instances, featuring Nvidia's Blackwell GPUs, and the Trn2 instances, built around Amazon's self-developed Trainium2 chip.
The Trn2 instances, now generally available, are claimed to offer 30-40 percent better price-performance than the current generation of GPU-based instances. Each instance contains 16 Trainium2 chips connected by NeuronLink, a high-bandwidth, low-latency interconnect, and can deliver 20.8 petaflops of compute.
The cloud giant also announced the launch of Amazon EC2 Trn2 UltraServers. Comprising four Trn2 instances connected with NeuronLink, the UltraServers contain 64 Trainium2 chips and can offer up to 83.2 petaflops of FP8 compute.
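Those figures are internally consistent - per-chip throughput works out the same at both scales, as a quick sanity check on the quoted numbers shows:

```python
# Sanity check of the quoted Trn2 figures (FP8 petaflops).
trn2_chips, trn2_pflops = 16, 20.8     # one Trn2 instance
ultra_chips, ultra_pflops = 64, 83.2   # one UltraServer = four instances

per_chip_pflops = trn2_pflops / trn2_chips  # 1.3 petaflops per Trainium2
assert abs(ultra_pflops - per_chip_pflops * ultra_chips) < 1e-6

print(per_chip_pflops)  # 1.3 - the aggregate scales linearly with chip count
```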
It is these "UltraServers" that are particularly tantalizing. Rahul Kulkarni, director of product management, compute and AI/ML, at AWS, told DCD: "The UltraServer basically means I get four of these [Trn2 instances] in a fully coherent domain. This is a landmark announcement because, over the past year, we've seen model sizes continue to scale - large language models, foundation models - scaling to billions and trillions of parameters... there are some models where customers don't want to break up their training jobs across multiple servers, and that's where UltraServers become helpful.
"Because you have four of these instances fully coherent and can run on a single node, your time to train dramatically reduces the response time and you can even get a better performance on some of the cutting-edge foundation models."
Notably, AWS is also working with Anthropic on "Project Rainier," which will use these UltraServers to build a cluster of "hundreds of thousands of Trainium2 chips interconnected."
CEO Matt Garman said during his keynote: “The cluster is going to be five times the number of exaflops as the current cluster that Anthropic used to train their leading set of Claude models - five times the amount of compute that they used for the current generation. I am super excited to see what the Anthropic team comes up with at that size.”
Also on the chip side was the soft launch of plans for the Trainium3 chip, which will be the company's first made on a three-nanometer process node. It will offer 2x the compute of Trainium2, and will be 40 percent more efficient.
During the same keynote, it was also revealed that AWS counts Apple among its customers, with Apple already using the Inferentia and Graviton chips and exploring the use of Trainium2 for pretraining its models, where it expects to see up to a 50 percent improvement in efficiency.
The overarching message from AWS re:Invent was clear: AI is King, and, as the public cloud's dominant player, AWS thinks it does it better than anyone else.
One thing is certain: the cloud giant says it is working hard to adapt to the power-hungry beast that is AI in a way that enables it to scale sustainably, though it is far from perfect in this respect just yet.
Oh, and off the back of the popularity of Graviton, it really would like companies to use its AI chips.