Amazon Web Services (AWS) has launched a series of new data center components to make its data centers better equipped to handle the next generation of artificial intelligence (AI) workloads.

Announced at AWS re:Invent this week, the innovations cover power, cooling, and hardware design and aim to improve the energy efficiency of AWS' facilities.

The new capabilities will eventually be implemented globally across AWS' new data centers, with some components already in existing facilities.


"AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers worldwide," said Prasad Kalyanaraman, vice president of infrastructure services at AWS. "These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint."

AWS simplifies electrical and mechanical design

AWS has simplified its electrical and mechanical designs to make its data centers easier to maintain and increase reliability.

According to the company, these updates give an infrastructure availability of 99.9999 percent, while reducing the potential number of racks impacted by electrical issues by 89 percent.
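
To put the availability figure in perspective, the short calculation below converts a percentage into the downtime it allows per year. It is a minimal sketch: the function name and the 365-day year are assumptions made for illustration, not part of AWS' announcement, and AWS has not said how it measures the figure.

```python
# Illustrative arithmetic only: converts an availability percentage into
# the maximum downtime it permits over a period. The 365-day year and the
# function name are assumptions, not AWS' methodology.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def max_downtime_seconds(availability_percent: float,
                         period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime allowed by a given availability over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_seconds


downtime = max_downtime_seconds(99.9999)
print(f"{downtime:.1f} seconds per year")  # prints "31.5 seconds per year"
```

Under these assumptions, 99.9999 percent availability leaves room for roughly half a minute of infrastructure downtime per year.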

One element of this is a simpler energy distribution design that reduces the number of electrical conversions in the data center by 20 percent.

AWS is also bringing backup power closer to its racks and reducing the number of fans used to exhaust hot air, relying instead on natural pressure differentials, which will increase the amount of electricity available for servers.

Liquid cooling, rack design, and control systems

With new AI servers currently requiring as much as 850W per chip, and expected to soon reach 1kW each, liquid cooling is now a necessity. AWS has developed a "novel mechanical cooling solution" using direct-to-chip cooling in its new and existing data centers.

The cloud giant notes that some technologies do not need liquid cooling, so it has made its direct-to-chip cooling system able to "seamlessly integrate" air and liquid cooling for powerful chipsets like AWS Trainium2 and Nvidia GB200 NVL72.

The cooling solution was developed in collaboration with "leading chip manufacturers."

Kevin Miller, AWS' vice president of Global Data Centers, told DCD at AWS re:Invent 2024 in Las Vegas that flexibility was a key consideration with the new cooling solution.

"If you look at the semiconductor market, there's a very rapid development of what chips are going to need: air cooling versus liquid cooling.

"One of the key things we needed was the flexibility to deploy it in different configurations in different data centers, depending on exactly what chips are landing at what point in time and when we complete transitions from a prior generation of chip to the next generation of chip," said Miller.

He added that, while liquid cooling has been around for a long time, "there's very little in the way of a developed supply chain around doing that at scale," motivating AWS to deal with the problem itself.

AWS has also used data and generative AI to work out the most efficient way to position racks in its data centers, reducing the amount of stranded power and providing 12 percent more compute power per site.

This new positioning of racks will apply to the new AI hardware and a "wide range of other hardware types."

Miller noted that AWS does place a variety of racks and hardware in its data halls, adding that not simply grouping the same hardware together improves redundancy and resiliency. "The fact that we spread infrastructure out, actually across all of our data centers, means that - and we are very proud of our availability track record - when we have availability events, one of the first things we try to do is really minimize the scope, the blast radius, of how many things can be impacted."

Miller added: "Now that said, because of that transition from everything being air cooled to liquid cooled, we do need the flexibility to be able to add liquid to support those racks. And that will probably lead to us having certain data halls have a little bit more bias towards liquid cooling racks than we do today. But ultimately, we still want to create that flexibility."

On the power side, AWS has developed "engineering innovations" that will enable it to support a 6x increase in rack power density over the next two years, and a further 3x increase in the future. This is partially delivered by a "power shelf," which distributes power throughout the rack and reduces conversion losses.

The cloud company has also rolled out an Amazon-owned control system across its mechanical and electrical devices, which will help standardize monitoring, alarming, and operational sequences.

AWS reduces mechanical energy consumption by 46 percent and embodied carbon in concrete by 35 percent

AWS has also made efforts to improve sustainability across its data centers.

The cloud company said that its new cooling system will reduce mechanical energy consumption by as much as 46 percent during peak cooling conditions without increasing water usage on a per-MW basis. AWS cites a new single-sided cooling system, reduction in cooling equipment, and introduction of liquid cooling capabilities as factors behind this.

The embodied carbon in the concrete used to build its data centers has been reduced by up to 35 percent compared to the industry average, and the company is reducing its use of steel overall. Miller added that the steel AWS is using is coming from electric arc furnaces instead of traditional gas-fired furnaces, which is further helping to reduce AWS' embodied carbon in steel.

Finally, Amazon's backup generators will be run on renewable diesel to help reduce greenhouse gas emissions, a transition that has already begun in Europe and the US.

"As Anthropic develops our leading foundation models, having access to secure, performant, and energy-efficient infrastructure is crucial to our success," said James Bradbury, distinguished engineer, Compute, at Anthropic. "AWS's commitment to building cutting-edge data centers is one of the key reasons we've chosen them as our primary cloud provider and training partner. Their design improvements represent a significant step forward in providing secure, scalable, and efficient infrastructure to power AI models and drive innovation in this field."

The components are built to scale across AWS infrastructure worldwide, including its 34 regions, 108 availability zones, and other offerings such as AWS Local Zones.

Construction of new data centers with the full set of components will begin in early 2025 in the US, with some facilities already using the new offerings.