AWS announces machine learning training chip Trainium

Coming to the cloud next year, along with Intel Habana

Amazon Web Services plans to launch a new chip for training machine learning models.

Trainium was developed in-house and joins the company's existing Inferentia processor, which focuses on inference workloads.

Each cloud an island

Chip specifics were not revealed, but Amazon claimed Trainium will offer the most teraflops of any machine learning instance in the cloud.

The company says the chip will offer 30 percent higher throughput and 45 percent lower cost-per-inference than standard AWS GPU instances. But by the time it is released in the second half of 2021, new GPUs may be available and prices may have changed.

No benchmarks were revealed, so it's not possible to compare the hardware to Google's own in-house TPU chips, which are soon set to enter their fourth generation.

Amazon also plans to roll out another machine learning processor to its cloud service, the Intel Habana Gaudi.

Intel acquired Habana Labs back in 2019 for some $2bn, and soon afterwards dropped its own Nervana line, itself part of a prior acquisition, in favor of Habana's chips.

Now, Habana's Gaudi processors are nearly ready for prime time, and are due to be available as EC2 instances in early 2021. An 8-card Gaudi EC2 instance can process about 12,000 images per second when training the ResNet-50 model on TensorFlow, Intel claims. The company also boasts 40 percent better price-performance than current GPU-based EC2 instances for machine learning workloads.

“Our portfolio reflects the fact that artificial intelligence is not a one-size-fits-all computing challenge,” said Remi El-Ouazzane, chief strategy officer of Intel’s Data Platforms Group.

“Cloud providers today are broadly using the built-in AI performance of our Intel Xeon processors to tackle AI inference workloads. With Habana, we can now also help them reduce the cost of training AI models at scale, providing a compelling, competitive alternative in this high-growth market opportunity.”
