Meta, the company formerly known as Facebook, will use a dedicated Microsoft Azure cluster for artificial intelligence (AI) research.
The cluster will include 5,400 Nvidia A100 GPUs and 1,350 AMD Milan Epyc 7V13 CPUs delivered using the NDm A100 v4-series instances on Azure, which went into preview yesterday.
Meta first began using Microsoft Azure Virtual Machines for AI research last year, but at a much smaller scale.
“We are excited to deepen our collaboration with Azure to advance Meta’s AI research, innovation, and open-source efforts in a way that benefits more developers around the world,” said Jerome Pesenti, vice president of AI at Meta.
“With Azure’s compute power and 1.6TB/s of interconnect bandwidth per VM we are able to accelerate our ever-growing training demands to better accommodate larger and more innovative AI models.”
Microsoft claims that the interconnect bandwidth between its Azure servers is four times that of rival cloud services that sell access to Nvidia GPUs, allowing for faster training of larger models.
This customer win for Microsoft comes after it secured a major HPC contract with the UK's Met Office, although an ongoing lawsuit from Atos could still overturn that award. Atos this week launched its own HPC-as-a-service offering, the Nimbix Supercomputing Suite.
Microsoft and Meta will also collaborate on the PyTorch machine learning framework for Python, an open source library primarily developed by Facebook's AI Research lab.
Meta has also partnered with Amazon Web Services on PyTorch. Late last year, the company said it would run third-party collaborations on AWS and use the cloud to support acquisitions of companies that are already powered by AWS. It will also use AWS for some AI research.
This year, in another shift from doing everything in-house, Meta said that it would deploy one of the world's fastest AI supercomputers, the AI Research SuperCluster (RSC), with the help of Nvidia.
The RSC is expected to feature 16,000 A100 GPUs and to be built from Nvidia DGX systems.
But Meta has not retreated from building out its own data centers and infrastructure. Last year, it said it planned to spend between $29 billion and $34 billion on data centers, servers, and offices in 2022.