GPU maker Nvidia and cloud giant Microsoft have entered into a multi-year collaboration to build "one of the most powerful AI supercomputers in the world."
The cloud-based system will use Nvidia's GPUs, networking gear, and AI software stack.
Specifics were not disclosed, but Nvidia said that the deal will add tens of thousands of Nvidia A100 and H100 GPUs, as well as Quantum-2 400Gb/s InfiniBand networking gear.
"As part of the collaboration, Nvidia will utilize Azure’s scalable virtual machine instances to research and further accelerate advances in generative AI, a rapidly emerging area of AI in which foundational models like Megatron Turing NLG 530B are the basis for unsupervised, self-learning algorithms to create new text, code, digital images, video or audio," Nvidia said in a statement.
When the system comes online, customers will be able to deploy thousands of GPUs in a single cluster to train large language models and complex recommender systems, run generative AI models, and more. No launch date was disclosed, but the supercomputer will likely be installed in phases.
The two companies will also collaborate on DeepSpeed, Microsoft's deep learning optimization software, and other AI tools.
“AI is fueling the next wave of automation across enterprises and industrial computing, enabling organizations to do more with less as they navigate economic uncertainties,” said Scott Guthrie, executive vice president of the Cloud + AI Group at Microsoft. “Our collaboration with Nvidia unlocks the world’s most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure.”
Microsoft has increasingly pushed into the supercomputer space, winning a major UK government contract to build a huge system for the national weather office.
Microsoft has also hired a number of staff from Cray and NERSC, and built a massive cloud-based AI supercomputer for OpenAI.