Nvidia has announced a new DGX class, the DGX GH200, built for generative AI workloads.
The DGX GH200 connects up to 256 Grace Hopper Superchips into a single 144TB GPU system. The superchip is itself a combination of Nvidia's Grace Arm CPU and Hopper GPU, connected by the NVLink C2C chip-to-chip interconnect. Those superchips are then connected by the new NVLink Switch System interconnect.
Together, the 256 superchips have 144 terabytes of shared memory. The system is also available in 32-, 64-, and 128-chip variants.
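The headline figure checks out as simple arithmetic: 144TB across 256 superchips works out to 576GB each. A quick back-of-the-envelope sketch (the split into CPU and GPU memory below is our assumption based on Nvidia's published Grace Hopper specifications, not something stated in this article):

```python
# Sanity check: 144 TB of shared memory across 256 Grace Hopper Superchips.
NUM_SUPERCHIPS = 256
TOTAL_SHARED_TB = 144  # Nvidia's quoted figure (binary terabytes assumed)

per_chip_gb = TOTAL_SHARED_TB * 1024 / NUM_SUPERCHIPS
print(per_chip_gb)  # 576.0 GB per superchip

# Assumed per-chip breakdown, per Nvidia's published Grace Hopper specs
# (not from this article): Grace CPU LPDDR5X plus Hopper GPU HBM3.
GRACE_LPDDR5X_GB = 480
HOPPER_HBM3_GB = 96
assert GRACE_LPDDR5X_GB + HOPPER_HBM3_GB == per_chip_gb
```

The same arithmetic scales down to the 32-, 64-, and 128-chip variants, which would share 18TB, 36TB, and 72TB respectively.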
Alongside it, Nvidia plans to spin up a new supercomputer, the Helios, featuring four fully specced DGX GH200s - for a total of 1,024 Grace Hopper Superchips.
Nvidia said that Google Cloud, Meta, and Microsoft would be among those set to gain access to the DGX GH200 to explore its capabilities for generative AI workloads.
“Building advanced generative models requires innovative approaches to AI infrastructure,” said Mark Lohmeyer, vice president of Compute at Google Cloud.
“The new NVLink scale and shared memory of Grace Hopper Superchips address key bottlenecks in large-scale AI and we look forward to exploring its capabilities for Google Cloud and our generative AI initiatives.”
Traditional DGX deployments have paired two x86 CPUs with eight GPUs, but this system pairs CPUs and GPUs 1:1. "What that brings, beyond the memory footprint which is massive, is much more processing capability," Nvidia's VP and GM of DGX Systems, Charlie Boyle, told DCD.
"In an AI pipeline there are parts of it that are very highly parallelized GPU operations, but there's still always parts, whether it's part of data prep, or image transformation, things that you may need CPU resources for as well. And so having very strong CPU, directly connected to the GPU, a) improves processing, but b) means that some things in the pipeline, that maybe before you had to do on different systems, now can stay on one consistent system architecture and you can do your entire pipeline on it."
Girish Bablani, corporate VP of Azure Infrastructure at Microsoft, added: “Training large AI models is traditionally a resource- and time-intensive task. The potential for DGX GH200 to work with terabyte-sized datasets would allow developers to conduct advanced research at a larger scale and accelerated speeds.”
Despite the high-density compute, Boyle told DCD that the GH200 is still fully air-cooled. "That was a big system design consideration when talking to our customers," he said. "We know people will eventually have to move to liquid, but we also hear feedback from customers that that's challenging. For them, their data centers aren't there, they need to build new data centers."
Boyle added: "Even getting liquid-cooled equipment - because we're staying ahead of this, building our own stuff internally that does have liquid cooling so that we can test it for our customers - even getting liquid parts, the lead time on it is even much longer. One of our core design considerations for this generation was how do we still do it on air?"
Another request from customers was for the system to be usable out of the box. Boyle revealed that Nvidia will now use an integration facility to fully test everything and set it up so that it is ready to use as soon as it is installed.
He added that customers are asking for larger deployments than ever to support the huge demands of generative AI. "People used to buy a couple systems, test them, and then scale out a deployment," Boyle said. "[Now] my teams are getting calls from customers saying, 'When can you deliver me hundreds of systems?'"