At its annual GPU Technology Conference, Nvidia unveiled the successor to its DGX-1 system, the aptly named DGX-2.
Featuring twelve of the company's newly announced NVSwitch interconnect fabric chips, it comprises 16 V100 GPUs split across two server boards, along with two Intel Xeon Platinum CPUs, 1.5 terabytes of system memory and 30TB of NVMe SSDs.
The system weighs 350lb, occupies 10U of rack space and consumes 10kW.
Ready to go
Jim McHugh, Nvidia VP and GM, said in a pre-brief attended by DCD: “We’ve had a lot of customers explain to us that they’re doing bigger and bigger clusters, and one of the most amazing things about DGX-2 is that it’s incredibly flexible.
“But we realize not everybody will be able to take advantage of all 16 GPUs all the time, so there is full KVM support; you can segment it down to 1, 2, 4 or 8.”
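On the DGX-2 itself that segmentation is done through KVM virtualization, but the general idea of confining a workload to a subset of a box's GPUs can be illustrated with CUDA's `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch (illustrative only; not Nvidia's partitioning mechanism):

```python
import os

# Restrict a CUDA process to four of the machine's GPUs.
# Note: CUDA_VISIBLE_DEVICES must be set before the CUDA
# runtime initializes, so this belongs at the top of a script.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

# Any CUDA framework launched from here on would see only
# these four devices, renumbered 0-3.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(len(visible))  # prints 4
```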
Ian Buck, GM of Nvidia’s data center business, added: “DGX-2 is an amazing AI server. We have been optimizing our entire stack to scale up AI.
“We’ve taken a neural network called FairSeq used for translating the Internet. Previously on our DGX-1, it was measured to take about 15 days to train. Through all the advancements we’ve made, we’ve taken that down to just one and a half days.”
“Just to put that in perspective: it would take about 300 Skylake servers to deliver that same level of performance.”
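Buck's figures are consistent with the headline claim: going from 15 days to a day and a half is a tenfold reduction in training time.

```python
# Sanity check on the quoted FairSeq training times
# (15 days on DGX-1, 1.5 days on DGX-2, per the source).
dgx1_days = 15.0
dgx2_days = 1.5
speedup = dgx1_days / dgx2_days
print(speedup)  # prints 10.0
```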
In a keynote speech at GTC 2018, CEO Jensen Huang said: “This is 10x faster than DGX-1, it took hundreds of millions of dollars of engineering.”
The system will retail for $399,000, starting in the third quarter. “This is what an engineer finds beautiful, you guys. This is sexy,” Huang said.
The CEO confirmed that DGX-2 systems will be installed in the company's Saturn V supercomputer, which currently consists of 660 DGX-1 server nodes.
Nvidia declined to provide a roadmap for that upgrade.