Nvidia sets benchmarking performance records with its H200 and TensorRT-LLM software

Nvidia has set new MLPerf performance benchmarking records on its H200 Tensor Core GPU and TensorRT-LLM software.

MLPerf Inference is a benchmarking suite that measures inference performance across deep-learning use cases.

The latest version of the benchmarking suite – MLPerf v4 – has seen the addition of two new workloads that represent generative AI use cases: a large language model (LLM) benchmark based on Meta’s Llama 2 70B, and a text-to-image test based on Stable Diffusion XL.

Nvidia has set performance records on both new workloads, providing the highest performance across all MLPerf Inference workloads in the data center category.

The company’s TensorRT-LLM is an open-source software library developed to double the speed of LLM inference on its H100 GPUs. In the MLPerf v4 GPT-J test, H100 GPUs using TensorRT-LLM achieved speedups of 2.4x and 2.9x in the offline and server scenarios, respectively, compared to the performance the same GPUs delivered six months earlier in the v3.1 test.

For the MLPerf Llama 2 70B benchmarking test, Nvidia’s TensorRT-LLM running on the company’s H200 GPUs delivered up to 43 percent and 45 percent higher performance compared to the H100 in the server and offline scenarios, respectively, when configured to a 1,000W TDP.

The new benchmark uses the largest version of Llama 2, which has 70 billion parameters and is over ten times larger than the GPT-J model that was used in previous benchmarking tests.

In the Stable Diffusion XL text-to-image benchmarking test, an 8-GPU Nvidia HGX H200 system with GPUs configured to a 700W TDP achieved a performance of 13.8 queries/second and 13.7 samples/second in the server and offline scenarios, respectively.

When the same test was run using a system containing eight Nvidia L40S GPUs, the system demonstrated performance of 4.9 queries/second and 5 samples/second in the server and offline scenarios, respectively.
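For readers who want to compare the two Stable Diffusion XL results directly, the short Python sketch below works out the ratio between the systems from the figures quoted above, which comes to roughly 2.8x in the server scenario and 2.7x offline. The per-GPU numbers rest on an assumption the results do not state, namely that throughput divides evenly across the eight GPUs in each system; the dictionary keys and function names are illustrative only.

```python
# Stable Diffusion XL throughput figures as quoted in the article.
# Both systems contain eight GPUs.
RESULTS = {
    "h200": {"server": 13.8, "offline": 13.7},  # HGX H200, 700W TDP per GPU
    "l40s": {"server": 4.9, "offline": 5.0},    # eight L40S GPUs
}
NUM_GPUS = 8


def speedup(scenario: str) -> float:
    """Return H200 system throughput divided by L40S system throughput."""
    return RESULTS["h200"][scenario] / RESULTS["l40s"][scenario]


def per_gpu(system: str, scenario: str) -> float:
    """Naive per-GPU throughput, ASSUMING an even split across all GPUs."""
    return RESULTS[system][scenario] / NUM_GPUS


if __name__ == "__main__":
    for scenario in ("server", "offline"):
        print(
            f"{scenario}: H200 system is {speedup(scenario):.2f}x the L40S system "
            f"({per_gpu('h200', scenario):.2f} vs "
            f"{per_gpu('l40s', scenario):.2f} per GPU, assuming an even split)"
        )
```

The comparison covers throughput only; it says nothing about power draw or cost, which differ considerably between the two GPUs.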

Nvidia said that this was the best performance achieved by any hardware solution during the Stable Diffusion XL test.

Speaking ahead of the results being made public, Dave Salvator, director of product marketing with the Nvidia accelerated computing group, said that inference has become a big part of Nvidia’s data center activities and business, noting that in 2023 it made up about 40 percent of the company’s data center revenue.

However, he noted that it is not just in the two new MLPerf tests that Nvidia has posted record performance outcomes.

“[Nvidia] submits on every workload because not only it is important to be able to deliver great performance on a single workload, it’s important to deliver great performance across as many workloads as you can,” he said. “[In the nine MLPerf tests] we continue to deliver leading results across all of those workloads, in all of those use cases.

“This is something that's really important because what it means is not only is our platform very, very performant, it's also very versatile, and that's something our customers really appreciate and value,” Salvator said.
