At its GPU Technology Conference (GTC) in Japan, Nvidia launched a new device for inference workloads: the Tesla T4.

Featuring 320 Turing Tensor Cores and 2,560 CUDA cores, the 75-watt card offers, the company claims, 65 teraflops of peak FP16 performance, 130 TOPS for INT8 and 260 TOPS for INT4. Along with the hardware, the company announced the Nvidia TensorRT Hyperscale Inference Platform for data centers, which uses T4 GPUs for real-time inferencing.
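The quoted figures double each time the precision halves, which also sets the card's efficiency at its 75 W board power. A quick sketch (figures taken from the announcement; the efficiency calculation is our own illustration, not an Nvidia metric):

```python
# Peak throughput Nvidia quoted for the Tesla T4, in tera-operations per second.
SPECS_TOPS = {"FP16": 65, "INT8": 130, "INT4": 260}
TDP_WATTS = 75  # the card's stated power envelope

for precision, tops in SPECS_TOPS.items():
    # Peak operations per watt at the 75 W envelope
    print(f"{precision}: {tops} TOPS -> {tops / TDP_WATTS:.2f} TOPS/W")
```

At peak, that works out to roughly 1.7 INT8 tera-operations per second per watt, the figure hyperscale buyers tend to compare.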

Train elsewhere

“Our customers are racing toward a future where every product and service will be touched and improved by AI,” Ian Buck, VP and GM of Accelerated Business at Nvidia, said. “The Nvidia TensorRT Hyperscale Platform has been built to bring this to reality — faster and more efficiently than had been previously thought possible.”

Jordi Ribas, corporate VP for Bing and AI Products at Microsoft, added: “Using Nvidia GPUs in real-time inference workloads has improved Bing’s advanced search offerings, enabling us to reduce object detection latency for images. We look forward to working with Nvidia's next-generation inference hardware and software to expand the way people benefit from AI products and services.”

Chris Kleban, product manager at Google Cloud, also said that the company was “excited to support Nvidia's Turing Tesla T4 GPUs on Google Cloud Platform soon.”

Server manufacturers including Cisco, Dell EMC, Fujitsu, HPE, IBM, Oracle and Supermicro plan to release servers with the T4 GPU on board.

Elsewhere at GTC

The Tokyo conference was also the setting for several other Nvidia announcements, many of them related to its autonomous vehicles initiatives. Among the announcements was the news that NTT Group plans to use Nvidia’s AI platform based on Tensor Core GPUs as the common platform for its company-wide “corevo” AI initiative, and that Fujifilm will use a DGX-2 system for AI research.