At its GPU Technology Conference, Nvidia announced several partnerships and launched updates to its software platforms that it claims will expand the potential inference market to 30 million hyperscale servers worldwide.

Foremost among them was a new version of its TensorRT inference software, which integrates the deep learning inference optimizer and runtime into TensorFlow, Google's open source machine learning framework.

Let it flow

Nvidia CEO Jensen Huang – Sebastian Moss

“GPU acceleration for production deep learning inference enables even the largest neural networks to be run in real time and at the lowest cost,” Ian Buck, Nvidia’s GM of data center business, said.

“With rapidly expanding support for more intelligent applications and frameworks, we can now improve the quality of deep learning and help reduce the cost for 30 million hyperscale servers.”

Rajat Monga, engineering director at Google, added: “The TensorFlow team is collaborating very closely with Nvidia to bring the best performance possible on Nvidia GPUs to the deep learning community.

“TensorFlow’s integration with Nvidia TensorRT now delivers up to 8x higher inference throughput (compared to regular GPU execution within a low-latency target) on Nvidia deep learning platforms with Volta Tensor Core technology, enabling the highest performance for GPU inference within TensorFlow.” 
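At the time of the announcement, the integration was exposed through TensorFlow's contrib namespace. The following is a minimal sketch of how a developer might invoke it, assuming a frozen graph saved as frozen_model.pb and an output node named logits (both illustrative placeholders):

```python
# Sketch of the TensorFlow-TensorRT integration, using the
# tf.contrib.tensorrt module that shipped with TensorFlow at the time.
# The graph path and output node name are illustrative placeholders.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration module

# Load a frozen TensorFlow graph (placeholder path).
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Ask TensorRT to replace supported subgraphs with optimized engines.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],                # illustrative output node name
    max_batch_size=8,                  # largest batch the engine must serve
    max_workspace_size_bytes=1 << 30,  # 1 GB scratch space for TensorRT
    precision_mode="FP16")             # use Tensor Cores on Volta GPUs

# The returned GraphDef can be imported and run like any TensorFlow graph.
```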

Nvidia claims its TensorRT 4 software can cut data center costs by up to 70 percent, based on a workload mix representative of a major cloud service provider.

Markus Noga, head of machine learning at SAP, said: “In our evaluation of TensorRT running our deep learning-based recommendation application on Nvidia Tesla V100 GPUs, we experienced a 45x increase in inference speed and throughput compared with a CPU-based platform. We believe TensorRT could dramatically improve productivity for our enterprise customers.”

The company also introduced GPU acceleration for another open source project that originated at Google: Kubernetes, the containerized workload management platform. Nvidia said it will contribute its GPU enhancements to the open source community.
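Under the device plugin mechanism that exposes Nvidia GPUs to Kubernetes, a container requests GPUs as an extended resource named nvidia.com/gpu. A minimal sketch using the official Kubernetes Python client follows; the pod name and container image are illustrative placeholders:

```python
# Sketch: scheduling a GPU workload on Kubernetes via the official Python
# client. Assumes a cluster where the Nvidia device plugin advertises the
# nvidia.com/gpu extended resource; pod and image names are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use local kubeconfig credentials

container = client.V1Container(
    name="cuda-job",
    image="nvidia/cuda:9.0-base",         # illustrative CUDA base image
    command=["nvidia-smi"],               # prints visible GPUs, then exits
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}))  # request one GPU from the plugin

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-test"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"))

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```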

Elsewhere, the popular speech recognition framework Kaldi has been optimized for GPUs, Nvidia said. The chipmaker added that it was working closely with Amazon, Facebook and Microsoft to help developers using frameworks that support the ONNX format, such as Caffe2, Chainer, CNTK, MXNet and PyTorch, easily deploy to Nvidia deep learning platforms.
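The ONNX path works by exporting a trained model to the common interchange format, which Nvidia's deep learning platforms can then ingest. A minimal sketch of the export step from PyTorch, using an off-the-shelf ResNet-50 and an illustrative input shape:

```python
# Sketch: exporting a PyTorch model to ONNX, the interchange format that
# Nvidia's deep learning platforms can then consume. The model choice and
# input shape here are illustrative placeholders.
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True)
model.eval()  # inference mode: disables dropout/batch-norm updates

# ONNX export traces the model with a representative dummy input.
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
torch.onnx.export(model, dummy_input, "resnet50.onnx")
```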