IBM has unveiled a Distributed Deep Learning software library it says has demonstrated “a leap forward in deep learning performance.”
The software, available now in beta, aims to improve how deep-learning frameworks scale across multiple servers and make use of GPUs.
It’s all about the GPU
DDL distributes deep learning tasks across 64 servers running up to 256 GPUs in total, in a way that avoids synchronization bottlenecks. In existing systems, IBM said, fast GPUs must synchronize with one another frequently, so overall training speed is limited by the rate at which data can travel between GPUs.
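IBM has not published DDL's internals in this announcement, but the bottleneck it describes arises in ordinary synchronous data-parallel training, where every GPU must exchange gradients before each update. The sketch below illustrates that general pattern using PyTorch's generic torch.distributed API as a stand-in; it is not IBM's DDL library, and the model, data, and settings are placeholders.

```python
# Minimal sketch of synchronous data-parallel training, the pattern whose
# gradient-synchronization step IBM says DDL accelerates. Uses PyTorch's
# generic torch.distributed API as a stand-in -- NOT IBM's DDL library.
import os
import torch
import torch.distributed as dist
import torch.nn as nn


def train(rank: int, world_size: int) -> None:
    # Each worker (in practice, one per GPU across the servers) joins the
    # same process group so gradients can be exchanged.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = nn.Linear(128, 10)                 # toy stand-in for a deep net
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(3):
        # Each worker trains on its own shard of the data.
        x = torch.randn(32, 128)
        y = torch.randint(0, 10, (32,))
        opt.zero_grad()
        loss_fn(model(x), y).backward()

        # Synchronization point: average gradients across all workers.
        # As GPU compute outpaces the interconnect, this step becomes the
        # bottleneck the article describes.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # Single-process demo; real runs launch one rank per GPU per server.
    train(rank=0, world_size=1)
```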
In a test, IBM used 64 of its own Power 8 servers, each packing IBM Power processors and Nvidia GPUs. It then used clustering technology to direct traffic between the multiple GPUs within each server and between the servers themselves.
The company deployed DDL to process 7.5 million images, assigning each to one or more of 22,000 categories.
DDL accurately recognized 33.8 percent of the objects after seven hours of training, beating the previous record of 29.8 percent, which Microsoft set after 10 days of training.
It also beat a record set by Facebook AI Research: IBM’s system achieved 95 percent scaling efficiency across 256 GPUs on the Caffe deep learning framework, up from Facebook’s 89 percent.
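Scaling efficiency here means how close the cluster comes to a perfectly linear speedup. The short check below uses illustrative throughput numbers (not IBM's published data) to show how a 95 percent figure would be computed.

```python
# Back-of-the-envelope check of a scaling-efficiency figure.
# scaling efficiency = observed speedup / ideal (linear) speedup
# The throughput numbers are illustrative placeholders, not IBM's data.
def scaling_efficiency(throughput_n_gpus: float,
                       throughput_1_gpu: float,
                       n_gpus: int) -> float:
    speedup = throughput_n_gpus / throughput_1_gpu
    return speedup / n_gpus


# Example: if one GPU processes 100 images/s and 256 GPUs together process
# 24,320 images/s, the cluster runs at 95% scaling efficiency.
print(f"{scaling_efficiency(24_320, 100, 256):.0%}")  # -> 95%
```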
“The idea is to change the rate of how fast you can train a deep learning model and really boost that productivity,” Hillery Hunter, IBM fellow and director of systems acceleration and memory at IBM Research, told Fortune.
The company will also build the library into its PowerAI deep learning toolkit platform, which is available in both free and paid enterprise editions, as well as on the Nimbix Minsky Power Cloud. Sumit Gupta, VP of HPC and AI at IBM, told SiliconAngle: “We’ve democratized it and brought it to everyone through PowerAI.”