US semiconductor startup Cerebras claims that it has trained the largest AI model on a single device.
The company trained AI models with 20 billion parameters on its Wafer Scale Engine 2 (WSE-2) chip, the world's largest chip.
The WSE-2 has 2.6 trillion transistors. Built on TSMC's 7nm process, it has 850,000 'AI optimized' cores, 40GB of on-chip SRAM, 20 petabytes per second of memory bandwidth, and 220 petabits per second of aggregate fabric bandwidth.
The WSE-2 chip is sold packaged in the Cerebras CS-2, a 15U system that is deployed alongside HPE's SuperDome Flex. This combined system was used to train the models.
"Using the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system," the company said in a blog post. "Running on a single CS-2, these models take minutes to set up and users can quickly move between models with just a few keystrokes."
AI companies do, however, train far larger neural networks; they simply use more than a single system to do so.
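To see why larger models usually spill across many devices, a rough back-of-the-envelope estimate helps. The arithmetic below is our own illustrative sketch (the byte-per-parameter figures are common rules of thumb for mixed-precision Adam training, not numbers from Cerebras):

```python
# Rough memory estimate for training a 20-billion-parameter model.
# Assumed rule of thumb for mixed-precision Adam training:
#   ~2 B FP16 weights + 2 B gradients + 12 B optimizer state = ~16 B/param.

def training_memory_gb(num_params: float, bytes_per_param: float = 16.0) -> float:
    """Approximate training memory footprint in gigabytes."""
    return num_params * bytes_per_param / 1e9

weights_only_gb = 20e9 * 2 / 1e9           # FP16 weights alone: 40 GB
full_training_gb = training_memory_gb(20e9)  # with gradients + optimizer: 320 GB

print(f"FP16 weights: {weights_only_gb:.0f} GB")
print(f"Full training footprint: {full_training_gb:.0f} GB")
```

By this estimate, even the weights of a 20-billion-parameter model already fill a typical 40GB GPU, and the full training state is several times larger, which is why such models are normally partitioned across many accelerators.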
Cerebras raised $250 million late last year at a $4bn valuation. Its systems are known to be used by the Argonne and Lawrence Livermore national laboratories, the Pittsburgh Supercomputing Center (PSC), AstraZeneca, GSK, Tokyo Electron Devices, and several oil and gas businesses.