US semiconductor startup Cerebras claims that it has trained the largest AI model on a single device.
The company trained AI models with 20 billion parameters on its Wafer Scale Engine 2 (WSE-2) chip, the world's largest chip.
The WSE-2 has 2.6 trillion transistors. Built on TSMC's 7nm process, it features 850,000 'AI-optimized' cores, 40GB of on-chip SRAM, 20PB/s of memory bandwidth, and 220Pb/s of aggregate fabric bandwidth.
The WSE-2 chip is sold packaged in the Cerebras CS-2, a 15U system; the training setup also included HPE's SuperDome Flex server. This combined system was used to train the models.
"Using the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system," the company said in a blog post. "Running on a single CS-2, these models take minutes to set up and users can quickly move between models with just a few keystrokes."
However, AI enterprises do train far larger neural networks - they simply use more than a single system to do so.
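To see why training a 20 billion parameter model on one device is notable, a back-of-the-envelope calculation helps. The sketch below is not from Cerebras; it assumes the common mixed-precision training recipe (FP16 weights and gradients plus FP32 Adam optimizer states), under which the training state alone far exceeds the memory of a typical single accelerator.

```python
def training_memory_gb(params_billions: float) -> float:
    """Rough memory footprint of mixed-precision training, excluding activations.

    Assumptions (illustrative, not from the article):
      - 2 bytes/param for FP16 weights
      - 2 bytes/param for FP16 gradients
      - 12 bytes/param for FP32 Adam state (master weights + two moments)
    """
    p = params_billions * 1e9
    weights = 2 * p
    gradients = 2 * p
    optimizer_state = 12 * p
    return (weights + gradients + optimizer_state) / 1e9

print(f"{training_memory_gb(20):.0f} GB")  # roughly 320 GB before activations
```

Under these assumptions, a 20 billion parameter model needs on the order of 320GB of training state, which is why such models are normally partitioned across many GPUs rather than trained on one device.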
Cerebras raised $250 million late last year at a $4 billion valuation. Supercomputing institutions Argonne National Laboratory, Lawrence Livermore National Laboratory, and the Pittsburgh Supercomputing Center (PSC), as well as AstraZeneca, GSK, Tokyo Electron Devices, and several oil and gas businesses, are known to use its systems.