Meta, the company formerly known as Facebook, has developed a huge artificial intelligence supercomputer which it says will become the fastest AI system in the world once fully built out in mid-2022.
The AI Research SuperCluster (RSC) is currently being used to train large models in natural language processing (NLP) and computer vision for research. The company said it hoped to "one day" train models with trillions of parameters and build new AI systems that can power real-time voice translations to large groups of people.
Meta said that development of the supercomputer was delayed by remote working and chip and component supply chain constraints, both caused by the Covid-19 pandemic.
Back in 2017, Meta's Facebook AI Research lab built a supercomputer with 22,000 Nvidia V100 Tensor Core GPUs in a single cluster. Performing 35,000 training jobs a day, it served as the company's main AI supercomputer.
But in 2020, Facebook decided to increase its computing power, building a new supercomputer to handle more advanced AI workloads. In its current form, RSC comprises 760 Nvidia DGX A100 systems, each of which includes eight A100 GPUs and two CPUs (Meta did not confirm the vendor, but the standard DGX A100 has two 64-core AMD Epyc CPUs).
The 6,080 GPUs are connected via an Nvidia Quantum 200 Gb/s InfiniBand two-level Clos fabric. The system has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.
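The headline GPU count follows directly from the node count. The back-of-the-envelope sketch below reproduces it and also totals the three storage tiers listed above. The CPU core figure assumes the standard DGX A100 configuration, which Meta has not confirmed, so treat it as an estimate rather than a reported number.

```python
# Tally of the current RSC configuration, using the figures reported above.
DGX_NODES = 760
GPUS_PER_NODE = 8      # A100 GPUs per DGX A100
CPUS_PER_NODE = 2
CORES_PER_CPU = 64     # assumption: standard DGX A100 CPUs; not confirmed by Meta

# Storage tiers in petabytes
STORAGE_PB = {
    "Pure Storage FlashArray": 175,
    "Penguin Computing Altus (cache)": 46,
    "Pure Storage FlashBlade": 10,
}


def tally():
    """Return (GPUs, CPUs, CPU cores, total storage in PB)."""
    gpus = DGX_NODES * GPUS_PER_NODE
    cpus = DGX_NODES * CPUS_PER_NODE
    cores = cpus * CORES_PER_CPU  # only as good as the CORES_PER_CPU assumption
    storage = sum(STORAGE_PB.values())
    return gpus, cpus, cores, storage


if __name__ == "__main__":
    gpus, cpus, cores, storage = tally()
    print(f"GPUs: {gpus:,}")                              # GPUs: 6,080
    print(f"CPUs: {cpus:,} ({cores:,} cores, assumed)")
    print(f"Storage: {storage} PB")                       # Storage: 231 PB
```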
Compared with Meta's previous system, RSC runs computer vision workflows up to 20 times faster, runs the Nvidia Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster, according to Meta's internal, unverified benchmarks.
Meta said that a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.
The company is still building out the supercomputer, ultimately expecting to connect 16,000 GPUs as endpoints. Meta has designed a caching and storage system that can serve 16 TB/s of training data, and plans to scale its capacity up to one exabyte.
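To put the storage plans in perspective, the rough sketch below divides the stated 16 TB/s evenly across the planned 16,000 GPUs and estimates how long a single pass over one exabyte would take. It assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes) and that throughput stays at 16 TB/s once capacity reaches an exabyte, which Meta did not state.

```python
# Rough scale figures for the planned RSC storage system (decimal units assumed).
TB = 10**12  # bytes
EB = 10**18  # bytes

SERVE_RATE_BPS = 16 * TB   # stated training-data throughput, bytes per second
TARGET_GPUS = 16_000       # planned GPU endpoints
TARGET_CAPACITY = 1 * EB   # planned storage capacity


def per_gpu_rate_gb_s(rate_bps: int, gpus: int) -> float:
    """Average GB/s per GPU if the total rate were shared evenly."""
    return rate_bps / gpus / 10**9


def full_scan_hours(capacity_bytes: int, rate_bps: int) -> float:
    """Hours to read the entire capacity once at a constant rate."""
    return capacity_bytes / rate_bps / 3600


if __name__ == "__main__":
    print(f"{per_gpu_rate_gb_s(SERVE_RATE_BPS, TARGET_GPUS):.1f} GB/s per GPU")         # 1.0 GB/s per GPU
    print(f"{full_scan_hours(TARGET_CAPACITY, SERVE_RATE_BPS):.1f} hours per exabyte")  # 17.4 hours per exabyte
```

Both numbers are averages under ideal conditions; real training pipelines rarely sustain peak storage throughput.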
The final system is expected to deliver five exaflops of mixed-precision compute.
Depending on the benchmark, the world's fastest AI supercomputer is currently the Department of Energy's Perlmutter system. Capable of four exaflops of AI performance, it features 6,159 Nvidia A100 GPUs and 1,536 AMD Epyc CPUs.
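Headline "AI exaflops" figures are generally aggregate peaks: GPU count multiplied by a per-GPU peak at some precision. The illustrative sketch below assumes Nvidia's published A100 Tensor Core FP16/BF16 peaks of 312 teraflops dense and 624 teraflops with structured sparsity. It also assumes the planned 16,000 RSC GPUs are A100s. Neither Meta nor the DOE is quoted here on which per-GPU figure underlies its total, so the pairing is an assumption.

```python
# Aggregate peak mixed-precision throughput = GPU count x per-GPU peak.
# Assumed per-GPU peaks: Nvidia's published A100 Tensor Core FP16/BF16 figures.
A100_DENSE_TFLOPS = 312
A100_SPARSE_TFLOPS = 624  # with 2:4 structured sparsity


def aggregate_exaflops(gpus: int, tflops_per_gpu: float) -> float:
    """Convert GPU count x per-GPU teraflops into exaflops (1 EF = 10**6 TF)."""
    return gpus * tflops_per_gpu / 10**6


if __name__ == "__main__":
    # Assumption: the planned RSC build-out uses A100 GPUs throughout.
    systems = {"RSC (planned)": 16_000, "Perlmutter": 6_159}
    for name, gpus in systems.items():
        dense = aggregate_exaflops(gpus, A100_DENSE_TFLOPS)
        sparse = aggregate_exaflops(gpus, A100_SPARSE_TFLOPS)
        print(f"{name}: {dense:.2f} EF dense, {sparse:.2f} EF sparse")
```

Under these assumptions, 16,000 GPUs at the dense figure come to just under five exaflops. Perlmutter's 6,159 GPUs reach roughly 3.8 exaflops only with the sparsity figure. Headline totals may therefore not rest on the same per-GPU basis.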
Italy's Leonardo system, which features 3,500 Intel Sapphire Rapids CPUs and 14,000 GPUs, is set to overtake Perlmutter when it launches soon.
Later this year, the US expects to launch two systems capable of more than one exaflops of performance, as measured by the LINPACK benchmark rather than the AI benchmark used by Meta.
The first, Frontier, is expected to be capable of more than 1.5 exaflops, and will feature 9,000 AMD Epyc CPUs and 36,000 AMD Instinct MI200 GPUs.
It will be followed by Aurora, an oft-delayed system that could exceed 2 exaflops. It will boast 18,000 Intel Xeon Sapphire Rapids CPUs and 54,000 Intel Xe GPUs.
However, China is believed to have secretly launched two exascale supercomputers last year.