OpenAI today released GPT-4, the next generation of its large language model, months after its ChatGPT chatbot upended the artificial intelligence industry.

But for those hoping that the company would live up to its original promise as an open AI organization, the announcement was a disappointment. "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar," the technical report warned.

While many details about the model remain shrouded in mystery, what is known is that OpenAI trains its systems on the cloud infrastructure of its investor, Microsoft.

That work originally involved building custom, specialized infrastructure in Microsoft's Azure data centers, but Microsoft now plans to roll out what it has learned to the wider AI market.

“Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible,” Greg Brockman, president and co-founder of OpenAI, said in a Microsoft blog post.

Microsoft deployed tens of thousands of co-located Nvidia GPUs, connected to one another over a high-throughput, low-latency InfiniBand network, at an unprecedented scale. The company pushed beyond the limits its GPU and networking suppliers had ever tested, moving into uncharted territory.
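
Neither Microsoft nor OpenAI has described the training code itself, but jobs of this shape are commonly written with a data-parallel framework that synchronizes gradients over the cluster's interconnect. The sketch below is illustrative only, assuming PyTorch's DistributedDataParallel with the NCCL backend (which uses InfiniBand verbs when the fabric exposes them); the model and batch are placeholders, not anything OpenAI has disclosed.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # A launcher such as torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK
    # for every process; NCCL then picks the fastest transport it can
    # find between ranks, including InfiniBand where available.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and optimizer; a real LLM would be far larger
    # and sharded, but the communication pattern is the same.
    model = DDP(torch.nn.Linear(4096, 4096).cuda(local_rank),
                device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        batch = torch.randn(32, 4096, device=local_rank)  # synthetic data
        loss = model(batch).square().mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across all GPUs here
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched on each node with, for example, `torchrun --nproc_per_node=8 train.py`, every `loss.backward()` call triggers an all-reduce of gradients across the whole cluster, which is exactly the traffic pattern that makes the backend network so critical.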

“Because these jobs span thousands of GPUs, you need to make sure you have reliable infrastructure, and then you need to have the network in the backend so you can communicate faster and be able to do that for weeks on end,” Nidhi Chappell, Microsoft head of product for Azure high-performance computing and AI, said.

“This is not something that you just buy a whole bunch of GPUs, hook them together and they’ll start working together. There is a lot of system-level optimization to get the best performance, and that comes with a lot of experience over many generations.”
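
Neither company has published what that system-level optimization looks like, but one familiar layer of it is tuning the collective-communication library itself. The snippet below is a hypothetical illustration using NCCL's documented environment variables; the values are examples, not Azure's actual settings.

```python
import os

# NCCL reads these before the first collective is issued, so they must
# be set before init_process_group() in a training script like the one above.
os.environ.setdefault("NCCL_DEBUG", "INFO")    # log which transport NCCL picks
os.environ.setdefault("NCCL_IB_DISABLE", "0")  # keep InfiniBand enabled
os.environ.setdefault("NCCL_IB_HCA", "mlx5")   # prefer the Mellanox IB adapters
```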

The experience of building the infrastructure for OpenAI has been rolled into the wider Azure AI service, the company said.

“We saw that we would need to build special-purpose clusters focusing on enabling large training workloads, and OpenAI was one of the early proof points for that,” Eric Boyd, Microsoft corporate vice president for AI Platform, explained. “We worked closely with them to learn what are the key things they were looking for as they built out their training environments and what were the key things they need.”

The company this week introduced the ND H100 v5 VM, which lets customers provision on-demand clusters ranging from eight to thousands of Nvidia H100 GPUs, interconnected by Nvidia Quantum-2 InfiniBand networking.
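
The new sizes surface through Azure's standard management APIs. Here is a small sketch, assuming the `azure-identity` and `azure-mgmt-compute` Python packages, that lists the H100-backed sizes visible to a subscription in a region; the subscription ID and region are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder
REGION = "eastus"                           # placeholder region

client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# virtual_machine_sizes.list enumerates every VM size a region offers;
# filter it down to the H100-backed family described above.
for size in client.virtual_machine_sizes.list(location=REGION):
    if "H100" in size.name:
        print(size.name, size.number_of_cores, size.memory_in_mb)
```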

“Having the early feedback lessons from the people who are pushing the envelope and are on the cutting edge of this gives us a lot of insight and head start into what is going to be needed as this infrastructure moves forward,” Boyd said.
