OpenAI’s latest release of Sora – a text-to-video diffusion model – turns users into wizards who can conjure magical visual worlds: 20-second videos at 1080p, prompted through elegant creation tools.
The release is just the latest example of generative AI models upping the ante on AI data center requirements, in both performance and resource demands.
The US forecast for AI energy consumption is alarming. Today’s AI queries require roughly ten times the electricity of traditional Google queries – a ChatGPT request consumes around ten times the watt-hours of a Google search. A typical data center CPU draws approximately 300 watts (Electric Power Research Institute), while an Nvidia H100 GPU draws up to 700 watts – run around the clock, that is on the order of what an average US household consumes in a month.
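To put those figures on a common footing, here is a rough back-of-the-envelope sketch in Python. The 300 W and 700 W draws come from the paragraph above; the roughly 900 kWh/month household figure is an assumption based on published US averages, used only for a sense of scale.

```python
# Rough scale check of the power figures above. The CPU and GPU draws come
# from the text; the ~900 kWh/month household figure is an assumption based
# on published US averages and is used only for a sense of scale.
CPU_WATTS = 300                 # typical data center CPU draw (per EPRI, as cited)
GPU_WATTS = 700                 # Nvidia H100 peak draw
HOURS_PER_MONTH = 24 * 30
HOUSEHOLD_KWH_PER_MONTH = 900   # assumed average US household consumption

gpu_kwh = GPU_WATTS * HOURS_PER_MONTH / 1000   # ~504 kWh per month, running 24/7
cpu_kwh = CPU_WATTS * HOURS_PER_MONTH / 1000   # ~216 kWh per month

print(f"H100 24/7: {gpu_kwh:.0f} kWh/month "
      f"(~{gpu_kwh / HOUSEHOLD_KWH_PER_MONTH:.0%} of a typical household)")
print(f"CPU 24/7:  {cpu_kwh:.0f} kWh/month")
```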
Advancements in AI model capabilities – and ever-larger parameter counts – continue to drive energy consumption higher. Much of this demand is concentrated in data centers as companies like Amazon, Microsoft, Google, and Meta build ever more massive hyperscale facilities across the country.
US data center electricity consumption is projected to grow 125 percent by 2030, reaching nine percent of all national electricity. As recently as 2018, computing consumed only one to two percent of the global electricity supply; this rose to four to six percent by 2020 and is projected to reach between eight and 21 percent by 2030. Data center power and carbon emissions doubled between 2017 and 2020, and by 2028, AI is estimated to represent about 19 percent of data center power demand.
Even before ChatGPT exploded onto the scene, data centers were facing significant growth in the performance demands of hardware for data processing, management, and storage. Today, with AI established as the world’s next major technology shift – one that will touch all of our lives – rising usage makes the urgency greater, as the power consumption of the underlying infrastructure increases at aggressive, unsustainable rates.
Given the dizzying proliferation of AI applications, how does the deep tech venture ecosystem aim to address the escalating challenges?
The reality is that enabling AI at scale – and on a sustainable basis – will require most of the data center’s hardware and software to be redesigned or replaced, which means rethinking next-generation AI data centers from the ground up.
One of the most significant drivers is the evolution of AI compute workloads. For those unfamiliar, computing for AI has three primary elements: pre-processing, training, and inference.
Data pre-processing involves organizing a large dataset before you can do anything with it, which may involve labeling, cleaning up, or structuring the data. Once the data is processed, training the AI can begin, akin to teaching it how to interpret the data. Inference becomes the primary task once the model is trained, during which the AI model runs in response to user queries.
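As a toy illustration of those three stages – not any particular production stack – here is a minimal sketch using scikit-learn; real AI pipelines are vastly larger, but the shape is the same.

```python
# A minimal sketch of the three stages described above, using scikit-learn
# purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# 1. Pre-processing: clean and structure the raw data (here: drop rows with
#    missing values and normalize features; the last column is the label).
raw = np.array([[1.0, 200.0, 0], [2.0, 180.0, 1], [np.nan, 190.0, 0], [1.5, 210.0, 1]])
clean = raw[~np.isnan(raw).any(axis=1)]
X, y = clean[:, :2], clean[:, 2]
scaler = StandardScaler()
X = scaler.fit_transform(X)

# 2. Training: teach the model how to interpret the data. This is the
#    compute-heavy, batch-oriented phase that maps well to dedicated servers.
model = LogisticRegression().fit(X, y)

# 3. Inference: the trained model answers live queries, where latency and
#    power per request dominate the cost.
query = scaler.transform([[1.2, 195.0]])
print(model.predict(query))
```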
In 2023, training made up the majority of AI compute, outweighing inference by more than 2:1. But as AI models mature and move into wide deployment, inference will rapidly overtake training as the majority of data center AI compute, and is estimated to outpace training by 6:1 by 2030.
Many of today’s data centers were built and optimized for traditional workloads – cloud, hybrid cloud, and SaaS database workloads served by dedicated servers. AI training maps reasonably well onto such deployments, which typically involve allocating dedicated servers. But as we shift to the AI workloads of tomorrow, we must rethink the architecture for cloud inference so it can deliver much lower power and latency, including greater edge deployment that meets user requirements.
This will require a complete rethink of data center architecture to support efficient cloud-based inference, necessitating a major evolution in areas like the core ASIC chips, high bandwidth memory, interconnect, and the AI software ecosystem. We will see rapid advancements in new chip architectures, advanced networking, materials, packaging, and more.
With all the innovation needed, where will it come from, and are the large infrastructure players going to be disrupted?
While big tech companies certainly have the benefit of incumbency and a funding advantage, the startup ecosystem will play an absolutely crucial role in driving the innovation necessary to enable the future of AI. Large public tech companies often have difficulty innovating at the same speed as smaller, more nimble startups. Answering to shareholders, they often seek to avoid disrupting their own franchise and rely on internal innovation, rather than development based on first principles and a willingness to optimize at the foundational layer. When they do lean on partnerships with, or acquisitions of, early-stage companies to bring innovation in-house, it is usually because teams have managed to get around the not-invented-here (NIH) bias.
In the startup world, however, we are seeing the necessary innovation – starting at the silicon level – being funded and built. Within data center interconnect, for example, Eliyan has created an optimized architecture that densifies networking, making bit transmission between compute and memory nodes more streamlined, less power-hungry, and less expensive. Eliyan’s technology is delivered both as standalone chiplets and as IP blocks designed into partner silicon, fundamentally rearchitecting the network at the systems level. By breaking through the I/O memory wall bottleneck, Eliyan’s approach delivers an overall 10x improvement in AI performance.
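To see why that memory wall matters, here is a generic back-of-the-envelope sketch with assumed, round numbers for a hypothetical accelerator – not a model of Eliyan’s architecture – showing how quickly memory bandwidth, rather than raw compute, becomes the ceiling.

```python
# Generic illustration of the I/O "memory wall" -- assumed round numbers,
# not a model of any specific product.
PEAK_FLOPS = 1e15        # 1 PFLOP/s of compute (assumed)
MEM_BANDWIDTH = 3e12     # 3 TB/s of memory bandwidth (assumed)

# Arithmetic intensity (FLOPs per byte moved) a workload needs before the
# chip is compute-bound rather than bandwidth-bound:
ridge_point = PEAK_FLOPS / MEM_BANDWIDTH   # ~333 FLOPs/byte

# Large-model inference often sits far below that, e.g. memory-bound
# matrix-vector work at only a few FLOPs per byte:
workload_intensity = 2.0
achievable = min(PEAK_FLOPS, workload_intensity * MEM_BANDWIDTH)

print(f"ridge point: {ridge_point:.0f} FLOPs/byte")
print(f"achievable:  {achievable / PEAK_FLOPS:.1%} of peak compute")
```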
On the compute side, Recogni has developed a specialized generative AI system purpose-built for data center inference. The company’s instruction set architecture uses a logarithmic math approach that turns matrix multiplication calculations into additions, cutting computational complexity and making its chips smaller, faster, and more efficient. This approach is anticipated to be disruptive compared with inference on Nvidia GPU platforms.
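As a toy illustration of the general log-domain principle – not Recogni’s actual instruction set – the identity log(a × b) = log(a) + log(b) is what allows multiplications to be replaced by cheaper additions once values are stored as logarithms.

```python
# Toy illustration of the general log-domain idea: because
# log(a * b) = log(a) + log(b), multiplications become additions once
# values are kept as logarithms (positive values assumed for simplicity).
import numpy as np

a = np.array([1.5, 2.0, 4.0])
b = np.array([3.0, 0.5, 2.0])

# Conventional path: elementwise multiply.
conventional = a * b

# Log-domain path: add the logarithms, then exponentiate to read the result.
log_domain = np.exp(np.log(a) + np.log(b))

print(conventional)   # [4.5 1.  8. ]
print(log_domain)     # same values, computed with additions in the log domain
```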
These are just two examples of the growing number of emerging companies in this space, as funding continues to flood in from investors – reaching over $55bn through Q3 2024. Of course, success is never guaranteed in the startup world, and certainly not in a sector undergoing such a flurry of competition and expansion.
What we know is that we can’t afford not to invest in the kind of innovation necessary to create a sustainable future for AI. The future of industry and our Earth depend on it.