Data centers have changed. When we say that, we don’t just mean a little bit – we mean a lot. If you’ve worked in the industry for any length of time, you’ll know that facilities have moved from primarily ‘north-south’ traffic, where data flows into or out of the facility on its way to its next destination, to ‘east-west’ traffic, where data moves between servers inside the facility to be parsed, sorted, and run through complex calculations that get it to where it needs to go, in the form it’s needed.
Just as we’ve all got our heads around that change, along comes AI to reshape the landscape again. AI places significant new demands on data center compute, speed, and latency. To meet those demands, having the right transceivers and interconnects is essential.
Dave Hessong is manager of global data center market development for Corning, a company with a long track record of providing cutting-edge fiber optics for data centers. For him, the secret sauce in making your data center ready for modern workloads lies in preparing for higher-speed connections like 800G – the term used to refer to a throughput capacity of 800Gbps. This sounds complicated and expensive, but as he tells us, it could be as simple as reconfiguring and repurposing some of your existing cabling. The resulting boost in speed and reduction in latency could transform your data center into an AI-ready powerhouse.
“Many modern GPU servers have 800Gbps interfaces, though with the caveat that they use them as dual-port 400Gbps. They're not using them as 800Gbps transceivers. The reason is that if you go to a spine-and-leaf network, every GPU in a GPU server has to connect to a leaf switch. The switch is a 32-port switch but acts like a 64-port switch. Half the ports connect to the GPUs and the other half connect to the spine switch.”
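To make the port arithmetic in that quote concrete, here is a minimal Python sketch. It assumes, as Hessong describes, a 32-port leaf switch whose 800G ports are each broken out into two 400G logical ports, with half of the logical ports facing the GPUs and half facing the spine. The `LeafSwitch` class and its method names are hypothetical, chosen for illustration only; they do not model any particular vendor's switch.

```python
from dataclasses import dataclass


@dataclass
class LeafSwitch:
    """Hypothetical model of a leaf switch whose physical ports are broken out."""
    physical_ports: int   # e.g. 32 physical 800G ports
    port_speed_gbps: int  # speed of each physical port
    breakout: int         # logical ports per physical port (2 means 2 x 400G)

    @property
    def logical_ports(self) -> int:
        return self.physical_ports * self.breakout

    @property
    def logical_speed_gbps(self) -> int:
        return self.port_speed_gbps // self.breakout

    def split(self) -> tuple[int, int]:
        """Divide logical ports evenly: half down to GPUs, half up to the spine."""
        down = self.logical_ports // 2
        return down, self.logical_ports - down

    def oversubscription(self) -> float:
        """Ratio of GPU-facing to spine-facing bandwidth (1.0 means non-blocking)."""
        down, up = self.split()
        return (down * self.logical_speed_gbps) / (up * self.logical_speed_gbps)


leaf = LeafSwitch(physical_ports=32, port_speed_gbps=800, breakout=2)
print(leaf.logical_ports, leaf.logical_speed_gbps)  # 64 400
print(leaf.split())                                 # (32, 32)
print(leaf.oversubscription())                      # 1.0
```

A ratio of 1.0 means the leaf can carry all of its GPU-facing traffic up to the spine at full rate; changing `breakout` or the split shows how the numbers shift.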
What's the hurry?
But why does latency matter so much? Hessong explains: “The lower the latency, the higher the performance of the network. If you ever want true self-driving cars, where a system knows where all the cars are, then latency could be the difference between something bad happening and not. More generally, in a competitive environment, the market will choose which product it likes best, and which is most effective. If your AI model has higher latency, your model won't perform as well.”
This is where planning your cable infrastructure can pay big dividends, because a modern topology can be both faster and more predictable. “Traditional three-tier architecture had gone away before this whole AI buzz, with things like virtualization. The problem with three-tier architecture is sometimes it goes through two layers of switching, and sometimes it goes through three layers of switching. You have more latency, but it's also unpredictable latency. When you go to spine and leaf, it's extremely redundant, because every leaf switch connects to every spine switch. But it also has a very predictable path.”
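The difference in path predictability can be illustrated with a toy model. The Python sketch below builds two small, hypothetical topologies (four leaves fully meshed to two spines, and four access switches under two aggregation switches and one core switch, with a single aggregation switch per block for simplicity) and uses breadth-first search to list every distinct shortest-path length between edge switches. It counts links, not latency, and ignores real-world details such as redundant aggregation pairs and link speeds.

```python
from collections import deque
from itertools import combinations


def connect(graph: dict[str, set[str]], a: str, b: str) -> None:
    """Add an undirected link between switches a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def bfs_hops(graph: dict[str, set[str]], src: str, dst: str) -> int:
    """Number of links on a shortest path from src to dst (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} is unreachable from {src}")


def path_lengths(graph: dict[str, set[str]], edge_switches: list[str]) -> list[int]:
    """Sorted distinct shortest-path lengths over all pairs of edge switches."""
    return sorted({bfs_hops(graph, a, b) for a, b in combinations(edge_switches, 2)})


# Spine-and-leaf: every leaf connects to every spine.
spine_leaf: dict[str, set[str]] = {}
leaves = [f"leaf{i}" for i in range(4)]
for leaf in leaves:
    for spine in ("spine0", "spine1"):
        connect(spine_leaf, leaf, spine)

# Three-tier: each pair of access switches shares one aggregation switch,
# and both aggregation switches connect to a single core switch.
three_tier: dict[str, set[str]] = {}
access = [f"acc{i}" for i in range(4)]
for i, acc in enumerate(access):
    connect(three_tier, acc, f"agg{i // 2}")
for agg in ("agg0", "agg1"):
    connect(three_tier, agg, "core")

print("spine-and-leaf:", path_lengths(spine_leaf, leaves))  # [2]
print("three-tier:", path_lengths(three_tier, access))      # [2, 4]
```

Every spine-and-leaf path crosses exactly two links (leaf, spine, leaf), while the three-tier model mixes two-link paths within a block with four-link paths through the core, mirroring the two-or-three-layer variability Hessong describes.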
Get connected
The interconnects are at the heart of the performance of any data center. Having the right fiber optic cables, with the right connectors, in the right cabling architecture can turn an average data center into a behemoth. But more computing power means more cabling which, in turn, means taking a planned, intelligent, structured approach.
“The way AI is deployed now means you have a second back-end network for these big GPU clusters. It has created a lot of fiber density, and in turn, new products are created to address the additional density. That requires a lot of extra cabling,” says Hessong. “A lot of our new products resulted from trying initial deployments. We've learned along with the customers and come up with new products in that space.”
Corning recommends that data centers look toward an 800G spine-and-leaf topology, which has been adopted widely in the hyperscale market but is still in its infancy in the wider industry. This configuration, which Hessong says “takes advantage of the fastest components on the market, giving the highest-performing network,” allows facilities to meet today's demands while reducing latency for those east-west connections.
Hessong points to why there’s a need for change: “East-west is where the computational stuff is happening within the data center, running calculations, building the page, running algorithms to give you targeted advertising. All of a sudden, east-west traffic significantly dwarfs north-south traffic. Now that's even more exacerbated by some of these AI models. Take ChatGPT – there is a ton of computational stuff that happens within the data center.”
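One rough way to see the north-south/east-west distinction in practice is to classify flows by their endpoints: traffic with both ends inside the facility is east-west, while traffic crossing the facility boundary is north-south. The minimal Python sketch below does this with the standard-library `ipaddress` module; the single `10.0.0.0/8` internal prefix is a hypothetical stand-in for whatever address space a real facility uses.

```python
import ipaddress

# Hypothetical internal address space for the facility; a real data center
# would list its own (often several) internal prefixes here.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(addr: str) -> bool:
    """True if addr falls inside one of the facility's internal prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def classify_flow(src: str, dst: str) -> str:
    """Label a flow east-west if both ends are inside the facility, else north-south."""
    if is_internal(src) and is_internal(dst):
        return "east-west"
    return "north-south"


if __name__ == "__main__":
    print(classify_flow("10.1.2.3", "10.4.5.6"))     # east-west: server to server
    print(classify_flow("203.0.113.7", "10.1.2.3"))  # north-south: request entering the facility
```

Counting flows (or bytes) per label over a traffic sample is one simple way to check how far east-west traffic dwarfs north-south in a given facility.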
Action time is now
You may be reading this and thinking it is a problem for tomorrow. But Corning warns that AI has caused demand for higher speeds and lower latency to ramp up at an almost exponential rate: “AI is a huge area of growth, but bandwidth everywhere is growing. We've put more resources and a whole lot of capital behind capacity expansions, and product development to support these new applications.”
Hessong continues, “Current studies estimate that 40 percent of all the transceivers in the market, on a dollar basis, will be going into AI applications in the next few years. We're starting this exponential ramp of what it's going to be like; there's so much AI capacity being deployed now.”
While the switch from duplex or parallel 400G to 800G is, structurally speaking, pretty easy, many data centers may not yet be running even at 400G. It’s these facilities that need to think big for the next upgrade: “If you're at 100Gbps today, you want to make sure your network can handle 800Gbps and beyond.
"Then, when you go to these high densities, structure becomes even more important. Because there's so much cabling and fiber, it has to be organized and labeled. If you want to reconfigure your network, it has to be built in such a way as to allow for reconfiguration. And that's the crux of this whole structured cabling approach.”
So, you reconfigure your fiber network into a structured spine-and-leaf deployment and you get a huge performance boost. It’s a no-brainer. Well… not quite. As ever, increased workloads come with a familiar caveat – the need for more power. Just because you can physically reconfigure your data center for the AI era doesn’t mean that your current power supply will be sufficient:
“There's a massive power problem. That's where a lot of the challenges are coming up. Typically, a rack of 20 kilowatts was considered a high-density rack. GPU server racks are over 40 kilowatts. There is a power problem that has to be solved along with the cabling challenges. An AI data center could be five to six times the power compared to what you were traditionally trying to do. Your AI clusters may even need to sit somewhere else, not in your existing data center because it would require you to re-architect the power infrastructure,” Hessong tells us.
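Hessong's rack figures translate into simple arithmetic about a fixed power budget. The sketch below assumes a hypothetical 1 MW of IT power and uses his 20 kW and 40 kW rack densities; since he says GPU racks are over 40 kW, real numbers could be worse, and the sketch ignores cooling and power-distribution overhead. The function names are illustrative only.

```python
def racks_supported(power_budget_kw: float, rack_power_kw: float) -> int:
    """Number of whole racks of a given density that fit in a fixed power budget."""
    return int(power_budget_kw // rack_power_kw)


def power_needed_kw(racks: int, rack_power_kw: float) -> float:
    """IT power needed to run a given number of racks at a given density."""
    return racks * rack_power_kw


BUDGET_KW = 1_000  # hypothetical 1 MW IT power budget; substitute your own figure

print(racks_supported(BUDGET_KW, 20))  # 50 racks at the old "high-density" figure
print(racks_supported(BUDGET_KW, 40))  # 25 GPU racks at 40 kW
print(power_needed_kw(50, 40))         # 2000 kW to keep 50 racks at GPU density
```

Even at the conservative 40 kW figure, the same room holds half as many racks, or needs double the power to hold the same number, which is why the power infrastructure can become the constraint before the cabling does.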
Keep your friends close
That’s why choosing the right partnership for your AI upgrade is important, and Corning is keen to show that means them. “We've been doing it for multiple years, we've come up with some cool innovative products to make the deployment easier,” enthuses Hessong. “We have an engineering services team which can design the fiber infrastructure for them. Now these installations have become more complex, we'll help you design them. But we'll also help train contractors to ensure the installation goes smoothly.”
But what if you go to 800G and realize you are approaching capacity? The beauty of a spine-and-leaf arrangement is that you can simply add another spine switch to scale out capacity, and port breakout extends this further.
“The size of your network is limited by how many ports are on your spine switch, or conversely, how many ports are on the leaf switch; but once you use port breakout, you can either double or quadruple the available ports or even higher.”
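The scaling limits in that quote can be approximated with a few lines of arithmetic. The Python sketch below assumes a two-tier, non-blocking fabric in which every leaf has exactly one link to every spine, each leaf splits its logical ports evenly between hosts and spines, and the same breakout factor is applied on both tiers. The function name `max_fabric` and the 32-port example are illustrative assumptions, not a Corning sizing tool.

```python
def max_fabric(spine_ports: int, leaf_ports: int, breakout: int = 1) -> dict[str, int]:
    """Upper bounds for a two-tier, non-blocking spine-and-leaf fabric.

    Assumes one link from every leaf to every spine, an even split of each
    leaf's logical ports between hosts (down) and spines (up), and the same
    breakout factor on spine and leaf switches.
    """
    spine_logical = spine_ports * breakout
    leaf_logical = leaf_ports * breakout
    max_leaves = spine_logical      # each leaf consumes one port on every spine
    max_spines = leaf_logical // 2  # each spine consumes one leaf uplink
    host_ports = max_leaves * (leaf_logical // 2)
    return {"leaves": max_leaves, "spines": max_spines, "host_ports": host_ports}


for b in (1, 2, 4):
    print(b, max_fabric(spine_ports=32, leaf_ports=32, breakout=b))
# 1 {'leaves': 32, 'spines': 16, 'host_ports': 512}
# 2 {'leaves': 64, 'spines': 32, 'host_ports': 2048}
# 4 {'leaves': 128, 'spines': 64, 'host_ports': 8192}
```

Under these assumptions, each extra spine adds one uplink per leaf until the leaves run out of uplink ports, and breakout multiplies the ports on both tiers, so host-facing ports grow faster than either switch's port count alone. The trade-off is that each logical port runs at a fraction of the physical port speed, for example 400G or 200G from an 800G port.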
Given the rate of change and growth in AI, industry stalwarts may be reading this and wondering whether 800G is just the start, and whether there’ll soon be articles about the next big jump. Corning doesn’t deny this, but Hessong points out that such steady upgrades are par for the course in the industry. Even so, by moving to 800G now, you can future-proof your network for the next three to seven years, the typical lifecycle of a network topology depending on your technology refresh cycle.
“Bandwidth continues to grow. It means more fiber in more places. Honestly, AI is in its infancy. We can't yet know the full effect or the precise rate at which AI will change business or affect society at large. But we are completely convinced the consequences are extraordinary.”
“The impact may be as transformational as major technological innovations, like the printing press, the steam engine, electricity, computing, and the internet. It's made humongous progress. It is going to change many industries such as banking, manufacturing, and healthcare. As for what’s next, the bandwidth will continue to grow, and AI will continue to evolve.”
The day after tomorrow
Hessong adds, “Sometime next year, that technology will be 1.6Tbps, and then you'll see AI clusters being built with 1.6Tbps presented as two ports of 800Gbps.”
That said, it’s worth remembering that only hyperscalers use 800G widely today, so 1.6Tbps will show up at those companies first, with wider adoption following several years later.
As we mull over Hessong’s predictions of AI transformation, we find ourselves wondering, “Who is going to regulate all this?” With a wry smile, he tells us, “You could write a whole long article on how it should be regulated, and who should regulate it. But that's above our pay grade.”
Interested in making the jump to 800G or beyond? Corning provides free design support and Bill of Materials build-out. Contact Corning today to learn more!