One of the fascinating tidbits tucked away in the research paper announcing Google's Gemini large language model was that it was trained not just over multiple compute clusters, but over multiple data centers.
"TPUv4 accelerators are deployed in 'SuperPods' of 4,096 chips, each connected to a dedicated optical switch, which can dynamically reconfigure 4x4x4 chip cubes into arbitrary 3D torus topologies in around 10 seconds,” the paper states.
That optical switch is Google’s Mission Apollo, which DCD exclusively profiled in May. Noam Mizrahi, CTO at chip company Marvell, sees Apollo as the first part of a much larger story: the move to a fully optical data center.
“As models get ever larger, the pressure moves to the interconnect of all of that - because if you think about that, each GPU or TPU today has terabits per second of bandwidth to talk to its peer GPUs/TPUs in a cluster, and your network is built for hundreds of gigabits,” Mizrahi said.
“Those are the connectivity points that you have, which means that - as long as you stay within your organic box like a DGX - you can communicate with those rates of terabits. But once you need to create clusters of 1,000s or so, you need to go through a much narrower port that becomes the bottleneck of the whole thing.”
The second challenge is how to create a network that efficiently “brings together tens of thousands of nodes that would appear as a single one, across an entire data center or data centers,” Mizrahi continued. “And I think the answer to all these things is just having significantly more optical types of connectivity to create the networks.”
In traditional network topologies, signals jump back and forth between the electrical and optical domains. Moves like Google’s reduce the number of those hops, but only at the facility level. “The problem starts even lower, within a few racks. If you connect them together, it already could take you into the optical domain,” he said.
He hopes that systems will embrace optical as soon as possible: “Don't go back and forth between the digital and then the optical domain, just translate to optics and then run everything over optics and then only on the other side move back,” he said.
“So a GPU could have an optical port - that can be either an optical chiplet within the GPU, or pluggable - and it is connected into a network with an optical port. And then you have memory clusters, also with optics, and you have storage clusters, also with optics, and the network is all optics,” Mizrahi said.
This would allow “memory to scale at its own pace, because now it’s also a bottleneck to compute - the limit to how much you can connect (see next page). The storage will have to scale by itself, and the network, and then compute - everything independently.”
It’s a promising vision, and one with many proponents. But it is also a vision that has existed for some time without yet leading to an all-optical revolution. The technology is still being developed, and what’s out there is expensive, even by data center standards.
“It’s a gradual thing, it will not happen in one day,” Mizrahi admitted. “No data center will actually be completely redesigned right now in order to do that. They’ll put some platform in, and then replace one portion of it. It will take time to evolve.”
This also means it will take some time for the true benefit to be felt - as long as there are intermediate hops between the optical and electrical domains, inefficiencies will remain.
“But at some point, we’ll have to do something else than the current approach, because you’ll hit a wall,” Mizrahi said. “And with generative AI you hit the walls very fast. It’s very different to anything that we’ve seen so far.”