"The story of Arm in the data center isn’t new, that’s not an important detail. The real story is that we've hit a tipping point where the cloud requires something new because of the way it runs, and it needs to be efficient. Nobody who’s doing x86 is building that. It just so happens that we’re Arm-based, but we are building it.”
Over a wide-ranging discussion in London, Ampere Computing's chief product officer Jeff Wittich gave DCD a detailed run-through of the company’s technology roadmap, and why he thinks the five-year-old business can dominate the cloud CPU market.
Arm’s multiple attempts to break into the data center may not be new, but quickly revisiting the tale provides valuable context.
"You had the early failures that just weren't ready," Wittich recalls. "The Calxeda chips, the Applied Micro stuff - but the model was wrong. People thought that you could take small cell phone chips and, if you scaled them out enough, you would have enough performance. The problem is that there is a minimum bar for performance per core, it's not zero."
But they laid the groundwork, and got the ball rolling on a software ecosystem.
Next came the middle phase, where "a bunch of people had the wrong approach and/or got out too early, when it got hard, because it was a side bet as part of a differentiated business."
One such example is Qualcomm's Arm server processor line. Its Centriq 2400 chip was well received, and may have proved successful in the long run. But then the company got into a legal fight with its largest customer, Apple, and spent most of 2018 fighting off a hostile takeover attempt by Broadcom.
Bruised and in need of reassuring investors with cost cutting, it laid off hundreds from its data center division and killed the project. Ampere and Microsoft hired many of those let go.
But now, after years of failures, the stars finally seem aligned. Amazon Web Services has released its Graviton series of chips, while the fastest supercomputer of 2021 ran on Fujitsu's Arm-based A64FX processor.
"The software ecosystem is there, so stuff is either at parity or in some cases even a little bit ahead of where the x86 stuff is," says Wittich. "The time is now."
Of course, once you have come to the conclusion that Arm is inevitable due to its energy efficiency and flexibility over x86, the next question has to be which form of Arm.
With AWS several generations into its own processor family, Microsoft known to be working on its own Arm chips, and Google likely cooking up something, what space is there for Ampere?
"They're big companies,” Wittich admits. “But our customers are a blend of all those companies, so we can get to massive scale. It’s also useful from a network effect perspective. As an end user, is it possible to be running on Graviton and the Ampere Altra processor across two clouds? Sure.
"But it's a level of complexity that doesn't need to exist when we can come in and run the same processor at Microsoft, at Google, or on an HPE box.”
Still, if a hyperscaler-made processor were significantly better, the lack of a network effect would not dissuade users. For Ampere to succeed, it needs to show a clear advantage, Wittich admits.
That's where its big bet on cores comes in.
UK-based Arm licenses out its eponymous instruction set architecture to chip designers. It also licenses out processor core technologies, with the latest being Neoverse.
That's what AWS relies on for Graviton. And it's what Ampere used to rely on.
"On Altra & Altra Max, we decided that we couldn’t sit around for five years building a core,” he says. “We said ‘let's get something out because otherwise we won't have customers, we won’t have feedback, and we won’t have an ecosystem.
“Going forward, with what we're doing with our cores, it looks a lot different. And it starts to really deliver huge density and power efficiency, while delivering the type of performance that the cloud wants.
"I just don't know that we'll see that from Arm [cores].”
Ampere's view is that Arm's own cores will never put the data center first. "Arm develops the cores for the client product first, and then they adapt them to infrastructure cores a few months later. But at their heart, they were still developed for a different market."
That means a lot of other approaches to Arm server CPUs are flawed, Wittich argues. "It's one of the fundamental problems with a bunch of the CPU models today, they have features that make complete sense for a client processor but make no sense in the cloud."
Ampere's cores, Wittich argues, are targeted directly at the cloud. Even traditional high-performance computing is too tangential a market: "I'm not focused on that, you would make a different type of core."
There is one other market that Ampere is targeting beyond cloud and on-prem cloud, though: the high-end Edge.
"The self-driving car company Cruise uses us in their vehicles,” Wittich reveals. “This isn't us getting into the automotive space, more that they needed a really high performance Edge server that would sit in the car. They couldn't actually find any other CPUs that within 100 watts that gives a reasonable performance, and our 64 core chip consumes 70 watts."
But the company hopes that cracking the automotive space will tie back to the cloud. "There's a bunch of other smaller Arm devices sitting everywhere in the vehicle, but a lot of the developers are just doing that stuff on x86 machines in the cloud, and then moving it over to the car. You're constantly porting back and forth. And it's a waste.”
Building its own cores also reduces its licensing fees to Arm, and insulates it from the chip designer's ups and downs. In late 2020, Nvidia announced it would acquire Arm for $40 billion but, after two years of distracting regulatory investigations, the deal collapsed.
Arm soon appointed a new CEO, who announced layoffs and plans for an IPO. "I didn't have to worry that much during all this stuff over the last two years," says Wittich. "We have an architectural license, and we can build what we want to build."
But Ampere is not alone in that, with others developing their own cores.
On the consumer side, Apple took the approach for its M1 Arm chips, first launched in 2020. On the data center side, things get a little more complex.
In 2018, three Apple veterans launched Nuvia to build their own Arm server CPU with their own cores. Just a year later, Apple sued one of the founders, claiming that he had worked on Nuvia while still employed by Apple.
In 2021, it seemed that Nuvia had put the controversy behind it, selling to Qualcomm for a respectable $1.4 billion. Curiously, it also seemed to have put the server chip behind it, with Qualcomm announcing that it would use the tech in mobile, IoT, and networking products, integrating it into Snapdragon.
A year later, it pivoted back, shopping around a server product. Over the summer, Bloomberg reported that Amazon agreed to look at the chip.
Yet more confusion soon followed. In September, Arm sued Qualcomm - one of its largest partners - claiming that it had never agreed to Qualcomm's use of Nuvia's licenses, and that it had terminated those licenses in February.
Should it win the case, it could unwind a major acquisition for Qualcomm, and wreck its desktop and server chip plans. Even if the case is ultimately settled, it will delay and distract Nuvia - and it's hard to have faith in Qualcomm's management to maintain focus.
Wittich is diplomatic in his views on Nuvia. "We have our own cores, and that gives us a five-year lead over anyone who decides it might be time to start designing their own cores. Now, that's a big differentiator."
Another factor Ampere hopes will give it a lead over Arm and non-Arm processors is its own chip-to-chip interconnect, which will let it move to a chiplet approach, where multiple smaller dies are used instead of one monolithic die.
“Our first two products went monolithic, because it’s critical that our performance is really, really consistent,” he says. “We don't want any variability across the chip, where if you got placed in one core versus another core, the performance looks a lot different.
"We wanted to avoid any bottlenecks. A lot of the chiplet approaches to date have big bottlenecks, because there are too many hops and the latency is still too large from chip to chip.
“Our chiplet interconnect is done in such a way that we remove a lot of the common bottlenecks that occur in a chiplet-based approach,” he claims.
The company was able to get 128 cores onto a single die; it plans several hundred as it moves to chiplets. “We want to make sure that as you bring more and more cores online, the performance per core doesn't go down. And that's not really the case with a lot of legacy x86 CPUs.”
That’s why this is not a story about Arm, he argues. “This isn't the old days of the Arm chips that come in and just undercut everybody on price; we're not the lowest price. But what we are is the highest performance processor, and we're the most power efficient processor.”
With x86, “the problem with it is that you're getting into a space where that additional performance gets really power inefficient,” Wittich says. “So you're adding 20 percent more power to get 10 percent more performance.”
At the rack and data center level, that doesn’t make sense, he says. “Each chip looks like it's delivering more performance, but overall you've just reduced your overall capacity. For no reason.”
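To make that arithmetic concrete, here is a rough, hypothetical back-of-envelope sketch. The rack power budget and baseline chip figures below are assumptions for illustration; only the 20 percent more power for 10 percent more performance ratio comes from Wittich's quote. At a fixed rack power budget, the hotter chip means fewer chips fit per rack, so total throughput can fall even though each individual chip benchmarks higher.

```python
# Back-of-envelope sketch of the rack-level argument above.
# The rack budget and baseline chip numbers are illustrative assumptions;
# only the +20% power / +10% performance ratio comes from the quote.

RACK_POWER_BUDGET_W = 12_000   # assumed fixed power budget per rack

base_chip_power_w = 250        # assumed baseline chip power draw
base_chip_perf = 100           # arbitrary performance units per chip

boosted_chip_power_w = base_chip_power_w * 1.20   # +20% power
boosted_chip_perf = base_chip_perf * 1.10         # +10% performance

def rack_throughput(chip_power_w: float, chip_perf: float) -> float:
    """Total performance of a rack filled up to its power budget."""
    chips_per_rack = RACK_POWER_BUDGET_W // chip_power_w
    return chips_per_rack * chip_perf

print(rack_throughput(base_chip_power_w, base_chip_perf))        # 48 chips -> 4800 units
print(rack_throughput(boosted_chip_power_w, boosted_chip_perf))  # 40 chips -> 4400 units
```

Under those assumed numbers, each boosted chip scores 10 percent higher, but the rack as a whole delivers roughly eight percent less performance - the "reduced your overall capacity" point Wittich is making.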
Still, while he eyes those x86 workloads as land to conquer, Wittich notes that he’s aware of where the Ampere chips’ limits are.
“Trying to make everything one size fits all is a disaster. We do awesome at inferencing on a CPU, but if you’ve got batch inference jobs that you're gonna plow through over the next 12 hours, maybe move that stuff off to an inference accelerator.”
“I don't think we're going back to a market where you've got one CPU that's deployed at 99 percent of the servers out there, we're not going back,” says Wittich, who was lured to the company after 15 years at Intel watching that market share fall. “The world's changed now.”