Glenn K. Lockwood has left the Department of Energy's National Energy Research Scientific Computing Center (NERSC) for a role at Microsoft.
Lockwood is the latest in a string of people to leave the legacy supercomputing sector for Microsoft. In 2020, the CTO of Cray moved to Microsoft. A year later, the co-lead of Cray's PathForward program, Dr. Dan Ernst, also joined Microsoft.
Microsoft has won a number of cloud supercomputing contracts, including a $1.56 billion deal to provide a system for the UK Met Office. This week, Meta said that it would deploy a dedicated Microsoft Azure cluster for artificial intelligence research.
Lockwood is a storage architect who specializes in I/O performance analysis, extreme-scale storage architectures, and emerging I/O technologies. He led the deployment of the world's first 35-petabyte all-flash Lustre file system for NERSC's 70.9-petaflops Perlmutter system (built by Cray).
In a blog post, Lockwood explained why he left the DOE for Microsoft.
"On one hand, HPC's future has never been brighter thanks to how much life (and money!) the AI industry is bringing to the development of HPC technologies," he said. "On the other hand, leadership HPC appears to be engaging in unsustainable brinkmanship while midrange HPC is having its value completely undercut by cloud vendors."
Without a major breakthrough in transistor technology, the only way to continue to build more powerful supercomputers will be to pump more power into data centers and dissipate more and more heat.
"At the current trajectory, the cost of building a new data center and extensive power and cooling infrastructure for every new leadership supercomputer is going to become prohibitive very soon," Lockwood argued.
"My guess is that all the 50-60MW data centers being built for the exascale supercomputers will be the last of their kind, and that there will be no public appetite to keep doubling down."
As for the slightly less powerful systems like Perlmutter, they are slowly being rendered less and less necessary as the cloud catches up. "You can stick a full Cray EX system, identical to what you might find at NERSC or OLCF, inside Azure nowadays and avoid that whole burdensome mess of building out a 50MW data center," he said.
"You can also integrate such a system with all the rich infrastructure features the cloud has to offer like triggered functions. And when it comes to being first to market for risky HPC hardware, the cloud has already caught up in many ways - Microsoft deployed AMD Milan-X CPUs in their data centers before any HPC shop did, and more recently, Microsoft invested in AMD MI-200 GPUs before Frontier had a chance to shake them out."
However, he added: "I don't claim to know the future, and a lot of what I've laid out is all speculative at best. NERSC, ALCF, or OLCF very well may build another round of data centers to keep the DOE HPC party going for another decade. However, there's no denying that the stakes keep getting higher with every passing year.
"That all said, DOE has pulled off stranger things in the past, and it still has a bunch of talented people to make the best of whatever the future holds."