The United Kingdom Atomic Energy Authority (UKAEA) and Cambridge University are building a digital twin of a planned fusion reactor, to speed the development of carbon-free energy.
UKAEA and Cambridge's Open Zettascale Lab will create a simulation of the Spherical Tokamak for Energy Production (STEP) prototype fusion power plant, which is scheduled to create a "burning plasma" by 2035 and to produce net electricity by 2040. The project will use Intel's Max GPUs and PowerEdge servers from Dell.
Fusion energy could provide near-limitless sustainable power by mimicking conditions in the sun, where light atoms "fuse" into heavier ones, releasing energy. It is often heralded as the ultimate alternative power source, but scientists believe a functioning fusion power plant that can deliver energy to the grid is still years or decades away.
STEP research will take place at Culham, the UKAEA lab which has hosted JET (the Joint European Torus fusion project) for 40 years. Culham has also taken part in efforts to produce a "spherical tokamak" and in the MAST Upgrade project, which looked at increasingly practical issues such as removing exhaust gas and energy from the reactor.
If STEP is successful, an eventual reactor will be built in West Burton on the site of a newly-decommissioned coal-fired power station.
That timescale puts pressure on the engineering, said Rob Akers, director of computing programs, UKAEA, in a briefing: "It is effectively a moonshot program to prove that fusion can be economically viable. So STEP aims to put fusion energy onto the national grid in the early 2040s. And a very important part of that mission is to develop and nurture the supply chain that will design and construct the world's first fusion power plants. The challenge for us is that there is insufficient time to do engineering the way we've been doing it now for decades. We've got 17 years to stand up STEP and plug it into the grid."
Instead of building and testing physical prototypes, fusion engineers will need simulations: "In the same way that the aerospace sector has moved wind tunnels into the world of computational fluid dynamics, or the automotive sector has moved the process of crash testing into the virtual world using finite elements... we need to do the same for designing fusion power plants."
The problem is that fusion reactors involve many processes that are more complex and harder to model than those examples: "A fusion reactor is an incredibly complex, strongly coupled system and the models that underpin the operation of these fusion power plants are somewhat limited in their accuracy. There are many coupling mechanisms that we have to take into account. There's a lot of physics that spans the entire load assembly of the machine, from structural forces to thermal heat loads through the power plants, to electromagnetism, and to radiation."
He went on: "A single change to a subsystem can have huge ramifications across the entire plant. It means that we're going to have to worry a lot about emergent behavior that will only become apparent when we construct the plant. And we need to try and simulate that in advance."
"We have to simulate everything, everywhere, all at once."
Computationally, this is complex, said Paul Calleja, director of research computing services, University of Cambridge: "It's a really tricky coupled multi-physics multi-timescale problem. A lot of simulations focus on quite defined timescales, but we have very long timescales, with many different types of physics all being coupled."
Exascale computers are expensive, said Calleja: "They cost north of £600 million ($760m) of capital to deploy, and consume north of 20MW of power, so it costs £50 million ($63m) a year just to plug them in."
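The running-cost figure can be sanity-checked with simple arithmetic. The sketch below is a rough illustration, not a calculation from the project: it assumes the machine draws a constant 20MW all year (the utilisation parameter and function names are hypothetical), and works out the annual energy use and the electricity price implied by a £50 million annual bill. Since the quote gives both figures as "north of", the results are approximate.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_mwh(power_mw: float, utilisation: float = 1.0) -> float:
    """Energy used in a year, in MWh, for a given average power draw.

    utilisation is an assumed fraction of the year at full power (1.0 = always on).
    """
    return power_mw * HOURS_PER_YEAR * utilisation


def implied_price_per_kwh(annual_cost: float, power_mw: float,
                          utilisation: float = 1.0) -> float:
    """Electricity price (currency units per kWh) implied by an annual bill."""
    energy_kwh = annual_energy_mwh(power_mw, utilisation) * 1000  # MWh -> kWh
    return annual_cost / energy_kwh


energy = annual_energy_mwh(20)              # figures quoted by Calleja: 20MW
price = implied_price_per_kwh(50e6, 20)     # and £50 million a year

print(f"{energy:,.0f} MWh per year")        # 175,200 MWh per year
print(f"£{price:.3f} per kWh")              # £0.285 per kWh
```

Under these assumptions, the quoted numbers correspond to an electricity price of roughly 29p per kWh.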
UKAEA will use the Cambridge Open Zettascale Lab's supercomputer (equipped with 4th gen Intel Xeon processors) to model the physical and engineering issues in STEP, potentially solving problems in the "digital twin" before they crop up in the real build. The project partners label this as the "industrial metaverse", though it is an advanced simulation project with no VR headsets in sight. The project will also use Intel's DAOS object store to handle the hundreds of petabytes of data which will be generated very quickly when a single plasma turbulence incident is simulated.
UKAEA and the Open Zettascale Lab said they have picked Intel's recently launched GPU Max (Ponte Vecchio) coprocessors in preference to the dominant GPU offerings from Nvidia, partly because Intel supports the oneAPI programming model, which can deploy a single codebase across multiple architectures, apparently including Nvidia systems. oneAPI has been touted as a competitor to Nvidia's CUDA, the market-leading language and API for supercomputers built from GPUs. CUDA runs on systems with Arm host CPUs, but only on Nvidia GPUs, not on Intel's.
Open architecture
"How do you program for a GPU world where you're not locked into a single vendor solution?" said Calleja. "Because we might work with Intel today, but who knows what's going to happen in the future? We don't want our codes to be locked into a particular vendor." The Intel oneAPI environment is largely based on the SYCL cross-platform abstraction layer, he said. "This oneAPI SYCL environment gives us a really nice way to develop codes that, if we wish, we can run on Intel GPUs. We can also run those codes on Nvidia GPUs, and even AMD GPUs with minimal recode."
oneAPI may offer portability, but the project partners will have to work to get the apps they want onto it. Applications currently supported include AI, deep learning, and molecular dynamics; the Open Zettascale Lab plans to expand this to the areas it needs: engineering, fusion materials, and plasma simulation.
Answering a question from DCD, Calleja said: "Obviously, we do work with Nvidia. But in this collaboration, we really wanted to look at a completely open ecosystem that presents a much more compelling case for democratization. We're not locked into a proprietary program and environment. So I think the fact that Intel now has a competitive GPU product, which we can unlock with oneAPI, that's actually quite compelling to get some competition back into the ecosystem."
The project plans to share the simulations and the simulation techniques it develops, explained Calleja and Akers, so an open API is very important for the international collaboration they envisage.
Asked about the efficiency of the supercomputing hardware the project will use, Calleja said that the biggest efficiency gains will come from software: "The gains we get from hardware in terms of output per megawatt will be dwarfed by the gains we get from software. It's really developing the software to exploit these systems where we get huge gains in productivity."
Meanwhile, while projects like STEP share results so the world can build collectively towards fusion power, some optimistic VC-funded startups claim they can do it sooner by working on their own. One startup, Helion, has apparently signed a deal to provide Microsoft with 50MW of fusion-powered electricity by 2028, but most observers regard this as hype.