China’s Sunway TaihuLight has been named the world’s most powerful supercomputer, displacing the previous record holder, Tianhe-2. The same ranking shows China with more listed HPC systems than any other nation for the first time.

Notably, the computer uses only Chinese-designed chips, reducing the country’s reliance on US technology.

Sunway TaihuLight supercomputer – Top500

Making China great again

The twice-yearly Top500 report named Sunway TaihuLight the most powerful system, with Professor Jack Dongarra saying: “The Sunway TaihuLight System is very impressive with over 10 million cores and a peak performance of 125 Pflop/s.

“The Sunway TaihuLight is almost three times (2.75 times) as fast and three times as efficient as the system it displaces in the number one spot. The HPL Benchmark results at 93 Pflop/s or 74 percent of theoretical peak performance is also impressive, with an efficiency of 6 Gflops per Watt.”
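The figures in the quote can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable and function names are ours, and the implied power draw is derived from the rounded numbers quoted above, not taken from the Top500 report.

```python
# Rounded figures quoted in the article.
PEAK_PFLOPS = 125.0       # theoretical peak, Pflop/s
HPL_PFLOPS = 93.0         # HPL benchmark result, Pflop/s
GFLOPS_PER_WATT = 6.0     # quoted energy efficiency


def hpl_efficiency(hpl_pflops, peak_pflops):
    """Share of theoretical peak achieved on the HPL benchmark."""
    return hpl_pflops / peak_pflops


def implied_power_mw(hpl_pflops, gflops_per_watt):
    """Power draw implied by the HPL result and the Gflops-per-Watt figure."""
    hpl_gflops = hpl_pflops * 1e6          # 1 Pflop/s = 1e6 Gflop/s
    watts = hpl_gflops / gflops_per_watt
    return watts / 1e6                     # W -> MW


efficiency = hpl_efficiency(HPL_PFLOPS, PEAK_PFLOPS)
power_mw = implied_power_mw(HPL_PFLOPS, GFLOPS_PER_WATT)

print(f"HPL efficiency: {efficiency:.1%}")
print(f"Implied power draw: {power_mw:.1f} MW")
```

On these rounded inputs the HPL result works out to roughly 74 percent of peak, matching the quote, and the efficiency figure implies a power draw in the region of 15 MW.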

The computer, which runs the Linux-based Sunway Raise OS 2.0.5, uses the SW26010 processor, a chip designed by the Shanghai High Performance IC Design Center. The Tianhe-2, by comparison, uses Intel’s Xeon E5-2692 processors and Xeon Phi 31S1P coprocessors.

Each processor has four computing groups (CGs), each with one management processing element (MPE) and an 8x8 array of computing processing elements (CPEs), for a total of 260 cores per processor. A network on chip (NoC) connects the groups to a system interface. Each group also has a memory controller (MC), and its MPE, CPEs, and MC have access to 8 GB of DDR3 memory.

“The total system has 40,960 nodes for a total of 10,649,600 cores and 1.31 PB of memory,” said Dongarra. “The MPEs and CPEs are based on a RISC architecture, 64-bit, SIMD, out of order microstructure.”
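The totals Dongarra quotes follow directly from the per-processor layout. The sketch below reproduces them under two assumptions that the article implies but does not state outright: each node holds one SW26010 processor, and each of its four computing groups has its own 8 GB of DDR3. The constant names are ours.

```python
# Per-processor layout described in the article.
GROUPS_PER_PROCESSOR = 4
MPES_PER_GROUP = 1
CPES_PER_GROUP = 8 * 8

# Assumptions: one processor per node, 8 GB of DDR3 per computing group.
PROCESSORS_PER_NODE = 1
GB_PER_GROUP = 8

NODES = 40_960

cores_per_processor = GROUPS_PER_PROCESSOR * (MPES_PER_GROUP + CPES_PER_GROUP)
total_cores = NODES * PROCESSORS_PER_NODE * cores_per_processor

memory_per_node_gb = PROCESSORS_PER_NODE * GROUPS_PER_PROCESSOR * GB_PER_GROUP
total_memory_pb = NODES * memory_per_node_gb / 1e6   # decimal units: 1 PB = 1e6 GB

print(f"Cores per processor: {cores_per_processor}")
print(f"Total cores: {total_cores:,}")
print(f"Total memory: {total_memory_pb:.2f} PB")
```

Under those assumptions the script arrives at 260 cores per processor, 10,649,600 cores in total, and about 1.31 PB of memory, in line with the quoted figures.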

Sunway TaihuLight: Two nodes on a card – Top500

However, Dongarra did note that the new supercomputer had its drawbacks: “The HPCG performance at only 0.3 percent of peak performance shows the weakness of the Sunway TaihuLight architecture with slow memory and modest interconnect performance.

“The ratio of floating point operations per byte of data from memory on the SW26010 is 22.4 Flops(DP)/Byte transfer, which shows an imbalance or an overcapacity of floating point operations per data transfer from memory.

“By comparison the Intel Knights Landing processor with 7.2 Flops(DP)/Byte transfer. So for many “real” applications the performance on the TaihuLight will be nowhere near the peak performance rate. Also the primary memory for this system is on low side at 1.3 PB (Tianhe-2 has 1.4 PB and Titan has .71 PB).”
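One way to see why a high Flops-per-byte ratio hurts real applications is a simple roofline-style estimate, which is our illustration and not part of Dongarra's analysis. A kernel that performs fewer floating point operations per byte than the machine's ratio is limited by memory, and can reach at most its own intensity divided by that ratio as a share of peak. The kernel intensity of 0.25 Flops/Byte below is a hypothetical value chosen for illustration, and the per-node bandwidth is only what the quoted numbers imply.

```python
def fraction_of_peak(kernel_intensity, machine_balance):
    """Roofline-style bound: share of peak a kernel can reach.

    kernel_intensity: Flops the kernel performs per byte moved from memory.
    machine_balance:  Flops the machine can perform per byte it can move.
    """
    return min(1.0, kernel_intensity / machine_balance)


# Ratios quoted in the article, in Flops(DP)/Byte.
SW26010_BALANCE = 22.4
KNL_BALANCE = 7.2

# Per-node memory bandwidth implied by the quoted peak and node count.
SYSTEM_PEAK_FLOPS = 125e15
NODES = 40_960
node_peak_flops = SYSTEM_PEAK_FLOPS / NODES
implied_bandwidth_gb_s = node_peak_flops / SW26010_BALANCE / 1e9

# Hypothetical memory-bound kernel (assumption, not from the article).
kernel_intensity = 0.25

sw_fraction = fraction_of_peak(kernel_intensity, SW26010_BALANCE)
knl_fraction = fraction_of_peak(kernel_intensity, KNL_BALANCE)

print(f"Implied node bandwidth: {implied_bandwidth_gb_s:.0f} GB/s")
print(f"SW26010: at most {sw_fraction:.1%} of peak")
print(f"Knights Landing: at most {knl_fraction:.1%} of peak")
```

For this hypothetical kernel the bound is about 1 percent of peak on the SW26010 against about 3.5 percent on Knights Landing, which illustrates the imbalance Dongarra describes.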

Applications that have already been used on the system include “a fully-implicit nonhydrostatic dynamic solver for cloud-resolving atmospheric simulation,” “a highly effective global surface wave numerical simulation with ultra-high resolution,” and a “large scale phase-field simulation for coarsening dynamics based on Cahn-Hilliard equation with degenerated mobility.”

Sunway TaihuLight: General architecture – Top500

The four key application domains for the Sunway TaihuLight are:

  • Advanced manufacturing: CFD, CAE applications
  • Earth system modeling and weather forecasting
  • Life science
  • Big data analytics

Cooling comes from Mitsubishi-owned Climaveneta, which supplied fifteen TECS2-W/H water-cooled chillers equipped with magnetic levitation, oil-free VFD compressors, with a European Seasonal Energy Efficiency Ratio (ESEER) close to ten.

Funded by China’s central government, the province of Jiangsu, and the city of Wuxi, Sunway TaihuLight reportedly cost RMB 1.8 billion (US$270m), including building, hardware, R&D, and software costs.

Superpowers compete over supercomputers

In 2001, China did not have a single supercomputer listed by Top500. Now the country has 167 systems, beating the US’s 165, the first time any nation has bested America on the list.

According to China’s official national plan, the country aims to develop an exascale computer (Tianhe-3) during the 13th Five-Year Plan period (2016-2020). The US, by comparison, plans to reach exascale by 2023.

The US Department of Energy plans a 200 Pflop/s machine called ‘Summit’ for early 2018, a 150 Pflop/s machine called ‘Sierra’ by mid-2018, and a 180 Pflop/s machine called ‘Aurora’ for late 2018.

The US had hoped to slow China’s rapid HPC growth, last year blocking exports of US technology to some Chinese supercomputing sites, including Intel Xeon processors and Xeon Phi coprocessors intended for an upgrade to the Tianhe-2.

At the time, Dongarra told Xinhua: “The U.S. government is trying to stop the spread of high performance computer systems in China. The ban will probably accelerate the development of a processor designed in China for use in high performance computers.”