
Is the supercomputing industry still evolving technologically, or are the major players consolidating once again? Monday’s release by Top500.org of its semi-annual list of the 500 highest-performing supercomputers on the Linpack benchmark revealed no changes among the world’s top nine performers.

The real story comes from the entire list. Not everybody gets to own the top nine supercomputers. So among the rest of the world, supercomputing labs have to compete with cloud data centers for customers. Supercomputers are, after all, the modern generation of time-sharing systems, still representing the world’s first business model for the outsourcing of compute power. Cloud service providers are the upstarts, the wannabes, or what network engineers in the 1980s — taking a page from the NNTP protocol — would call “alt.comp.sys.”

The supercomputer service industry is still innovating to stay on top. And it’s the rest of the list that tells their story.

The rise of Haswell
There are 27 systems on the November 2014 list built around the Haswell generation of Intel processors — the generation that culminated in the Xeon E5-2600 v3 series. This generation has only been officially available for a few months; the June list acknowledged just one system, then ranked #203. Six of the Haswell performers on the November list are upgrades from the previous list; the rest are newcomers.

Ten of these Haswell systems are built by Cray, including the highest-performing Haswell system on the list: #16, a 94,608-core XC-40 model, nicknamed Hornet, run by the Stuttgart High Performance Computing Center. Hornet was installed in Stuttgart by Cray on August 13, and officially entered production there just today. Cray uses its own custom interconnect fabric, called Aries, which premiered with its XC-30 in November 2012. (Big events in supercomputing tend to happen around June and November.)

Berkeley Lab was one of the first to use the Aries fabric in a production environment, with the Edison supercomputer that made its debut on the June Top 500 list at #18. In an examination of Edison’s performance just months ago by Berkeley’s bosses at the US Department of Energy, researchers concluded that the Aries fabric scaled evenly across the entire system.

This matters because when you try to scale an application to the furthest reaches of the system fabric, you don’t want to introduce latencies. They become like shockwaves when they echo through an all-to-all connection scheme.

DOE noticed some latencies due to scalability constraints with its previous Berkeley Lab supercomputer, Hopper. It was based around AMD Opteron processors, and made its first appearance on the Top 500 list four years ago at #5. It’s now #44. In its examination, DOE concluded that the latencies detected in Hopper were completely eliminated in Edison using the Aries fabric.

And the lessons learned with Edison were put to use on Hornet. Today, five XC-40 models already outperform Edison, four of which have just premiered on the Top 500.

Whose processors reign supreme?

For a while, when the hot topic in the supercomputing sphere was the rise of commercial, off-the-shelf (COTS) processors, it appeared that the competition in the HPC processor space would be the same as in the consumer space: Intel vs. AMD. While AMD’s Opteron remains a factor in this month’s Top 500, it’s a rusting one. Opterons still power the #2 performer, Oak Ridge National Laboratory’s Cray XK7 “Titan,” boosted by Nvidia Tesla K20x GPU accelerators.

But only one AMD-powered model is new to the list this month: the #151 system, unnamed, built by HP. The other 27 date back as far as five years.

IBM Power technology (or PowerPC, for older readers) still features in the list, but there’s a story here too. Of the 39 Power-based systems this month, again only one is new: at #338, built for Dassault Aviation. BlueGene/Q models hold on to the #3, #5, #8, and #9 spots.

The story lurking beneath both of these apparent dormancies is actually the same one, and it’s not the one you’re thinking. Both AMD and IBM have one big thing in common: They both offloaded their chip manufacturing facilities to GlobalFoundries, with IBM just last month actually paying GF $1.5 billion over three years to take the chip plants off its hands.

AMD used to credit its chip manufacturing facilities — back when it owned them — with the stunning rise of Opteron at the beginning of the multicore era. Now that GF owns two sets of legendary fabricators, it may yet inject new blood from IBM’s Power intellectual property base, into future Top 500 lists.

So all this sounds like Intel walks away with the processor crown in a cakewalk. But think again: Nvidia GPU processors boosted the performance of 49 systems in this month’s list, including 17 of the top 100 and 3 of the top 10 (which, by the way, are all Crays). While Intel Xeon Phi coprocessors still boost the reigning champion, China’s Tianhe-2, they’re only found in 2 of the top 10 and 10 of the top 100.

With the second-generation Xeon Phi (“Knights Landing”) on its way next year, what had been co-processor architecture will move into the center seats of completely new systems. But Nvidia Tesla processors are making headway; and while Xeon E5 v3 processors will continue their rapid march forward, no doubt several of them will share the glory with Nvidia.

What’s your definition of “performance”?
The Linpack benchmark is a test of raw speed, not really a test of engineering efficiency. Tianhe-2 cranks out an Rmax score of 33,862,700 gigaflops (roughly 33.9 petaflops), which is still beyond belief. But it ranks #63 among Top 500 competitors in efficiency, measured in megaflops per watt, a metric that Top500.org began tracking only recently.
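The efficiency metric is straightforward arithmetic: Rmax divided by total power draw. Here is a minimal sketch in Python, assuming Tianhe-2’s reported power draw of about 17,808 kW from the November 2014 list:

```python
def mflops_per_watt(rmax_gflops: float, power_kw: float) -> float:
    """Top 500 efficiency metric: Rmax (in gigaflops) over power draw
    (in kilowatts), expressed in megaflops per watt."""
    mflops = rmax_gflops * 1000.0  # 1 gigaflop/s = 1,000 megaflop/s
    watts = power_kw * 1000.0      # 1 kW = 1,000 W
    return mflops / watts

# Tianhe-2: Rmax of 33,862,700 Gflops; ~17,808 kW reported power draw
print(round(mflops_per_watt(33_862_700, 17_808), 2))  # ~1901.54 Mflops/W
```

At roughly 1,900 Mflops/W, Tianhe-2 sits far behind the efficiency leaders, which is why the raw-speed and efficiency rankings diverge so sharply.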

Tsubame-KFC, the oil immersion-cooled supercomputer (pictured) built for the Tokyo Institute of Technology, remains the efficiency leader on this month’s Top 500. It actually increased its efficiency score over last June’s, scoring 4,272.04 Mflops/W versus 3,418.18 Mflops/W five months ago. (Tsubame is now the #392 performer overall.)

But three systems on this month’s list beat Tsubame’s June score. Ranking second in efficiency is a system built by Cray for Cray: a 3,080-core, Xeon E5 v2-based system that’s now the #361 performer overall, edging past Tsubame’s #392 ranking. Cray’s aptly named “Storm1” scored 3,878.61 Mflops/W in its first test.


There was a time just a few years ago when the name “Cray” had almost disappeared from the Top 500. Now it’s mentioned in conversations more often than “BlueGene.” Arguably the greatest name in supercomputing history is today wearing more than one crown, and deservedly so.