In advance of this week’s Supercomputing ’14 conference in New Orleans, Intel will formally unveil the third stage of its Xeon Phi processor roadmap for high-performance computing devices. The second stage of the “Knights” series, the Knights Landing product generation, remains on track for commercial availability in the second half of 2015. From what an Intel executive told a press conference attended by DatacenterDynamics, the plan is for there to be no surprises.

Intel vice president and general manager of HPC for its Data Center Group, Charlie Wuischpard, told reporters that Knights Hill will continue the march of Intel’s highly parallel version of Xeon from its starting point (Knights Corner) through its beachhead (Knights Landing), and onward. It doesn’t take a mathematician or a poet to tell you there’s a greater goal in the distance, a looming conquest.

Not a scale, but a path

“Parts of these are overlapping development projects, so we have to start work on one before the other’s even really launched,” Wuischpard conceded. “But we want to announce Knights Hill as the third-generation Intel Xeon Phi. It’ll incorporate our second-generation Omni-Path fabric architecture, and it’ll use our 10 nm process technology.”

You’re probably thinking, wait a minute, there wasn’t a first-generation Omni-Path fabric architecture. Up to this point, we had been calling Intel’s alternative to InfiniBand fabric “Omni Scale,” but Wuischpard told reporters that Intel concluded the name didn’t really fit. If you think about it, “scale” can increase in more than one dimension. “Path” implies a single vector, one unambiguous (or at least less ambiguous) direction, which brings us back to the whole “Corner/Landing/Hill” metaphor.

“One of the things I think our competition likes to critique us on is, are we going to sustain this investment over a period of time?” remarked Wuischpard.  “When we’re asking our customers to make big investments in code modernization, we’ve got to demonstrate that this is not just a one- or two-generation investment, but really a multi-generational investment.”

Across all aspects of its Xeon product line, Intel is placing much heavier emphasis on customer involvement in the development process. It stops short of direct participation in fabrication, but it gives high-volume buyers a voice in what the Xeon package will contain. Amazon, for example, has taken customized Xeons for its EC2 service, saying the arrangement keeps Intel ahead of the competition.

With the upcoming Knights Landing generation, Xeon Phi will be commercially available as a stand-alone CPU, not just as the PCI Express co-processor card in which Knights Corner currently ships.

Intel’s product cycles remain fairly regular: while they drift from time to time, they hover right around 18-month increments. That’s not long enough, however, for customers who are contributing code and features to the Xeon Phi package; they don’t expect supercomputers to become expendable after 18 months the way iPhones do. So Intel now emphasizes the path, the single direction for upward mobility, for customers for whom Intel’s celebrated “tick-tock” cadence runs somewhat fast.

How workload-optimized is it?

But this may also present Intel with a conundrum: as its software engineers will attest, HPC workloads and enterprise workloads have very different profiles. HPC software requires very reliable, predictable processor cores, delivering what engineers call a high degree of determinism.

That determinism is what lets software enable parallelism by design: work can be divided evenly across many cores only if each core’s behavior can be predicted. Ironically, the way Xeon enterprise and business-class processors (and Core i3, i5, and i7, for that matter) manage parallelism internally renders them less deterministic, making it harder for software to reliably predict the behavior of the processor. Most of the time, though, word processors and photo-retouching tools don’t need that level of predictability anyway.
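To see why predictability matters, consider a toy model (an illustration, not an Intel benchmark): a bulk-synchronous parallel job in which every core must finish its slice of work before the next step can begin. The step time is set by the slowest core, so any per-core timing jitter is paid on every single step.

```python
import random

def simulated_runtime(cores, steps, work_per_step=1.0, jitter=0.0):
    """Total runtime when each core's step time is work_per_step plus a
    random delay of up to `jitter` (all times in arbitrary units)."""
    total = 0.0
    for _ in range(steps):
        # Every core must finish before the step ends, so the slowest
        # core sets the pace for all of them.
        slowest = max(work_per_step + random.uniform(0.0, jitter)
                      for _ in range(cores))
        total += slowest
    return total

random.seed(42)
for jitter in (0.0, 0.05, 0.20):   # 0%, 5%, 20% worst-case jitter per core
    t = simulated_runtime(cores=61, steps=1000, jitter=jitter)
    print(f"jitter {jitter:>4.0%}: runtime {t:7.1f} (ideal 1000.0)")
```

With 61 cores in the race, even a few percent of worst-case jitter per core shows up almost in full in the total runtime, because on nearly every step some core draws the unlucky delay. That is the cost unpredictable hardware imposes on parallel code, and it is largely invisible to a single-threaded desktop application.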

Meanwhile, HPC processors have to account for power and performance variables arising from very different system configurations: a server rack’s cooling apparatus, for example, as opposed to a 25 mm case fan.

So there are clearly two classes of workloads, as different from one another as Olympic fencing from Olympic curling. As long as Intel opens up all of its Xeon product line to customer input, and the results are — as the company says they are — more workload-centric product lines, won’t this lead to an effective split, where Xeon and Xeon Phi no longer resemble one another? Doesn’t this represent the end of the Intel development motif, where innovations start at the high-performance end and trickle down toward the mid-range? We put the question to Intel’s Charlie Wuischpard.

“I think we certainly believe in the trickle-down theory,” he responded.  “If I were to generalize about the developer community, there’s a lack of specific skill in writing applications optimized for parallelism. I think we see the start of that certainly at the top end. The Knights family, the Xeon Phi family, is the ultimate instantiation of parallelism — many more cores, many more threads. Of course, cores and threads are advancing with Xeon, and Xeon Phi being one step ahead. But... at some point, you’re going to be using an architecture that has many cores and many threads. So we believe it will trickle down, and that’s why we’re making these investments in the top end.”

In fact, Wuischpard continued, some of the quickly evolving consumer-end applications are actually trickling up toward the HPC end. Rendering video is one example. Only in the last few years has an industry truly developed around the age-old idea of rendering farms: offloading high-resolution movie graphics rendering onto highly parallel clusters.

In a completely different industry, various big data-oriented analytics tasks are now being offloaded in a similar vein. It’s making iPads into dashboards, connected directly to highly compute-intensive processes on the back end.

“So I don’t think we see a divergence,” the Intel VP remarked. Its vector-oriented processing capabilities may mean Xeon Phi gets used differently from standard Xeon, he later added, but “one of the fundamental premises is that there’s this consistent programming model that works across the board. I look at Knights Corner, to some degree, as being the canary in the coal mine. If you just install it and hope everything’s going to run great, and you haven’t done the parallelization or haven’t used the vector capability, you shouldn’t expect to get great performance.”
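The point about parallelization and vector capability is language-agnostic, and a trivial sketch illustrates it. The snippet below is ordinary Python with NumPy, not Xeon Phi code, and the timings will vary by machine; what it shows is that the same arithmetic speeds up dramatically once the work is expressed as a single wide, data-parallel operation that a library, a compiler, or vector hardware can actually exploit.

```python
import time
import numpy as np

n = 2_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Scalar-style loop: one element at a time, with nothing exposed for
# vector units (or extra cores) to work on in parallel.
start = time.perf_counter()
total = 0.0
for i in range(n):
    total += a[i] * b[i]
print(f"scalar loop:     {time.perf_counter() - start:.3f} s")

# Data-parallel form: the whole dot product is expressed at once, so the
# library (and ultimately the hardware) is free to vectorize it.
start = time.perf_counter()
total = np.dot(a, b)
print(f"vectorized form: {time.perf_counter() - start:.3f} s")
```

Code modernization for Xeon Phi works along the same lines, at a much lower level: loops have to be restructured so the compiler can spread them across dozens of cores and fill wide vector registers, which is the investment Wuischpard is asking customers to make.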

Using our calculator as a time machine

While Intel offered plenty of vision and metaphor surrounding Knights Hill, it’s short on specifications for now. The existing code name for Intel’s 10 nm generation is “Cannonlake,” and because we’re now on the cusp of single-digit nanometer scale, there’s still considerable fog obscuring our view (and perhaps Intel’s as well) into the makeup of 10 nm processor cores. It’s been generally reported that extreme ultraviolet (EUV) lithography techniques will not be used until perhaps the 7 nm process comes online.

But if Intel’s current scale... excuse me, current path, is as straight a line as it promises, we can do some simple math to make a projection for Knights Hill. Knights Corner achieves just over 1 teraflop of double-precision floating-point performance with a 61-core die executing 16 operations per clock at 1.23 GHz. For Knights Landing to reach its projected 3 teraflops, let alone surpass that mark, the standard formula says it would have to execute a seemingly absurd 32 operations per clock at around 1.25 GHz. Intel hasn’t verified this figure, but it’s the only way its math works out.

For Xeon Phi to stay within its power envelope, Intel will have to keep Knights Hill’s clock speed low. Assuming Intel holds at 32 operations per clock (that thus-far-unconfirmed figure) rather than doubling it again, Knights Hill could approach the 4 TFLOPS landmark with 100 cores clocked no faster than 1.25 GHz. That’s assuming, of course, no surprises.
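The arithmetic is simple enough to check. Below is a minimal sketch of the standard formula (cores × double-precision operations per clock per core × clock rate). The Knights Corner line uses published figures; the Knights Landing core count is merely what the formula implies, and the Knights Hill line is this article’s speculation, not a specification Intel has confirmed.

```python
# Peak double-precision throughput from the standard formula:
#   peak TFLOPS = cores * DP operations per clock per core * clock (GHz) / 1000
def peak_tflops(cores, ops_per_clock, ghz):
    return cores * ops_per_clock * ghz / 1000.0

# Knights Corner (shipping): 61 cores, 16 DP ops/clock, 1.23 GHz
print(f"Knights Corner:  {peak_tflops(61, 16, 1.23):.2f} TFLOPS")    # ~1.2

# Knights Landing (projected 3 TFLOPS): at 32 ops/clock and ~1.25 GHz,
# the implied core count is 3000 / (32 * 1.25) = 75 -- not an Intel figure.
print(f"Knights Landing: {peak_tflops(75, 32, 1.25):.2f} TFLOPS")    # ~3.0

# Knights Hill (speculative): 100 cores at 32 ops/clock, capped near 1.25 GHz
print(f"Knights Hill:    {peak_tflops(100, 32, 1.25):.2f} TFLOPS")   # ~4.0
```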

Intel will hold its own supercomputing developers’ event in conjunction with this week’s SC ’14 conference. There, Wuischpard tells us, Intel will be courting more long-term developers to contribute their expertise to Xeon Phi.