At a press event this week at its Jones Farm campus in Oregon, Intel engineers proudly demonstrated an internally built “Adams Pass” software development server sled based on just one of its forthcoming Knights Landing series of Xeon Phi processors, which managed a simultaneous, evenly flowing workload of 240 threads.
Do the math, and that’s 60 cores per chip (Intel is promising more than this), each running four threads through four-way hyperthreading. Although the size of the sled server’s heat sink may suggest otherwise, that’s just one big chip under there: the largest die that Intel has ever produced for a CPU.
Pretty Knights all in a row
As Knights Landing chief architect Avinash Sodani confirmed, all threads were being run on a single image of a single physical (non-virtual) operating system. Had this image been Windows, said Sodani, it would already have been able to manage all 240 threads just as evenly, without any extensions to the operating system.
This means, with Knights Landing, there won’t be a “main core” backed up by 60 or more subsidiaries which the operating system activates when it gets around to it. “It’ll look like everyone is at the same level,” he said. “There’s no master/slave kind of relationship.”
Intel was not ready at press time to disclose just which operations were running on those threads, though it did say they were independent calculations and not just strings of “no-ops.” Access to its laboratories was tightly controlled, and reporters were even asked not to look in certain directions.
That said, this reporter still asked one of the engineers on-site what Intel is asked to demonstrate when it shows off Knights Landing to people other than the press, such as prospective OEMs.
They want to see ease of programmability and ease of workload portability, the 15-year veteran engineer told DatacenterDynamics. Could their apps, compiled for a different class of processor (probably Xeon), execute with just as much ease on Xeon Phi? Some of these prospects are specifically interested in high-performance workloads, though he admitted that not everyone wants to see Xeon Phi executing HPC exclusively.
Meshes and tiles
The demonstration was part of Intel’s stepped-up efforts to promote its forthcoming Knights Landing Xeon Phi as a stand-alone processor, as well as a co-processor package like the current-day Knights Corner series.
Knights Landing truly is a network on a chip. Not only does it maintain its own 48-port 100-gigabit network switches on-die, but its microarchitecture uses a hub-based network as well. As Sodani demonstrated, Xeon Phi, unlike its Xeon cousin, pairs cores together into tiles. The two cores on each tile share a single 1-megabyte L2 cache, and each core has access to two vector processing units, enabling the class of parallelism originally created for graphics processors.
Over 30 of these tiles (the exact numbers have yet to be made official) are linked together not by the twin-ring architecture introduced by Intel in its Xeon E5 v3 series, but instead by a 2D mesh architecture. Network connections cross each tile along its Y and X axes, and signals are sent between tiles along these connections.
It’s a strange routing system, but it works: As with any network, the hub acts as the control point for routing. Whenever a packet is sent from one tile to another, it travels along the Y axis first. It completes all of its Y-axis travel, said Sodani, before making a single 90-degree turn to travel along the X axis.
This counter-intuitive method of routing actually minimizes contention, even when it doesn’t minimize the number of “hops” between tiles.
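For readers who want to see the Y-first, then-X rule spelled out, here is a minimal sketch in Python. It is an illustration only, not Intel's implementation: the `(x, y)` coordinate scheme, the function name `yx_route`, and the idea of listing every tile a packet passes through are all assumptions made for the example.

```python
def yx_route(src, dst):
    """Return the tiles visited going from src to dst, both given as (x, y).

    Hypothetical model of the rule described above: finish all travel
    along the Y axis, make one turn, then travel along the X axis.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]

    # Phase 1: move along the Y axis until the destination row is reached.
    step_y = 1 if dst_y > y else -1
    while y != dst_y:
        y += step_y
        path.append((x, y))

    # Phase 2: the single turn, then move along the X axis.
    step_x = 1 if dst_x > x else -1
    while x != dst_x:
        x += step_x
        path.append((x, y))

    return path


if __name__ == "__main__":
    # Example: a packet going from tile (0, 0) to tile (2, 3).
    for tile in yx_route((0, 0), (2, 3)):
        print(tile)
```

Because every packet follows the same fixed order of axes, its path is fully determined by its source and destination tiles, which is what makes the scheme predictable for the hub doing the routing.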
Atoms smash Xeons
Although this internal network is worlds different from the interconnection system in Intel’s other Xeon processors, Sodani stated that Intel is actively refraining from making Xeon Phi too different, lest it become incompatible with its cousin.
“One of the things we’re doing with Knights Landing, now that it’s in a bootable state, we want to make sure the existing legacy software has a smooth transition from where it’s running to this. A lot of things we are making are similar to Xeon, as long as we don’t have any reason to be different.”
But they’re not Xeon “Broadwell” cores inside Xeon Phi. Instead, their design is based upon the “Silvermont” cores originally created for Atom processors. Sodani told reporters his team chose Silvermont in order to achieve better power/performance density, but made so many changes to the package that the company should perhaps consider re-dubbing it “Knights Core.” “This allows us to actually put two vector processing units in it, each one being 512-bit times two,” he said.
With 60-plus cores in the final package, “that kind of power density is hard to get even with [Broadwell]. This is more efficient.”
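To tally the figures quoted in this story, the short Python sketch below multiplies them out. The tile count of 30 is only the lower bound implied by “over 30” (Intel has not made the number official), and reading “512-bit” as the width of each vector processing unit is this sketch's assumption, not something Intel confirmed.

```python
# Figures as quoted in the article; TILES is a lower bound, not an official count.
TILES = 30
CORES_PER_TILE = 2        # cores are paired into tiles
THREADS_PER_CORE = 4      # four-way hyperthreading
L2_MB_PER_TILE = 1        # one shared 1-megabyte L2 cache per tile
VPUS_PER_CORE = 2         # two vector processing units per core
VPU_WIDTH_BITS = 512      # assumption: "512-bit" is the width of each unit

cores = TILES * CORES_PER_TILE
threads = cores * THREADS_PER_CORE
total_l2_mb = TILES * L2_MB_PER_TILE
vpus = cores * VPUS_PER_CORE
vector_bits_per_core = VPUS_PER_CORE * VPU_WIDTH_BITS
double_lanes_per_core = vector_bits_per_core // 64  # 64-bit values per core

print(f"cores: {cores}")
print(f"threads: {threads}")
print(f"total L2 cache (MB): {total_l2_mb}")
print(f"vector processing units: {vpus}")
print(f"64-bit vector lanes per core: {double_lanes_per_core}")
```

With these inputs the thread count matches the 240 threads shown in the demonstration; a final part with more tiles would scale every figure up accordingly.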