The keyword in the above headline is “could”. The multicore era was not, you may recall, first heralded by Intel. While Intel was basing its data center processing plan on an architecture called Itanium (which had just been dragged, kicking and screaming, into the 64-bit era), AMD made the first inroads towards producing a dual-core, 64-bit, x86 processor with the performance levels that data centers required. Intel was dragged – not kicking and screaming, but certainly dragged – into producing x86 (x64) as its leading processor line for servers.
But because Intel joined the competition, the complexion of data centers changed, mainly due to one aspect of multicore architecture that even today, a decade later, IT departments don’t quite understand, let alone appreciate: Multicore forced process parallelism into the datacenter – yes, kicking and screaming.
The back door to parallelism
Parallelism is the ability for a set of processors and/or cores in a system to run multiple, coordinated threads simultaneously. How a process gets separated into two or more threads is the exact opposite of magic. With the Itanium architecture that preceded x86/x64 into multicore, threads divide explicitly, meaning the process decides when divisions happen. These split points are milestones that are entered into the process code by the compiler.
Parallelism in x86/x64 is largely implicit, meaning the processor takes advantage of opportunities it ascertains through an analysis of the code, to determine which parts of a process can be executed in parallel. When Intel created hyperthreading (HT) – a way for one core to run two threads in alternation – its effectiveness depended upon software having been compiled in such a way that the processor could make good judgment calls.
And as I reported a decade ago, some processes actually ran more slowly on hyperthreaded processors than on single-threaded ones at the same clock speed. Here’s how I presented the topic in 2005 for the original Tom’s Hardware Guide: “HT is actually one of the many efficiencies that Intel engineers discovered in a way to combat the perennial problem of latency—the fact that single-core processors can spend more than half their clock cycles executing nothing at all.
In the mid-1990s, discovering that traditional single-threaded programs were utilizing only about 35% of a processor’s available resources, Intel realized that, with an added level of hardware sophistication, it could schedule at least one more thread of execution within the unused space. It could only achieve this, however, if each thread “believed” it had the entire processor to itself—including its registers, cache, and access to the front-side bus. So HT schedules the execution of two threads (for now) in alternating fashion—a few cycles for thread #1, a few for thread #2—but only when the results of one thread do not corrupt the image of the CPU for the other. In other words, HT only alternates instructions that cannot get in each other’s way.
The immediate benefit of HT parallelism is that it doesn’t require the software—the programs which constitute each thread—to be aware of any parallelism taking place whatsoever. Each thread, not “knowing” it runs in a split environment, “believes” to have the processor all to itself. As a result, software originally compiled to run on standard single-core processors… need not be recompiled in order to implement parallelism, and to realize at least some boost in performance without involving raising clock speed.”
Four years ago, we saw the proliferation of eight-core processors in the data center, and the dawn of 12-core. But was this necessarily the path of progress? Because most parallelism in use today is implicit, eight cores do not run a process eight times better than one core. As many engineers have told me over the last decade, the net gain from each new core placed onto the stack is exponentially smaller than the gain from the previous one. And some told me that the ‘point of no return’ – the point at which adding processor cores would essentially add no real power to the operating system – would be at 10 cores.
Today, we’re approaching the onset of 16-core processors. If there were no real gains to be made after 10 cores, we’d know it already.
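One common way to reason about these diminishing returns is Amdahl's law, which caps the speedup of a workload by the share of it that must still run serially. The sketch below is illustrative only: the article's sources did not necessarily have this model in mind, and the 90% parallel fraction is a hypothetical figure, not a measurement of any real server workload.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def marginal_gain(cores: int, parallel_fraction: float) -> float:
    """Extra speedup contributed by the last core added (cores >= 2)."""
    return (amdahl_speedup(cores, parallel_fraction)
            - amdahl_speedup(cores - 1, parallel_fraction))


if __name__ == "__main__":
    p = 0.90  # hypothetical: assume 90% of the workload parallelizes
    for n in (2, 4, 8, 10, 12, 16):
        print(f"{n:2d} cores: speedup {amdahl_speedup(n, p):.2f}x, "
              f"last core added {marginal_gain(n, p):.3f}x")
```

Under that assumed 90% figure, eight cores deliver roughly a 4.7x speedup rather than 8x, and 16 cores reach 6.4x. Each added core contributes less than the one before it, but in this model the gain never drops to zero, and a workload with a larger parallel share pushes the curve higher.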
The end of the server box
Ever since the onset of the multicore era in 2005, and especially in just the last four years, there have been seismic shifts in software:
* Virtualization has changed the meaning of “process” in a system, and the operating system that runs “user applications” (Linux, Windows Server, Unix) no longer runs as close to the processor as in the previous decade. With the operating system now a “higher-level process,” hypervisors occupy the lower-level space. This changes the game for multicore, since hypervisors are more homogeneous in nature and easier to predict, especially when a company like Intel gives hypervisor makers like VMware and Citrix special processor resources (namely, Intel VT) which they can utilize directly, like an exclusive connection.
* Cloud platforms have radically altered what it means to be a server. Today, a server is a contributor of real estate and resources to a broader processing system. Compute power, storage capacity, memory, and network bandwidth are all becoming singular variables in a very broad formula. No longer is the processor the central focus of all software development, but one wheel in a colossal machine.
* ‘Big data’ has leveraged the versatility of cloud platforms to pool memory from multiple systems into a single field of operation. In so doing, it has managed to join virtualization in replacing user-level operating systems with much more fundamental kernels like Hadoop – designed specifically to run a handful of specialized analytical tasks. Today, not only can these tasks be optimized to run on certain processors including Intel’s, but conceivably Intel processors can be optimized to run them. In a world where the number of items of software a processor handles is more limited, not only the possibilities but the incentives increase for tailoring processors to match the workload.
* Software-defined networking (SDN) enables logic to intelligently define how a network is configured, and uses CPU-based logic to manage the flow of data. It may be the most rapidly developing technology in the history of the back-office. SDN gives manufacturers like Intel the incentive to tailor their CPUs with functions that enable original equipment manufacturers to leverage them to serve as network masters.
Today, there are justifications for 16-core Xeon (and other manufacturers’) CPUs that we didn’t have in 2008 or even 2010. There are market forces at play here: With PC market growth stalled, and by some measures declining, CPU makers need to find other high-volume sales outlets to keep production costs low. The rapid expansion of cloud datacenters may provide one such outlet, but such facilities typically do not require the highest performance CPUs.
Intel’s current generation of processor architecture is called Haswell. Introduced in 2012, it was designed to move more of the control features that typically appear in a chipset – the separate parts that manage a motherboard, such as voltage control – onto the CPU die. According to the company’s public roadmap for the first half of the year, published last January, its Efficient Performance Server Platform is due for a refresh sometime during the latter half of this year.
So is it time for Intel to endow the Haswell generation of Xeon and Xeon Phi (the high-performance line originally made up of co-processors) with more workload-tailored functionality? This isn’t one of those questions I actually know the answer to already, but am treating as a secret just to make myself sound prescient. If datacenters are truly to replace desktops as the centers of x86/x64 processor architecture, then Intel has to consider the server CPU in the context of what it makes the datacenter become, in the same way it used to consider the PC CPU in the context of what it makes the office become.
I have these questions for Intel this week. All during Datacenter Dynamics’ coverage from Hillsboro, Oregon, I’ll be re-examining these questions to see how well and how deeply they’ve been answered:
Is Intel preparing to facilitate a market of purpose-built server racks – not just single servers or blade clusters, but mechanisms designed to separate compute from storage from memory from power from cooling?
Who is Intel’s most influential partner for Xeon technology: high-performance brands we’ve come to know like HP, Lenovo, and more often, Oracle? Communications companies like Cisco and Alcatel-Lucent? Or the ODMs that we’ve come to know for building smartphones, and who may be building more cloud datacenter servers, such as Quanta Computer?
What new form factors could such purpose-built servers take, that could radically reform the architecture of datacenters themselves in just the next four years’ time (when Intel would need to begin rethinking the question all over again for its next generation)? This and other datacenter publications spend quite a few pages tracking the many corporations that build new datacenters, and list how many square feet they’ll consume. But these habits of ours will start looking antiquated if the design of server racks evolves into something more than the product of a multiplication equation.
What will be the definition of “server” by 2018, and will the fundamental characteristics of datacenter design be impacted by any change to it? Intel is faced with market pressures to compete now against ARM, which is moving a 64-bit version of the technology it created for small and embedded devices into the datacenter. While we used to reserve the term “bare metal server” for a basic, generic component, it’s becoming something of a kit that can, like an ARM-based embedded device, be built to order. When Intel decides to compete in a market space, that decision changes the space.
If we are at the crossroads of a revolution in data center design, then we had better start fathoming the depths of this change today, and isolate and identify the causes. If not, then we could do without all the drumrolls and fanfare. Our job this week will be to gauge whether the changes ahead of us are monumental or merely incremental. Stay tuned.