When it comes to high asset utilization and agility in the data center, several enablers stand out: composability, scalability, and openness. Composability means you can precisely match hardware to a particular type of workload, down to the specific hardware components (e.g., CPUs, memory, storage, FPGAs, NVMe modules, co-processors, and network connectivity). High scalability means you can use as many components as you need — even if dispersed across physical racks — to assemble, on the fly, the compute power needed to run any size of workload. Openness means you can select and integrate components based solely on those best suited to your workload, without artificial compatibility issues.
Composable disaggregated infrastructure (CDI) is an architectural approach designed to provide granular hardware composability, high scalability, and open management APIs. CDI works with virtualization and software-defined infrastructure (SDI) to improve data center efficiency and flexibility by overcoming the limitations imposed by servers with fixed ratios of compute, memory, storage, accelerator, and networking resources.
CDI is a critical piece of the puzzle needed to meet hyperscale data center requirements. So what is it about today’s data center challenges that drives the need for CDI?
Challenges for data centers
Some of the trends that are creating a need for more dynamic hardware configurability include:
Hyper growth and hyperscale — new computing paradigms, like cloud and edge computing, are driving service providers to scale their data centers so fast that traditional deployment and management methods can’t keep up.
Hyper density — demand for more compute and storage capacity means data center operators are working harder to get more computing done per dollar spent on equipment, cooling, floor space, and electricity.
New workloads — when it comes to new types of workloads — Big Data, Internet of Things (IoT), artificial intelligence, and others — data centers are stretched by the sheer scale of these workloads. In addition, these applications often exhibit large variations in demand and rapid growth over time.
DevOps and microservices — in the past, most applications were static and monolithic, running on one machine. Contrast that with today’s applications, which are composed of interconnected software components that are physically dispersed, continuously upgraded, and optimized to be dynamically scaled. Hardware must be able to scale as well.
New hardware technologies — along with new types of applications has come a wide assortment of supporting hardware — different types of processors, storage, and interconnects — making any fixed “one-size-fits-all” hardware inefficient and inflexible.
Today’s data centers are increasingly called upon to run much larger, more complex workloads that are often very different from one another — so the hardware requirements to run them may vary widely from workload to workload, and may also change over the course of a day or even an hour. For example, some workloads might need more, and some less, processing or memory capacity. Still others might require NVMe storage or special-purpose processors. Furthermore, to lower total cost of ownership (TCO), it might also be desirable to leverage higher-end devices across multiple workloads at different times.
How to tell if your data center is under stress
What are the tangible impacts of these new challenges? How do you know if your data center is being affected by them? Some of the practical indicators of stress on the data center include:
- Data center management is still complex and requires a large technical staff.
- Even virtualized environments rarely exceed 50 percent average utilization, and non-virtualized data centers run at around 20-30 percent.
- Provisioning hardware for new applications still takes days or weeks and requires multiple specialists.
- The Intelligent Platform Management Interface (IPMI) is 20 years old and has inherent limitations due to its protocol and bit-level encoding technique. Data centers need a management standard that is more scalable, secure, and internet-friendly.
- Interoperability across equipment and management software from different vendors is often problematic, limiting functionality and programmability.
- CPU upgrades usually require replacement of an entire server chassis and all the resources in the server, retiring storage, power supplies, fans, and network adapters sooner than necessary.
- Application developers are slowed by the current requisition, deployment, validation, and provisioning processes.
- Responses to unforeseen changes in application capacity requirements are too slow and labor intensive.
All these challenges have a common source: the inability of data center operators to easily assign, with high granularity and scale, those particular hardware elements that best fit specific workloads (whether individually or as a group) as hardware requirements keep changing.
The limits of virtualization and SDI
Virtual machines (VMs) allow multiple applications to run on a server, helping to better utilize the server’s hardware, enabling rapid provisioning and load balancing, and increasing management automation. Containerization also provides many of these benefits as it enables applications to be packaged with all their dependencies and deployed dynamically to servers in response to workload variations — further increasing hardware utilization and flexibility.
SDI extends the idea of hardware abstraction to encompass other infrastructure elements in addition to compute servers, including file servers, storage clusters, and network switches — so the whole data center infrastructure becomes like software — i.e., programmable like the operating environments and applications that run on top of it. What’s still missing is the ability to configure the elements within servers (i.e., to assemble specific hardware resources on demand) from anywhere in the data center. Composable disaggregated infrastructure (CDI) provides those missing pieces.
CDI virtues
In a CDI-enabled data center the individual compute modules, non-volatile memory, accelerators, storage, etc. within each server are disaggregated into pools of shared resources — so they can be managed individually and collectively under software control. The disaggregated components can then be reconstituted, or composed, into workload-optimized servers, irrespective of which racks the components happen to physically reside in (a sketch of what such a composition request can look like follows the list below). Studies show CDI can achieve TCO gains of up to 63 percent (55 percent capex, 75 percent opex) and technology refresh savings of 44 percent capex and 77 percent labor.
These savings result from:
- Faster, easier scale out resulting from disaggregation, common management APIs and vendor interoperability,
- Greater agility in application development, provisioning, and lifecycle management,
- Higher efficiency due to better resource utilization, reduced overprovisioning and dynamic workload tuning,
- Independent upgrade cycles (i.e., only the targeted resource need be replaced, not a whole server) for more capacity per dollar spent,
- Optimized performance through custom configurations including fast non-volatile memory (NVM) and accelerators, and
- More automated infrastructure management and more efficient use of staff.
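To make the composition step concrete, the sketch below shows what a request to a composition service might look like. It is a minimal illustration modeled on the composed-node API described in the Intel RSD software documents; the endpoint address, credentials, and exact field names here are assumptions for illustration, so treat the Intel RSD API specifications as the authoritative reference.

```python
import requests

# Placeholder pod manager endpoint and credentials (assumptions for illustration).
PODM = "https://podm.example.com:8443"
AUTH = ("admin", "admin")

# Describe the desired server. The composition service selects matching
# components from the disaggregated pools, wherever they physically reside.
node_spec = {
    "Name": "analytics-node-01",
    "Processors": [{"TotalCores": 16}],
    "Memory": [{"CapacityMiB": 131072}],          # 128 GiB
    "LocalDrives": [{"CapacityGiB": 960, "Interface": "NVMe"}],
}

# Allocate a composed node from the pools (RSD-style Redfish action).
resp = requests.post(f"{PODM}/redfish/v1/Nodes/Actions/Allocate",
                     json=node_spec, auth=AUTH, verify=False)
resp.raise_for_status()
node_uri = resp.headers["Location"]  # URI of the newly composed node (assumed response header)

# Assemble the node so it can be powered on and provisioned like any server.
requests.post(f"{PODM}{node_uri}/Actions/ComposedNode.Assemble",
              auth=AUTH, verify=False)
print(f"Composed node ready at {node_uri}")
```

When the workload is retired, the same API style can release the node, returning its components to the shared pools for the next composition.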
Facebook, Google, and other tier one Cloud Service Providers (CSPs) are actively investigating disaggregated architectures for their data centers. Some of their implementations are custom, and most use proprietary software and APIs. To match the gains of the largest CSPs, organizations that don’t have the scale of a Google or Facebook need commercial off-the-shelf CDI solutions that anyone can adopt. Common, open technology standards will help the industry achieve scale and make CDI generally available from a choice of suppliers.
An open CDI blueprint
That’s the goal of Intel Rack Scale Design (Intel RSD), a blueprint for unleashing industry innovation around a common CDI-based data center architecture. Intel RSD is an implementation specification enabling interoperability across hardware and software vendors.
Intel RSD defines key aspects of a logical architecture to implement CDI. The first is a design specification that defines the hardware and software capabilities needed at the module, rack, and data center levels to enable granular composability and scalable software control of the infrastructure. The second is a common set of open APIs that expose those capabilities to higher-level orchestration software from multiple open source or commercial suppliers.
These APIs are defined within Redfish, an open, scalable, and secure standard that replaces IPMI with a modern, open management framework based on web-friendly principles (RESTful APIs, JSON data model). Redfish is the product of the Distributed Management Task Force (DMTF) Scalable Platforms Management Forum — an industry initiative launched in September 2014 by Broadcom, Dell, Ericsson, Hewlett-Packard, Intel, Lenovo, Microsoft, Supermicro, and VMware. Intel RSD extensions are regularly submitted to the Redfish Scalable Platforms Management Forum as proposals for inclusion in the official Redfish standard.
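For a flavor of how web-friendly Redfish is in practice, the short sketch below lists the systems a Redfish service manages using nothing but HTTPS and JSON. The hostname and credentials are placeholders, but the /redfish/v1/Systems collection and the properties printed are defined by the published Redfish schema.

```python
import requests

BMC = "https://bmc.example.com"  # placeholder Redfish service endpoint
AUTH = ("admin", "password")     # placeholder credentials

# Every Redfish service exposes a versioned service root at /redfish/v1/.
# Resources are ordinary HTTPS + JSON, so any web client can consume them --
# no bit-level binary protocol as with IPMI.
systems = requests.get(f"{BMC}/redfish/v1/Systems",
                       auth=AUTH, verify=False).json()

for member in systems["Members"]:
    # Each member is a link ("@odata.id") to a ComputerSystem resource.
    system = requests.get(f"{BMC}{member['@odata.id']}",
                          auth=AUTH, verify=False).json()
    print(system["Name"], system["PowerState"], system["Status"]["Health"])
```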
Intel® Rack Scale Design specification documents are available at intel.com/intelrsd:
The Intel Rack Scale Design Platform Hardware Guide covers general specifications and design guidelines for platform hardware and components.
The Intel Rack Scale Design Software documents cover the functionality and APIs for the various components of Intel RSD software.
Commercial products based on the Intel RSD specification are available today from many suppliers, including Dell EMC, Ericsson, HPE, Huawei, Inspur, Quanta, Supermicro, Wiwynn, and others. All share a common goal: to finally unleash the full power of the data center at the most granular level possible and to scale that power across all workloads using a common set of open management APIs. Commercial CDI products will help our industry meet the challenges that today’s data center operators face every day.
Steve Gillaspy is senior director of Rack Scale Design Product Management at Intel