New accelerators for servers, workstations, and edge devices are now on the market, spanning a wide range of application requirements for AI and HPC, and both the choices and the performance continue to increase. As accelerator vendors release new products, the latest offerings, including tightly integrated CPU-and-GPU designs, are changing the landscape.

AI, HPC, and several other applications share some common requirements, yet the choice of accelerator type and system architecture can significantly affect a server's performance and price.

When installed in today's servers, accelerators can be categorized into three distinct groups, plus a hybrid of the first two. The typical options, available from multiple system vendors and multiple accelerator vendors, are:

  • Accelerators integrated together: Each accelerator has a high-speed connection to the other accelerators
  • Accelerators utilizing PCIe interconnects: CPUs communicate with the accelerators over PCIe, and the accelerators have no direct connection to one another
  • Accelerators utilizing PCIe interconnects with limited direct links (a hybrid): Each accelerator may communicate with one other accelerator over a direct connection, but the primary communication path is PCIe
  • Accelerators integrated with a CPU: A CPU is tightly coupled to an accelerator over a high-speed connection. Only one CPU and one accelerator can share this tight connection, although such systems can still accommodate one or more PCIe accelerators, depending on the design

Accelerator Architectures for Applications

Large- and medium-scale AI training is based on digesting tremendous amounts of data and progressively converging on a model. Each accelerator must communicate with the other accelerators at high speed to analyze large amounts of unstructured data.
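
As an illustration, the sketch below uses PyTorch's torch.distributed package with the NCCL backend, which performs the gradient all-reduce directly between accelerators when high-speed GPU-to-GPU links are available. The model, data, and hyperparameters are placeholders, not a reference implementation.

```python
# Minimal sketch: data-parallel training in which gradient synchronization
# happens between the accelerators themselves (NCCL all-reduce), not via
# the CPU. Model, data, and hyperparameters are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE, one process per GPU.
    dist.init_process_group(backend="nccl")  # NCCL uses GPU-to-GPU links when present
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(1024, 1024).cuda(local_rank),
                device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):                     # placeholder training loop
        x = torch.randn(64, 1024, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                      # gradients all-reduced across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> train.py
```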

AI Inferencing is the decision part of AI. Once a model has been trained, a new data set can be sent to the system, and an answer or recommendation will be made. AI inferencing is a relatively fast computation, and a server with many independent accelerators works best.
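
A minimal sketch of that pattern, assuming PyTorch and a placeholder model: each request runs to completion on a single GPU, with no accelerator-to-accelerator traffic.

```python
# Minimal sketch: independent inference requests served by separate GPUs.
# The model and inputs are illustrative placeholders; assumes >= 1 CUDA GPU.
import torch

# One copy of the (placeholder) model on each available accelerator.
models = {
    gpu: torch.nn.Linear(1024, 10).eval().cuda(gpu)
    for gpu in range(torch.cuda.device_count())
}

@torch.no_grad()
def infer(request: torch.Tensor, gpu: int) -> torch.Tensor:
    # Each request runs to completion on one independent accelerator.
    return models[gpu](request.cuda(gpu)).cpu()

# Round-robin incoming requests across the GPUs; none depends on another.
for i, req in enumerate(torch.randn(8, 1024)):
    print(infer(req.unsqueeze(0), i % len(models)))
```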

HPC applications can be accelerated by sending compute-intensive tasks to the accelerators, which perform the calculations for certain parts of the application. Parts of the application that map onto many parallel computing elements show significant speedups when using accelerators. In many HPC applications, the data set worked on by one accelerator is relatively independent of the work done by a different accelerator.
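
The sketch below illustrates that offload pattern with PyTorch; the kernel and data sizes are placeholders. Each accelerator operates on its own slice of the data, with no cross-GPU traffic.

```python
# Minimal sketch: offloading a compute-intensive kernel to accelerators,
# each working on an independent slice of the data. Sizes are placeholders.
import torch

def heavy_kernel(block: torch.Tensor) -> torch.Tensor:
    # Stand-in for a parallel-friendly computation (e.g., dense linear algebra).
    return block @ block.T

blocks = torch.randn(torch.cuda.device_count(), 2048, 2048)
results = []
for gpu, block in enumerate(blocks):
    # Each accelerator computes on its own data; results return over PCIe.
    results.append(heavy_kernel(block.cuda(gpu)).cpu())
print(sum(r.sum() for r in results))
```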

Virtual desktop infrastructure (VDI) allows a central server to control the desktop on client devices. An accelerator speeds up the computation and delivery of the pixels to the client devices. The PCIe model excels at this workload since each client device is typically independent of the others.

Visualization refers to rendering to the screen. Accelerators are excellent for speeding up the rendering process, as many rendering computations can be done in parallel. Visualization applications can utilize several accelerators but do not need the accelerators to communicate with each other.

Content Delivery requires acceleration to compress videos that are sent to client devices in many formats, each with a specific SLA for the client device. Accelerators can handle multiple streams simultaneously and operate independently, so a PCIe solution can be used.
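
As one hedged illustration, the sketch below launches independent ffmpeg transcodes using the NVENC hardware encoder (h264_nvenc). File names and settings are placeholders, and an ffmpeg build with NVENC support is assumed; the -gpu option spreads the independent streams across encoders.

```python
# Minimal sketch: independent hardware-accelerated transcodes, one process
# per stream. File names and encoder settings are illustrative placeholders;
# assumes an ffmpeg build with NVENC (h264_nvenc) support.
import subprocess

streams = ["in0.mp4", "in1.mp4"]  # placeholder client streams
procs = [
    subprocess.Popen([
        "ffmpeg", "-y",
        "-hwaccel", "cuda",        # decode on the GPU
        "-i", src,
        "-c:v", "h264_nvenc",      # encode on the GPU's NVENC engine
        "-gpu", str(i),            # pin each independent stream to its own GPU
        f"out{i}.mp4",
    ])
    for i, src in enumerate(streams)
]
for p in procs:
    p.wait()                       # streams complete independently
```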

1. Accelerators integrated together

The highest-performing architecture for AI is a system that allows the accelerators to communicate with each other without having to go back through the CPU. This type of system requires that the accelerators be mounted on their own baseboard with a high-speed switch on the baseboard itself. The initial communication that launches the application on the accelerators travels over a PCIe path, and when the work is completed, the results are sent back to the CPU over PCIe. CPU-to-accelerator communication should be limited, allowing the accelerators to communicate with each other over the high-speed paths.

A request from one accelerator is routed either directly or through one of the (typically four) non-blocking switches on the baseboard to the appropriate GPU. GPU-to-GPU performance is significantly higher than over the PCIe path, which allows an application to use more than one GPU without needing to interact with the CPU over the relatively slow PCIe lanes.
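
One way to check whether such a direct path exists on a given system is a peer-access query; the sketch below uses PyTorch and assumes at least two CUDA GPUs.

```python
# Minimal sketch: check for direct GPU-to-GPU (peer) access and perform a
# device-to-device copy that takes the direct path when peer access exists.
import torch

if torch.cuda.device_count() >= 2:
    # True when GPU 0 can address GPU 1's memory directly (high-speed
    # link or PCIe peer-to-peer), without staging through the CPU.
    print("peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

    src = torch.randn(4096, 4096, device="cuda:0")
    dst = src.to("cuda:1")  # device-to-device copy; direct when peers
    torch.cuda.synchronize()
    print(dst.device, dst.shape)
```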

Major advantages:

  • Very large models that require a lot of memory to store the parameters can be trained. Each accelerator can quickly acquire and operate on data that may reside on a different accelerator.
  • Each accelerator can communicate with other accelerators directly.
  • Systems that scale out beyond the scale-up within a server are being built: the accelerators in one server will be able to communicate and share data with the accelerators residing in another server.

2. Accelerators using the PCIe interconnect

A common and well-defined way for CPUs and accelerators to communicate is over PCIe lanes. This architecture allows for various server configurations and accelerator counts: small servers can accommodate up to four accelerators, while larger servers can house up to 10. Different server architectures can use Direct, Single Root, or Dual Root configurations.
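
On a running Linux system, the way the accelerators attach to the root complex(es) can be inspected directly; the sketch below simply shells out to nvidia-smi topo -m, which assumes NVIDIA GPUs with the nvidia-smi utility installed.

```python
# Minimal sketch: print the GPU/PCIe topology matrix, which shows whether
# GPUs sit under one root complex (single root) or two (dual root).
# Assumes NVIDIA GPUs and the nvidia-smi utility on the PATH.
import subprocess

topo = subprocess.run(["nvidia-smi", "topo", "-m"],
                      capture_output=True, text=True)
print(topo.stdout)
```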

A hybrid approach that combines the benefits of the PCIe architecture with those of integrated accelerators is also possible, where two PCIe boards are tightly linked.

Major advantages:

  • Expandable within a server from one to 10 GPUs
  • Independent tasks can be assigned to each accelerator
  • Performance can increase as additional accelerators are added to a base configuration
  • Direct, Single Root, and Dual Root configurations for different workloads

3. Integrated GPUs and CPUs

A new architecture that integrates a CPU and an accelerator is now available and will enable new applications that rely on high-performance connections between a CPU and a GPU. The bandwidth between the CPU and GPU approaches 1 TB per second, which enables a close coupling of the tasks each is designed for. In addition, a shared memory space between the CPU and GPU can enable a new class of applications that no longer wait on the relatively slow PCIe communication path.
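
As an analogy for this shared memory space, the sketch below uses CuPy with CUDA managed (unified) memory so that the same allocation is addressable from both host and device code; it illustrates the programming model, not any vendor-specific API.

```python
# Minimal sketch: a single allocation visible to both CPU and GPU via
# CUDA managed (unified) memory in CuPy. Illustrates the shared-memory
# programming model only; sizes and operations are placeholders.
import cupy as cp

# Route CuPy allocations through cudaMallocManaged (unified memory).
cp.cuda.set_allocator(cp.cuda.malloc_managed)

x = cp.arange(1_000_000, dtype=cp.float32)  # lives in managed memory
x *= 2.0                                    # GPU kernel updates it in place
print(float(x[:10].sum()))                  # host reads the result, no explicit copy
```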

Major advantages:

  • The CPU and accelerator portions of the application can share memory, increasing performance
  • Simplified programming model for memory spaces
  • Very fast communication between the CPU and GPU

Learn more about Supermicro GPU servers here.