Microsoft has donated a broad set of technology to the Open Compute Project's open source data center stack at this year's entirely virtual OCP Summit.
The technology, rounded up in a blog post from Azure hardware leader Kushagra Vaid, includes output from the open source Cerberus security project announced back in 2017, standards for AI accelerators, and contributions to liquid cooling. Meanwhile, Vaid has taken a seat on the OCP board, a move announced at the OCP Summit, which was moved online because of the Covid-19 pandemic.
Release the guard dog
"We believe data privacy and security are fundamental to building and maintaining trust in the cloud," says Vaid. Microsoft's Project Cerberus, revealed in 2017 (at a DCD event), aims to create hardware-based security, something Google also proposed, but to do it with an open source approach that enables others to join and share. The project now has partners including Intel, AMD, Broadcom, Nuvoton, Mellanox, and NXP, says Vaid, and the architecture and firmware are now being open sourced, with Cerberus-based products coming this year.
OCP's efforts to create a modular building block architecture (MBA) are producing infrastructure and modules in multiple directions, says Vaid: "MBA clearly defines interfaces and physical boundaries for independent development and contributions through three stages: 1) base specification for comprehensive architectural definition, 2) design specification, including design implementation and collateral, and 3) product contribution."
For instance, Microsoft, Facebook and Baidu contributed a specification for an OCP Accelerator Infrastructure, which would allow a standard interface for add-in accelerators for jobs like AI and HPC. Through a joint development agreement, this has led to OCP accelerator modules (OAMs) from accelerator firms such as Intel/Habana, AMD, Nvidia and Xilinx.
Microsoft is also involved in a data center secure control module (DC-SCM) project, with its own interface, DC-SCI: "As the 'heart of the motherboard,' DC-SCM includes all essential elements of a server motherboard excluding the CPU, memory slots, and IO slots. It includes BMC, RoT, system, and BMC Flash, as well as other ancillary components required to deliver a data center-compatible, secure, control module," explains Vaid.
The approach is a “win-win” opportunity for server suppliers and data center builders, because it divides their roles and responsibilities clearly, he says, so OCP partners can easily build servers around processors from AMD, Intel, and ARM64 suppliers, while choosing their own solutions for the other components.
Vaid also predicts liquid cooling - and immersion in particular - will be coming to the fore, and could allow a novel continuation of Moore's law - the regular doubling of processing power: "While many are arguing that we are reaching the limits of Moore’s law, we believe that Moore’s law can be applied to halving the cost every two years for the whole data center campus—not just the chip," he says. "We see liquid cooling and—in particular, immersion cooling—are enabling some new architectures that we have not even begun to consider."
Liquid cooling is ripe for OCP-based standardization, he says, because the liquid cooling systems used in supercomputers and bitcoin mines are not standard, not open and not serviceable: "Microsoft is collaborating with the OCP ecosystem, particularly Facebook and CoolIT, to establish standards for developing blind-mate Cold-Plate solutions for both Project Olympus systems and Open Rack v3."