The Azure cloud uses Linux to handle data center networking, in a specialized software component Microsoft developed to run connectivity within its cloud operating system, along lines set out by the Open Compute Project.
Given Microsoft’s historic reluctance to support open source, and its efforts to promote Windows as the world’s default operating system, there was widespread surprise when a Microsoft Azure blog last week revealed that Microsoft had built its own Azure Cloud Switch (ACS) software for data center networking, and had based it on Linux.
Managing multi-vendor switches
ACS is designed to provide a “cloud-wide network management platform” that can manage switches from multiple vendors, despite the radically different software each one runs, according to Kamala Subramaniam, principal architect for Azure networking.
“Both the cloud and the enterprise depend on high-speed, highly available networks to power their services,” said Subramaniam. “This makes it critical for network operators to be able to control their own destiny by rapidly adding to their network features they need while keeping out feature changes that increase risk and complexity.”
Network switch hardware keeps improving, getting cheaper and faster thanks to competition, she said, but the different software on each switch is a barrier to integrating best-of-breed hardware and to keeping features available when a new switch enters the data center: “Ideally, we would like all the benefits of the features we have implemented and the bugs we have fixed to stay with us, even as we ride the tide of newer switch hardware innovation.”
ACS is a cross-platform, modular operating system for data center networking built on Linux. It lets Microsoft quickly debug, fix, and test software on various switches, and gives the company the flexibility to develop the features it needs while sharing the same software stack across hardware from multiple switch vendors.
The system uses a standard API called the Switch Abstraction Interface (SAI), which emerged from the Open Compute Project (OCP) and provides a uniform way to program network switching ASICs. Microsoft helped found the SAI effort and has been a leading contributor to it. “We view SAI as an instrumental piece to make the ACS a success,” said Subramaniam.
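To illustrate the idea behind SAI, the sketch below shows application code written once against a common interface and run unchanged on two different "vendor" backends. It is a hypothetical illustration only: the real SAI is a C interface defined in OCP's headers, and the class and method names here (`SwitchAsicBackend`, `create_route`, `VendorABackend`, `VendorBBackend`) are invented for this example and are not SAI functions.

```python
from abc import ABC, abstractmethod


class SwitchAsicBackend(ABC):
    """Hypothetical vendor-neutral interface, loosely in the spirit of SAI."""

    @abstractmethod
    def create_route(self, prefix: str, next_hop: str) -> None: ...

    @abstractmethod
    def routes(self) -> dict:
        """Return the routes currently programmed, as {prefix: next_hop}."""


class VendorABackend(SwitchAsicBackend):
    # Stand-in for one vendor's SDK: stores routes in a dict.
    def __init__(self):
        self._table = {}

    def create_route(self, prefix, next_hop):
        self._table[prefix] = next_hop

    def routes(self):
        return dict(self._table)


class VendorBBackend(SwitchAsicBackend):
    # Stand-in for a different vendor's SDK: stores routes as a list of pairs.
    def __init__(self):
        self._entries = []

    def create_route(self, prefix, next_hop):
        self._entries.append((prefix, next_hop))

    def routes(self):
        return dict(self._entries)


def install_routes(asic: SwitchAsicBackend, routes: dict) -> None:
    """Application logic: depends only on the common interface."""
    for prefix, next_hop in routes.items():
        asic.create_route(prefix, next_hop)


routes = {"10.0.0.0/24": "192.168.1.1", "10.0.1.0/24": "192.168.1.2"}
for backend in (VendorABackend(), VendorBBackend()):
    install_routes(backend, routes)
    print(type(backend).__name__, backend.routes() == routes)
```

The point of the design is that features and bug fixes in `install_routes` carry over to any new hardware, provided a backend for it implements the shared interface.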
A bit deviant
ACS “deviates in many aspects from conventional switch software”, she explained. It uses a “lean” stack tuned to a single customer, Microsoft, and a single purpose, data center networks. It is also modular, unlike the monolithic stacks in switch vendors’ software. These two factors make the software easier to validate and help avoid unexpected bugs, so features can be implemented more quickly.
ACS also includes a monitoring and diagnostics system from Microsoft for easier configuration and management, and allows the switches to be managed like servers, with weekly software rollouts and rollbacks.
“ACS believes in the power of Open Networking,” said Subramaniam. “ACS, together with the open, standardized SAI interface, allows us to exploit new hardware faster and enables us to ride the tide of ASIC innovation while simultaneously being able to operate on multiple platforms.”
The Linux base is fundamental to this, she said, because its “vibrant ecosystem” allows the system to use and extend applications from the open source community such as Quagga, Microsoft applications such as Autopilot and Swan, and code from third parties.
The system includes a modular database layer within the Switch State Service (SSS), which helps different applications share objects.
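The article does not describe how SSS works internally, so the following is only a minimal sketch of the general pattern of a shared state store: one application writes an object into a named table, and other applications subscribed to that table are notified. The class `SwitchStateStore`, the `"ROUTE"` table, and the publisher/subscriber roles are all hypothetical and are not taken from the actual SSS design.

```python
from collections import defaultdict
from typing import Any, Callable


class SwitchStateStore:
    """Hypothetical in-memory shared-state store (not the real SSS)."""

    def __init__(self):
        self._tables = defaultdict(dict)        # table name -> {key: value}
        self._subscribers = defaultdict(list)   # table name -> callbacks

    def subscribe(self, table: str, callback: Callable[[str, Any], None]) -> None:
        """Register a callback invoked on every write to `table`."""
        self._subscribers[table].append(callback)

    def set(self, table: str, key: str, value: Any) -> None:
        """Store an object and notify every subscriber of that table."""
        self._tables[table][key] = value
        for callback in self._subscribers[table]:
            callback(key, value)

    def get(self, table: str, key: str) -> Any:
        """Read an object back; returns None if it is absent."""
        return self._tables[table].get(key)


store = SwitchStateStore()
programmed = {}

# Hypothetical consumer: a module that would push routes down to the ASIC.
store.subscribe("ROUTE", lambda key, value: programmed.__setitem__(key, value))

# Hypothetical producer: a routing application publishing a learned route.
store.set("ROUTE", "10.0.0.0/24", "192.168.1.1")
print(programmed)  # {'10.0.0.0/24': '192.168.1.1'}
```

Decoupling producers and consumers through a shared store like this is one way a modular stack can let independently developed applications cooperate without calling each other directly.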
The whole thing was demonstrated at the SigComm conference in August 2015, with support from four ASIC vendors: Mellanox, Broadcom, Cavium, and the Barefoot software switch. It also worked with six implementations of SAI, from Broadcom, Dell, Mellanox, Cavium, Barefoot, and Metaswitch, and three application stacks (Microsoft, Dell, and Metaswitch).
In the demonstration, one software application spoke to the various ASICs, while ACS also interworked with Dell’s and Metaswitch’s own application stacks, using the Clos topology that Microsoft uses in its data centers. The demonstration covered layer 3 routing functions and complex quality of service (QoS) functions.
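For readers unfamiliar with the term, a common data center form of a Clos network is a two-tier "leaf-spine" fabric in which every leaf switch links to every spine switch. The sketch below assumes that simple two-tier layout (Microsoft's actual topology is not detailed in the article) and counts the equal-cost shortest paths between two leaves with a breadth-first search; with S spines, two different leaves are two hops apart over S parallel paths. The function names are hypothetical.

```python
from collections import deque
from itertools import product


def leaf_spine(num_leaves: int, num_spines: int) -> dict:
    """Build an adjacency map for a two-tier folded Clos (leaf-spine) fabric."""
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    spines = [f"spine{j}" for j in range(num_spines)]
    links = {node: set() for node in leaves + spines}
    # Every leaf connects to every spine; leaves never link to each other.
    for leaf, spine in product(leaves, spines):
        links[leaf].add(spine)
        links[spine].add(leaf)
    return links


def shortest_paths(links: dict, src: str, dst: str) -> tuple:
    """Return (hop count, number of distinct shortest paths) via BFS."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in links[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v through u.
                count[v] += count[u]
    return dist[dst], count[dst]


fabric = leaf_spine(num_leaves=8, num_spines=4)
print(shortest_paths(fabric, "leaf0", "leaf5"))  # (2, 4)
```

The many equal-cost paths are what let a Clos fabric spread traffic across spines and keep working when an individual spine switch fails.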