Microsoft has proposed a new interface for solid state storage, which should allow easier technology upgrades and faster integration into cloud data centers.
Solid state drives (SSDs) are normally packaged as monolithic devices that can be dropped in as direct replacements for traditional rotating hard drives. An alternative interface, Open-Channel, exposes the internals of the SSD to the host operating system, but adopting Open-Channel means taking on proprietary interfaces specific to the device in question.
In Project Denali, Microsoft is proposing that the interface could be split up, so that there is one standardized part dealing with plain storage, and another dealing with device-specific issues such as bad-block management. The scheme would allow data centers to optimize their storage for the specific applications they run and the hardware they use, and to integrate new SSDs rapidly.
The Project Denali prototype was developed with storage silicon specialist CNEX Labs, and will be offered through the Open Compute Project (OCP) as a standard when it is complete. It was demonstrated at the OCP Summit in San Jose this week by Azure hardware infrastructure manager Kushagra Vaid and senior software engineer Laura Caulfield.
“Storage paradigms have performed well on-premises, but they haven’t resulted in innovation for increasing performance and cost efficiencies needed for cloud-based models,” Vaid said in a blog post.
“Fundamentally, Project Denali standardizes the SSD firmware interfaces by disaggregating the functionality for software defined data layout and media management. With Project Denali, customers can achieve greater levels of performance, while leveraging the cost-reduction economics that come at cloud scale.”
Project Denali defines the roles played by the SSD and the host in a standard interface, so issues specific to the device such as media management remain on the device, while the host gets on with the business of sending and receiving data, maintaining an address map and performing garbage collection.
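That division of labor can be sketched in code. The following is a hypothetical, heavily simplified illustration (not from the Denali specification): the device side exposes only raw page reads/writes and block erases, while the host side owns the logical-to-physical address map, performs all writes out of place, and garbage-collects stale blocks. All class names, sizes, and methods are invented for illustration.

```python
# Illustrative sketch of the host/device split described above.
# Sizes and names are made up; real NAND geometry is far larger.

PAGES_PER_BLOCK = 4
NUM_BLOCKS = 8

class RawFlash:
    """Device side: raw page I/O and block erase; media details stay here."""
    def __init__(self):
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(NUM_BLOCKS)]

    def write(self, b, p, data):
        assert self.blocks[b][p] is None  # NAND pages are write-once
        self.blocks[b][p] = data

    def read(self, b, p):
        return self.blocks[b][p]

    def erase(self, b):
        self.blocks[b] = [None] * PAGES_PER_BLOCK

class HostFTL:
    """Host side: logical-to-physical map and garbage collection."""
    def __init__(self, dev):
        self.dev = dev
        self.map = {}          # logical address -> (block, page)
        self.free = [(b, p) for b in range(NUM_BLOCKS)
                            for p in range(PAGES_PER_BLOCK)]

    def write(self, lba, data):
        b, p = self.free.pop(0)        # always write out of place
        self.dev.write(b, p, data)
        self.map[lba] = (b, p)         # any old copy becomes stale

    def read(self, lba):
        b, p = self.map[lba]
        return self.dev.read(b, p)

    def gc(self, victim):
        """Copy live pages out of a block, then erase it for reuse."""
        for lba, (b, p) in list(self.map.items()):
            if b == victim:
                data = self.dev.read(b, p)
                b2, p2 = next(fp for fp in self.free if fp[0] != victim)
                self.free.remove((b2, p2))
                self.dev.write(b2, p2, data)
                self.map[lba] = (b2, p2)
        self.dev.erase(victim)
        self.free = [fp for fp in self.free if fp[0] != victim]
        self.free += [(victim, p) for p in range(PAGES_PER_BLOCK)]
```

Because the map and garbage collector live on the host, the device never needs to know which logical data is live, which is what lets the drive firmware stay simple and generic.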
This means that SSD suppliers can build simpler products for data centers, and deliver them more quickly, while also minimizing the disruption further up the stack when they are introduced to the facility. The project is especially useful for installations where the hosts have FPGAs or microcontrollers.
Mismatch between servers and storage
Caulfield gave more details in another blog post: “The specification defines a new abstraction, which separates the roles of managing NAND, and managing data placement. The former will remain in the hardware – close to the NAND and in the product that reinvents itself with every new generation of NAND. The latter, once separated from the NAND management algorithms, will be allowed to follow its own schedule for innovation, and won’t be prone to bugs introduced by product cycles that track solely with NAND generations.”
According to Caulfield, the specification aims to allow workload-specific optimization, rapid introduction of new NAND generations and a broader set of applications on massively shared devices, with a choice of vendors.
One key benefit is ironing out the mismatch between the expectations of servers and storage systems, she said. Storage systems are still optimized for single workloads, while cloud implementations combine multiple virtualized workloads: “As the number of cores in a server increases, a single machine can support more VMs. When storage servers increase their capacity, they typically increase the number of tenants using each as a back-end. While there are some notable exceptions, there is still a need for cloud hardware to provide enough flexibility to efficiently serve these multi-tenant designs.” With caching, this divide gets greater, as the SSD controller collects writes to fill one flash page at a time, mixing data from multiple applications.
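The page-filling behavior Caulfield describes can be sketched as follows. This is a hypothetical toy model, not the actual controller logic: small writes from several tenants accumulate in a buffer, and a physical flash page is programmed only once the buffer is full, so one page ends up holding data from unrelated workloads.

```python
# Illustrative sketch of page-fill write caching: the controller buffers
# small writes and programs a whole flash page at once, mixing tenants.
# All names and sizes are invented for illustration.

PAGE_SLOTS = 4   # logical writes that fit in one flash page

class PageFillBuffer:
    def __init__(self):
        self.pending = []     # buffered (tenant, data) writes
        self.pages = []       # programmed pages, each a list of slots

    def write(self, tenant, data):
        self.pending.append((tenant, data))
        if len(self.pending) == PAGE_SLOTS:   # page is full: program it
            self.pages.append(self.pending)
            self.pending = []

buf = PageFillBuffer()
for tenant, data in [("vm-a", 1), ("vm-b", 2), ("vm-a", 3), ("vm-c", 4)]:
    buf.write(tenant, data)

tenants_on_page = {t for t, _ in buf.pages[0]}
# the single programmed page now mixes data from three tenants
```

When one of those tenants later overwrites or deletes its data, the whole mixed page must eventually be garbage-collected, which is exactly the multi-tenant inefficiency host-side data placement is meant to address.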
The first prototype, created with CNEX, takes things slowly, Caulfield said: “While the interface change opens up opportunities to optimize across many layers of the storage stack, we modified only two components: the firmware and the lowest level device driver in Azure’s OS. This allowed for a quick evaluation of the ideas, provides infrastructure for legacy applications and sets up the system for future optimizations.”
The results so far have been better than expected, she said: even though a lot of overhead has been moved from the drive to the host, throughput and latency are slightly better than with standard SSDs.
Other partners in the project, besides CNEX Labs, include Intel, LiteOn, Marvell, Broadcom, SK Hynix and Samsung.
“We look forward to finalizing the Denali specification in the months ahead through the Denali group and plan to make this specification available broadly later this year,” Caulfield said. “Refactoring the flash translation layer will promote innovation across the storage stack.”