Let’s be blunt about the claims traditional storage architectures make in regard to guaranteed performance. In reality, they simply do not provide performance-level guarantees that cloud service providers’ customers have come to expect. Yet, myths of guaranteeing performance persist in the storage market, despite the fact that new approaches to quality of service (QoS) are necessary.
The issue is not raw application performance; rather, it is the need to ensure that performance is both predictable and guaranteed, regardless of what other demands are being made on the storage array. Achieving this in hybrid or disk-based storage infrastructures is next to impossible.
Reactive responses to performance issues
Most of the popular storage QoS methods are based on tiering, rate limiting, prioritisation, hypervisor-based QoS and caching. These methods are inadequate to meet service providers’ needs because they are not part of a comprehensive ground-up QoS solution. Rather they are a reactive response.
- Tiering, for instance, simply cannot be used to create a viable storage QoS. It gives preferential treatment to demanding systems by classifying their data as hot and moving it to the solid-state drives (SSDs). Other applications have to make do with the lower-performance hard-disk drives. But performance varies considerably for individual workloads as the algorithms move data between media, causing the applications to inefficiently hop around. Relying on media with no concept of capacity allocation and load distribution does not scale and will consistently cause application performance problems.
- Rate limiting, on the other hand, sets fixed limits on the I/O or bandwidth that each application can consume. This reduces the problem of noisy neighbours but only by limiting the maximum performance for each application. It can also incur significant latency, further negatively impacting application performance. It does not deliver guaranteed QoS and it’s not possible to set IOPS minimums. The ability to guarantee an IOPS minimum is fundamental to writing any meaningful performance SLA with a customer.
- Prioritisation, as the name suggests, assigns certain applications higher priority/importance relative to other apps in the system. It can deliver a higher relative performance for some applications but there are no guarantees. What’s worse, noisy neighbours can get even louder if they are prioritised ahead of other applications.
- Hypervisor-based QoS takes the latency and response times of individual virtual machines (VMs) and uses them as a basis for setting thresholds beyond which the system limits the I/O rate for the respective machine. However, it has very limited governance of the underlying storage resources, which results in a lack of IOPS control, potential performance degradation, the risk of forced over-provisioning, and a lack of coordination and orchestration.
- Caching stores the hottest data in large DRAM or flash-based caches, which can offload a significant amount of I/O from the disks. However, any I/O that the cache cannot serve still depends on the throughput of the spinning-disk system, so latency is highly variable. Overall performance for an individual application is strongly influenced by how cache-friendly it is, how large the cache is, and how many other applications are sharing it. In a dynamic cloud environment, the last of these criteria is changing constantly.
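To make the rate-limiting point concrete, here is a minimal sketch of a token-bucket limiter, one common way to cap I/O. It is an illustration only: the class, its fields and the `admitted` helper are hypothetical names, not any vendor's implementation. Note that the bucket has a ceiling (`rate_iops`) and a burst size, but nothing in it can express a floor.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Caps one application's I/O rate. All names here are illustrative."""
    rate_iops: float      # tokens (I/Os) added per second: the ceiling
    burst: float          # bucket capacity: the largest burst allowed
    tokens: float = 0.0   # tokens currently available
    last: float = 0.0     # timestamp of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one I/O may be issued at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_iops)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller must queue or retry, which adds latency


def admitted(bucket: TokenBucket, offered_iops: int, seconds: int) -> int:
    """Count I/Os admitted when `offered_iops` arrive evenly each second."""
    count = 0
    for i in range(offered_iops * seconds):
        if bucket.allow(now=i / offered_iops):
            count += 1
    return count
```

An application offering 1,000 IOPS against a 100 IOPS bucket is throttled to roughly the cap, while one offering 50 IOPS passes untouched. In neither case does the limiter promise that the array underneath can actually deliver those I/Os, which is why a cap alone cannot underpin an SLA.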
All of these methods are used in a bid to overcome the limitations of traditional storage systems but actually have limitations themselves. As such, the only way that service providers and enterprises can truly ensure storage performance is to adopt a new approach in which QoS is an integral part of the system design, rather than a reactive response.
QoS as architecture
In short, QoS should not be a feature; it is a fundamental architectural decision. This approach guarantees performance in all situations, including failure scenarios, system overload, variable workloads and elastic demand. A true QoS architecture should have these six components: an all-SSD architecture, scale-out capability, RAID-less data protection, balanced load distribution, fine-grain QoS control and performance virtualisation.
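One way to picture fine-grain QoS control with guaranteed minimums is the rough sketch below. It is a simplified model under stated assumptions, not a description of any product: `VolumeQoS`, `allocate` and the single `capacity` figure for the whole system are hypothetical. Each volume first receives its demand up to its minimum, and only then is spare capacity shared out, never beyond each volume's maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeQoS:
    """Per-volume settings. Hypothetical names, for illustration only."""
    name: str
    min_iops: int   # guaranteed floor
    max_iops: int   # sustained ceiling
    demand: int     # IOPS the workload is currently asking for


def allocate(volumes: list[VolumeQoS], capacity: int) -> dict[str, int]:
    """Grant IOPS so every minimum is honoured before any spare is shared."""
    floors = sum(v.min_iops for v in volumes)
    if floors > capacity:
        # Over-committed minimums would make the guarantee meaningless.
        raise ValueError("sum of minimums exceeds system capacity")

    # Step 1: each volume gets what it asks for, up to its guaranteed minimum.
    grant = {v.name: min(v.demand, v.min_iops) for v in volumes}
    spare = capacity - sum(grant.values())

    # Step 2: share the spare capacity in proportion to unmet demand,
    # never lifting a volume above its maximum.
    want = {
        v.name: max(0, min(v.demand, v.max_iops) - grant[v.name])
        for v in volumes
    }
    total_want = sum(want.values())
    if total_want > 0:
        share = min(1.0, spare / total_want)
        for name in grant:
            grant[name] += int(want[name] * share)
    return grant
```

The design choice that matters is the order of the two steps: because minimums are reserved first and over-commitment is rejected up front, a noisy neighbour with enormous demand can only ever compete for the spare capacity.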
QoS as architecture may seem like a radical step given that it essentially subverts the traditional approaches outlined above. That said, it’s far more effective, comprehensive and successful, as well as being precisely what is required to address the pressing need faced by service providers and their customers.
A true QoS architecture delivers performance control and guarantees without compromise. It offers consistent I/O latency, predictable performance as systems scale and in any failure situation, the elimination of hot spots that create unpredictable I/O latency, and guaranteed, on-demand control of performance and capacity, independently of each other, for every volume. This modern approach to storage architecture, purpose-built with integrated QoS, overcomes these problems and enables quick provisioning, whilst greatly simplifying management.
It’s hard to argue against it when the benefits are so compelling. It makes the current reactive attempts to ensure certain performance levels look like a ‘patch and repair’ method, which indeed they are. All they do is perpetuate the myth that performance issues can be successfully addressed by traditional storage architectures when, in reality, they can’t.
To be truly successful, a modern approach is required in which QoS is a fundamental architectural design element, engineered in, not added on. This will turn it into an unstoppable force that drives the transition to a next-generation data centre, creating a more agile and scalable infrastructure, with increased application performance and predictability.
It’s as simple as that.
Martin Cooper is director of systems engineering at SolidFire, now part of NetApp.