As solid state drives (SSDs) are deployed in data centers in both hybrid HDD/SSD and all flash arrays (AFAs), it is increasingly important to understand which metrics are relevant for assessing SSD data center performance. The traditional metrics of IOPS, bandwidth, and response time are commonly reported, but it is just as important to report and understand the ‘Response Time Quality of Service’ behind those metrics.

Response Time Confidence Levels, together with an understanding of Demand Variation and Demand Intensity, can help the IT manager assess how a given SSD or array will perform relative to the requirements of an application workload or to a specific Response Time Ceiling, which in turn helps with overall system optimization, design, and deployment.

What are workloads?

Workloads are data streams generated by applications that are seen by the storage as a collection of access patterns. An individual access pattern is characterized by the spatial and temporal locality of the IO stream, random or sequential access, data transfer size, and read/write mix. The workload is further described by the binary data content (or data pattern) of the transfer and its demand intensity (or number of threads and queues).

The data pattern can be the result of a completely random transfer (i.e. random data pattern of 1s and 0s) or a workload that has some level of data reducibility (i.e. compressible or dedupable). The demand intensity is a result of the number of workers (or virtual machines) and jobs (requests) generated by the system and applications.

Workloads can be corner case benchmarks, synthetic application workloads, or real world IO captures. Corner case benchmarks are used in the lab to confirm performance outside the range of normal usage. Synthetic application workloads use one or several commonly accepted access pattern definitions for popular applications (such as a RND 8K 70:30 RW mix for a SQL OLTP workload). Real world IO captures are multiple IO stream workloads that are identified while running actual applications on deployed servers or systems.
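As a rough illustration, these access pattern attributes can be captured in a small data structure. The sketch below is illustrative only; the field names, worker count, and queue depth are assumptions chosen for the example, not values taken from any specification.

```python
from dataclasses import dataclass

@dataclass
class AccessPattern:
    """Illustrative description of a single synthetic workload stream."""
    name: str
    block_size_kib: int    # data transfer size
    read_pct: int          # read percentage of the read/write mix
    random: bool           # random vs. sequential access
    data_reduction: float  # 1.0 = fully random data; >1.0 = compressible/dedupable
    workers: int           # demand intensity: number of workers (threads)
    queue_depth: int       # demand intensity: outstanding IOs per worker

# The commonly cited SQL OLTP approximation: random 8 KiB transfers, 70:30 read/write mix.
sql_oltp = AccessPattern(
    name="SQL OLTP (synthetic)",
    block_size_kib=8,
    read_pct=70,
    random=True,
    data_reduction=1.0,  # assume non-reducible (fully random) data content
    workers=8,           # illustrative demand intensity, not a prescribed value
    queue_depth=8,
)
```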

Real World workloads are comprised of IO Streams that are generated in application (User) space and are modified as they pass through each layer in the software stack. Each layer of abstraction changes the IO streams by appending, coalescing or fragmenting the IO streams. Real World workloads seen at the storage level are quite different from application space or from workloads used in the lab for SSD optimization and design.

What are data center workloads?

Data center workloads, as seen by the storage, are the collection of data streams comprised of metadata and data content as managed through the various layers of the IO software stack. Caching layers, data reduction, data deduplication, storage control layers, storage pools, and backup architecture can all affect the data streams that the data center storage ultimately sees.

How are data center workloads tested?

Regardless of the original data stream content and how it may be modified by the various software layers, the workload that is ultimately presented to the storage is what matters for data center storage performance testing. Of course, characterizing the application space workload and how it is ultimately presented to the storage is key. Various tools and methodologies (such as IO trace and IO capture tools) are available to capture and replicate these workloads. Free cross platform IO capture tools with workload visualization can be downloaded at www.TestMyWorkload.com.
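To make the capture-and-replicate idea concrete, the sketch below shows what a minimal trace replay loop might look like. The CSV column names and the decision to skip writes are assumptions for this example; real capture formats and replay tools differ.

```python
import csv
import os
import time

def replay_capture(capture_csv: str, target_path: str) -> None:
    """Replay a captured IO stream against a target file or block device.

    Assumes a hypothetical CSV capture format with the columns:
    timestamp_s, op ('R' or 'W'), offset_bytes, length_bytes.
    Writes are skipped here to keep the sketch non-destructive.
    """
    fd = os.open(target_path, os.O_RDONLY)
    try:
        with open(capture_csv, newline="") as f:
            rows = list(csv.DictReader(f))
        start = time.monotonic()
        t0 = float(rows[0]["timestamp_s"])
        for row in rows:
            # Preserve the inter-arrival timing recorded in the capture.
            delay = (float(row["timestamp_s"]) - t0) - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
            if row["op"] == "R":
                os.pread(fd, int(row["length_bytes"]), int(row["offset_bytes"]))
            # 'W' rows would be issued here by a full replay tool.
    finally:
        os.close(fd)
```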

Whether the workload is a synthetic approximation of the application workload or a trace capture and playback, the test operator ultimately has to apply the selected test workload to the storage and measure and analyze its performance.

Measuring workloads for performance analysis

Once the test workload has been determined, it is important to test the storage in a deterministic fashion to ensure that the actual storage is tested and that the key metrics are relevant to and for the test purposes. Industry standard test methodologies have been developed to ensure fair and accurate testing of SSDs both at the device and system level.

Among the key points to remember are:

  • Use a reference test platform with known hardware and software
  • Precondition storage to a workload-dependent steady state (a simplified steady-state check is sketched after this list)
  • Set test parameter variables to match the intended application environment
  • Use a robust stimulus generator and measurement tool with known attributes
  • Report test results with disclosure of the test settings in a standardized format
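
The sketch below illustrates the preconditioning point: a workload-dependent steady-state check over a window of per-round results. The 20% data excursion and 10% slope excursion thresholds mirror the kind of limits used in industry device test specifications, but here they are illustrative assumptions, as is the five-round window.

```python
def is_steady_state(window: list[float],
                    max_data_excursion: float = 0.20,
                    max_slope_excursion: float = 0.10) -> bool:
    """Check whether a measurement window (e.g. IOPS per test round) has settled.

    Illustrative criterion: the min-to-max spread stays within 20% of the
    window average, and the excursion of the best linear fit across the
    window stays within 10% of the average. Treat the thresholds as
    assumptions for this sketch.
    """
    n = len(window)
    avg = sum(window) / n
    if (max(window) - min(window)) > max_data_excursion * avg:
        return False
    # Least-squares slope of the measured values versus round index.
    x_mean = (n - 1) / 2
    slope = sum((x - x_mean) * (y - avg) for x, y in enumerate(window)) / \
            sum((x - x_mean) ** 2 for x in range(n))
    return abs(slope * (n - 1)) <= max_slope_excursion * avg

# Example: IOPS results from the last five one-minute rounds.
print(is_steady_state([101_200, 99_800, 100_500, 100_900, 99_600]))  # True
```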

What is a response time histogram?

A response time histogram is a plot of the frequency and distribution of response times for every IO that occurs during the measurement period. Every IO completion time is measured and counted in a corresponding time bin, rather than recording only the average and maximum response times.
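A minimal sketch of the binning step is shown below; the microsecond units and 100 µs fixed-width bins are assumptions for the example (real tools often use finer or logarithmic bins).

```python
from collections import Counter

def response_time_histogram(latencies_us: list[int], bin_width_us: int = 100) -> dict[int, int]:
    """Count every IO completion time into a fixed-width time bin.

    Each key is the lower edge of its bin in microseconds. Every IO is
    counted, not just the average and maximum, so the tail of the
    distribution stays visible. The 100 us bin width is an assumption.
    """
    bins = Counter((t // bin_width_us) * bin_width_us for t in latencies_us)
    return dict(sorted(bins.items()))

# Example: six measured completion times in microseconds.
print(response_time_histogram([120, 130, 150, 310, 330, 7900]))
# {100: 3, 300: 2, 7900: 1}
```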

Why confidence levels and why not use only ART and/or MRTs?

Average and Maximum Response Time (ART and MRT) are useful metrics but do not provide the quality of service detail that confidence levels do. An ART can misrepresent the high end of an IO set, while an MRT alone can misrepresent the range of all other IOs. Because the ART is only an average, it masks the highs and lows of the IOs measured, while a single high outlier can dominate the MRT and make an otherwise low-latency distribution look deceptively poor.

What is response time quality of service?

Response Time Quality of Service (QoS) is a measure of the full span of response times by IO completion percentages. In other words, QoS shows at what time value a given percent of the IOs will complete. Thus, if a Response Time QoS level of ‘five nines’ is 20 ms, then 99.999% of the IOs will complete in 20 ms or less.

Tracking the 5 9’s Response Time Confidence Level shows at what completion time value 99.999% of the IOs will return, i.e. when 99,999 out of 100,000 IOs will return.
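A minimal sketch of how such a confidence level can be computed from measured completion times is shown below, alongside ART and MRT on the same data to show what each metric reveals. The nearest-rank percentile method and the example latency distribution are assumptions for illustration.

```python
import math

def response_time_qos(latencies_ms: list[float], nines: int = 5) -> float:
    """Return the response time at the 'N nines' confidence level.

    For five nines this is the completion time within which 99.999% of the
    measured IOs finished. A simple nearest-rank percentile is used here;
    production tools typically work from binned histograms.
    """
    pct = 1.0 - 10.0 ** (-nines)  # e.g. 0.99999 for five nines
    ordered = sorted(latencies_ms)
    rank = min(len(ordered) - 1, math.ceil(pct * len(ordered)) - 1)
    return ordered[rank]

# Illustrative distribution: 100,000 IOs with a small slow tail.
lat = [0.2] * 99_000 + [1.0] * 990 + [20.0] * 10
print(sum(lat) / len(lat))        # ART ~0.21 ms: hides the tail entirely
print(max(lat))                   # MRT = 20 ms: says nothing about the other IOs
print(response_time_qos(lat, 5))  # five nines = 20 ms: 99.999% complete within this
```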

What is a response time ceiling and why is it used?

A Response Time ceiling is a time value threshold above which no IO response times will be accepted by the application. In other words, no IO response time can be greater than the stated RT ceiling. The RT ceiling is usually viewed with regard to a level of confidence, such as the 5 9’s confidence level. Examples of applications that will not accept any IOs after a certain amount of time, or IOs that exceed a given RT ceiling, include database applications wherein an individual request may be comprised of several, say ten, IOs.

All ten IOs must be returned within the stated RT ceiling in order for the request to be fulfilled. If one IO is late, then the total request is a failed response (unless other optimizations are in place). Note that storage that exceeds the RT ceiling can be ‘tuned’ to lower its 5 9’s Response Time level, or the system can be otherwise optimized to account for the given 5 9’s response time level (by increasing cache levels and by other means).
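The arithmetic below shows why per-IO confidence levels matter once a request fans out into multiple IOs. It assumes IO response times are independent, which real systems do not guarantee, so treat it as a rough sketch rather than a sizing rule.

```python
def request_success_rate(p_per_io: float, ios_per_request: int) -> float:
    """Probability that every IO in a request meets the RT ceiling, assuming
    each IO independently completes within the ceiling with probability p."""
    return p_per_io ** ios_per_request

print(request_success_rate(0.99999, 10))  # ~0.99990: roughly 1 in 10,000 requests misses
print(request_success_rate(0.999, 10))    # ~0.990: roughly 1 in 100 requests misses
```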

Conclusion

Understanding data center workloads and Response Time Quality of Service (RT QoS) is key to selecting the right SSD for your servers. SSD performance depends on the nature and composition of your Workload IO Streams and your RT QoS will vary depending on the workloads and SSDs you select.

Use of Response Time Quality of Service, Demand Intensity and Demand Variation can provide the IT manager with the tools to understand the native storage performance range for specific synthetic application and real world workloads.

Testing to the appropriate synthetic or real world workloads and using the relevant test metrics can further help you understand how much performance you need to buy and help you validate your software stack optimization strategies.

Eden Kim is Chair of the Storage Networking Industry Association (SNIA) Solid State Storage Technical Work Group, representing Calypso Systems.