When we in IT talk about the data problem, we’re often talking about the vast amounts of data being processed, stored, and moved among devices, systems, and data centers. But the data problem today isn’t exactly a problem of too much data, though that’s certainly connected. The real data problem is I/O performance, which isn’t getting any faster even as hard drive capacity grows.

Stagnant IOPS means application performance suffers, and data-related tasks like backups and replication can’t get any faster, no matter how much storage is available or how densely the drives are packed. It’s in the realm of performance, not overloaded storage cabinets, where IT really runs into problems.

The pain points

Deduplication in a traditional IT stack doesn’t solve this problem the way it should, because the same block of data gets deduplicated multiple times by different appliances in the stack. A backup appliance deduplicates the data from backups. The WAN optimization appliance dedupes the same block again before it goes across the wire. Other data protection applications may back up the data separately and deduplicate it yet again. Today’s IT needs a way to simplify deduplication, among other things, to solve the data problem.
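
To see why that stacked approach is wasteful, consider a minimal sketch (illustrative Python, not any vendor’s implementation): each appliance keeps its own content-hash index, so the same blocks are fingerprinted and tracked once per layer.

```python
import hashlib

class DedupeIndex:
    """Independent content-hash index, one per appliance in the stack."""
    def __init__(self, name):
        self.name = name
        self.seen = set()
        self.hash_ops = 0

    def ingest(self, block):
        self.hash_ops += 1
        digest = hashlib.sha256(block).hexdigest()
        is_new = digest not in self.seen
        self.seen.add(digest)
        return is_new

# Hypothetical data stream with one repeated block
blocks = [b"block-A", b"block-B", b"block-A"]

stack = [DedupeIndex("backup appliance"),
         DedupeIndex("WAN optimizer"),
         DedupeIndex("data protection app")]

for appliance in stack:
    for block in blocks:
        appliance.ingest(block)

for appliance in stack:
    print(f"{appliance.name}: hashed {appliance.hash_ops} blocks, "
          f"tracks {len(appliance.seen)} unique fingerprints")
```

Every layer does the same hashing and bookkeeping on the same data; none of that redundant work makes the data any smaller than a single pass would.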

We’re now entering the territory of the zettabyte, the unit of measurement beyond the petabyte and exabyte. The amount of data worldwide is expected to hit 44 zettabytes (the equivalent of 44 trillion gigabytes) by 2020, according to IDC. That’s an enormous amount of data, much of it sitting in data centers around the globe and much of it dragging down performance. Organizations are indeed scrambling to add more storage and optimize data as best they can, but they’re also fighting on the application performance front.

The hard disk drives of legacy IT keep getting bigger, with individual drives now holding multiple terabytes, but their IOPS delivery hasn’t kept pace at all. Analysts expect that trend to continue, so hard drive technology itself isn’t going to solve the problem anytime soon. Flash, in the form of solid-state drives, has emerged as a viable way to add speed. But solid-state has issues of its own, like write latency that can spike under sustained load and a higher cost per gigabyte, so it’s not an IOPS panacea.
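
A rough back-of-the-envelope illustration of the capacity-versus-IOPS gap; the drive figures below are assumed, ballpark numbers, not benchmarks:

```python
# Illustrative, assumed figures: a 7,200 RPM disk delivers on the order of
# 100 random IOPS whether it holds 1 TB or 8 TB, so IOPS per terabyte falls
# as drives get denser. The SSD line shows why flash is attractive despite cost.
drives = {
    "1 TB HDD":   {"capacity_tb": 1.0, "iops": 100},
    "8 TB HDD":   {"capacity_tb": 8.0, "iops": 100},
    "1.6 TB SSD": {"capacity_tb": 1.6, "iops": 50_000},
}

for name, d in drives.items():
    print(f"{name}: {d['iops'] / d['capacity_tb']:,.0f} random IOPS per TB")
```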

Other attempts at data efficiency have been plugged into the data center in various ways since virtualization became widespread. They’ve come in the form of dedupe appliances and applications, using various dedupe methods, and through compression appliances. There are also plenty of backup tools trying to fill the gaps, with enterprises buying backup appliances, WAN optimization tools, and cloud gateway appliances. Those products all address symptoms; they don’t solve the essential problem of storage holding back growth.

What’s broken in the data lifecycle

Data growth mostly represents an exciting opportunity for businesses, and IT can take advantage of it in lots of new and interesting ways—big data analytics projects, for example, or Internet of Things initiatives. Businesses can see tons of insights on customer behavior, product success, and much more, as data points pour in and more unstructured data is stored and available.

But IT is often just trying to keep up with these advancements using old technology with big limitations. Along the way, teams run into related issues: recovery point and recovery time objectives (RPOs and RTOs) are much tighter than they used to be, while backup windows are growing unmanageably long. Moving every byte of data from the moment it is written all the way through backing it up offsite simply isn’t feasible anymore.
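
A quick calculation shows why the backup window breaks down; the data size and link speed here are assumptions chosen only to make the point:

```python
# Rough, assumed numbers: a 50 TB full backup pushed over a dedicated 10 Gbps
# link, ignoring protocol overhead, dedupe, and compression.
data_tb = 50
link_gbps = 10

data_bits = data_tb * 8 * 10**12            # terabytes -> bits
seconds = data_bits / (link_gbps * 10**9)   # bits / (bits per second)
print(f"Transfer time: {seconds / 3600:.1f} hours")  # ~11 hours, longer than most backup windows
```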

What IT teams are finding is that they need a new approach to storage itself to confront the data problem. The solution isn’t just more storage. It’s freeing storage to do what it does best: hold vast amounts of data until it’s needed. In most virtualized environments, a lot of storage capacity is trapped, dedicated to individual virtual workloads for just-in-case performance needs. A better way is for storage to exist as a single shared pool that all workloads can access.

Hyperconverged infrastructure creates this shared resource pool, which eliminates a great deal of that inefficiency. The hyperconverged approach also attacks the data problem head on with inline data deduplication, compression, and optimization, applied at inception, before data is ever written to disk. Because duplicate and compressible blocks never reach the disk, they never generate IOPS in the first place, and that is what lets the data center become truly efficient.
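
Here is a minimal sketch of what “inline, before data is written” means in practice; this is illustrative Python, not SimpliVity’s actual implementation:

```python
import hashlib
import zlib

# A minimal sketch: deduplicate and compress inline, before anything hits
# disk, so duplicate blocks never cost a write.
store = {}          # fingerprint -> compressed block ("physical" storage)
writes_issued = 0   # disk writes actually performed

def inline_write(block: bytes) -> str:
    """Return the block's fingerprint; persist it only if it is new."""
    global writes_issued
    fp = hashlib.sha256(block).hexdigest()
    if fp not in store:                    # dedupe first...
        store[fp] = zlib.compress(block)   # ...then compress, then write once
        writes_issued += 1
    return fp

incoming = [b"os-image-block"] * 5 + [b"unique-app-block"]
refs = [inline_write(b) for b in incoming]
print(f"{len(incoming)} logical writes -> {writes_issued} physical writes")
```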

For example, customers using SimpliVity, a hyperconverged infrastructure vendor, see an average data efficiency ratio of 40:1. This means 40TB of logical storage translates into just 1TB of physical storage: 39TB of I/O and disk capacity that was never consumed. Solving the data problem primarily means cutting down on unnecessary IOPS and keeping storage out of the data’s path.
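
As a back-of-the-envelope check of that figure (the 40TB and 1TB values come straight from the example above; the code is just arithmetic):

```python
# The 40:1 figure from the text: 40 TB logical stored in 1 TB physical.
logical_tb = 40
physical_tb = 1

ratio = logical_tb / physical_tb
never_written_tb = logical_tb - physical_tb
print(f"Data efficiency ratio: {ratio:.0f}:1")                          # 40:1
print(f"Capacity (and writes) never consumed: {never_written_tb} TB")   # 39 TB
```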