There’s a common understanding among IT professionals that for every one terabyte (1TB) of primary data, there are actually 8-10TB of associated overhead. Whether for disaster recovery, backups, or high availability, there is always a resource tax placed on the movement of data. And moving data is never “tax” free – in fact, it’s very expensive. Every time a backup is executed, the underlying infrastructure pays this cost, burning up precious cycles of compute, storage I/O, and network resources. Entire companies have been founded just to manage test and development data copies more efficiently, but this is only the tip of the data iceberg.


Bridging the I/O gap

Data is growing exponentially. From 2013 to 2020, the amount of data is projected to increase from 4.4 zettabytes to 44 zettabytes, according to IDC. But the issue is more than a capacity problem. The real problem with this growth is that it does not come with a corresponding increase in performance. The capacity density of HDDs keeps rising, but that matters little for performance, because we still cannot defy the laws of physics: while disk capacity has soared, read and write speeds have grown only marginally over the years. This creates what we call the “I/O gap,” which leads to a performance bottleneck. If 15 years ago we were drinking from a 16-ounce bottle through a straw, today we are trying to empty a 55-gallon barrel through a straw that is only slightly larger. Moreover, how many infrastructure resources are being wasted copying data from its primary location to backups, disaster recovery, development, QA, and archiving?
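A little arithmetic makes the I/O gap concrete. The drive specs below are illustrative ballpark figures, not vendor measurements: capacity has grown by roughly two orders of magnitude while sequential throughput has grown only a few times, so the time to read an entire drive end to end keeps getting worse.

```python
# Illustrative (approximate) drive specs chosen to show the I/O gap:
# capacity grew ~64x while sequential throughput grew only ~4x.
drives = {
    "circa 2005 HDD": {"capacity_gb": 250,    "throughput_mb_s": 60},
    "modern HDD":     {"capacity_gb": 16_000, "throughput_mb_s": 250},
}

for name, d in drives.items():
    # Time to read the entire drive end to end, in hours
    hours = (d["capacity_gb"] * 1024) / d["throughput_mb_s"] / 3600
    print(f"{name}: full read takes about {hours:.1f} hours")
```

With these assumed numbers, a full read goes from roughly an hour to the better part of a day, which is exactly the bottleneck every backup and copy operation runs into.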

Every time this data is moved, it costs you money. Having enough capacity to store the data is one thing, but you also have to account for the compute, I/O, and network overhead of moving it, on top of the capacity it consumes.

Integrated systems like hyperconverged infrastructure are attempting to bring an end to this problem. Hyperconverged vendor SimpliVity, for example, deduplicates, compresses, and optimizes all data inline, at inception, once and forever, across all tiers of the data lifecycle. This optimization of data changes not just the way data is backed up, but it attacks the data problem at the source level.
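The idea of deduplicating data inline, at inception, can be sketched as a content-addressed block store: every incoming block is hashed at write time, only previously unseen blocks consume storage, and each object is represented by metadata referencing those blocks. This is a toy sketch of the general technique, not SimpliVity's actual implementation; all names and the block size are illustrative.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: blocks are deduplicated inline,
    at write time, so duplicate data never consumes extra capacity."""

    def __init__(self):
        self.blocks = {}   # block hash -> block bytes (stored at most once)
        self.objects = {}  # object name -> ordered list of block hashes

    def write(self, name, data, block_size=4096):
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only a previously unseen block actually costs storage
            if digest not in self.blocks:
                self.blocks[digest] = block
            hashes.append(digest)
        self.objects[name] = hashes  # the object itself is just metadata

store = DedupStore()
os_image = b"windows-base-image" * 1000   # stand-in for an OS disk image
store.write("vm-001", os_image)
store.write("vm-002", os_image)  # identical OS blocks: no new data stored
print(len(store.blocks), "unique blocks stored for 2 VMs")
```

The second VM's write adds only metadata, which is the "attack the problem at the source" property the article describes: the duplicate data never exists, so it never has to be copied later.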

Data backups generate I/O, which taxes all infrastructure throughout the data lifecycle. For example, if you have 100 Windows virtual machines (VMs) and 100 Linux VMs, a traditional stack copies 200 largely identical sets of operating system blocks to your backup appliance. But SimpliVity backups don’t generate I/O. A SimpliVity backup is just more metadata, which is significantly smaller than the actual data. With hyperconvergence, those 200 sets of operating system blocks don’t need to be copied because they are redundant – and this applies not just to backup data, but to primary data as well. Hyperconvergence acts as a preventative measure, stopping the IT resource tax from ever existing.

SimpliVity’s method of deduplicating, compressing, and optimizing all data inline, at inception, across all tiers of the data’s lifecycle is what creates this efficiency and reduces the resource tax. Because all data is already deduplicated and compressed, and VMDKs are represented by metadata, creating a point-in-time backup becomes a matter of copying the metadata. No data read or write IOPS have to occur, and a complete, independent backup can be taken in seconds.
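Under such a metadata model, a point-in-time backup can be sketched as copying a list of block references while the blocks themselves stay put. The sketch below is illustrative only; the names and layout are assumptions, not SimpliVity's actual on-disk format.

```python
import hashlib

# A content-addressed pool holding each unique block exactly once
block_pool = {}

def put(block):
    digest = hashlib.sha256(block).hexdigest()
    block_pool.setdefault(digest, block)  # stored at most once
    return digest

# The VM's "disk" is just metadata: an ordered list of block hashes
vm_disk = [put(b"boot"), put(b"os"), put(b"app-data")]

# A point-in-time backup copies only the metadata -- zero data-block I/O
backup = list(vm_disk)

# The live VM keeps changing...
vm_disk[2] = put(b"app-data-v2")

# ...but the backup remains a complete, independent image
restored = b"".join(block_pool[h] for h in backup)
print(restored)  # -> b'bootosapp-data'
```

Because the backup is just a second list of references, taking it costs a metadata copy rather than reading and rewriting every block, which is why such a backup can complete in seconds regardless of the VM's size.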

In a world of data efficiency, the old rule of thumb that one TB of data carries 8-10TB of overhead no longer holds. The tax to move the data, the tax to read and send it, the CPU and IOPS overhead to write it on the other side – all of these non-linear pain points are relieved when data is made efficient at the onset.

Jesse St. Laurent is vice president of product strategy at SimpliVity.
