Published on 25th July 2013 by Yevgeniy Sverdlik
While I get the desire of the mainstream press to keep reporting on the NSA surveillance program from as many angles as they can possibly find, some reporters have chosen one angle whose appeal I cannot understand: trying to assess the data-storage capacity of the spy agency's data centers.
First of all, unless the NSA discloses the specific types and amounts of storage devices it has installed or plans to install and its approach to managing its storage capacity, any attempt to assess the storage capacity will be inaccurate and misleading.
Second of all, this is one of those rare instances when size really doesn't matter. The NSA's data-storage capacity is completely irrelevant to the story about the government's potentially overreaching eavesdropping on electronic communications.
If anything, it distracts from the real issues at hand. One of them is that the majority of Internet users still do not understand how insecure and public their online activity really is. Another is that the US government is not the only government with the resources, the staff and the natural inclination to exploit cybersecurity holes in pretty much any system it finds necessary to exploit.
The latest example of such a futile and distracting piece of journalism came today from Forbes, whose reporter got a hold of floor plans for the NSA's new data center in Bluffdale, Utah, and went to town trying to convert square-footage figures into bytes-of-data figures.
The article attempts to assess the amount of storage capacity and, for some reason, assumes that all racks on the raised floor will be filled with some kind of homogeneous storage devices. That has never been the case in any data center I have ever heard of.
A data center floor is usually filled with a variety of devices that do a variety of things. Some servers are optimized for compute but have little on-board storage capacity; others are storage arrays, packed to the brim with disks. There is also a variety of storage media (flash or disk) in a typical data center, all of varying density.
Some data center managers use advanced storage-management technologies like deduplication and compression, which can significantly increase the amount of information a unit of storage capacity can hold. Advanced data centers also use storage tiering, which means using multiple types of storage devices, configured differently, to store different types of data.
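To see how quickly a floor-plan estimate falls apart, here is a back-of-the-envelope sketch in Python. Every number in it is a made-up placeholder, not a figure for Bluffdale or any real facility, and the function and parameter names are my own. It only shows that an estimate built on rack count swings wildly depending on what you assume about the share of racks holding storage, the density of each rack and the gains from deduplication and compression.

```python
def effective_capacity_tb(racks, storage_fraction, raw_tb_per_rack, reduction_ratio):
    """Effective capacity in terabytes under purely hypothetical assumptions.

    racks            -- total racks on the raised floor
    storage_fraction -- share of racks that hold storage rather than compute
    raw_tb_per_rack  -- raw terabytes in one storage rack
    reduction_ratio  -- gain from deduplication and compression (1.0 = none)
    """
    return racks * storage_fraction * raw_tb_per_rack * reduction_ratio


# Placeholder inputs only: none of these describe a real data center.
RACKS = 1000

# Pessimistic guess: few storage racks, low density, no data reduction.
low = effective_capacity_tb(RACKS, storage_fraction=0.2,
                            raw_tb_per_rack=500, reduction_ratio=1.0)

# Optimistic guess: mostly storage racks, high density, heavy data reduction.
high = effective_capacity_tb(RACKS, storage_fraction=0.8,
                             raw_tb_per_rack=2000, reduction_ratio=5.0)

print(f"low guess:  {low:,.0f} TB")
print(f"high guess: {high:,.0f} TB")
print(f"spread:     {high / low:.0f}x")
```

With the same hypothetical floor of 1,000 racks, the two sets of guesses differ by a factor of 80, and neither set is any more defensible than the other without inside knowledge.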
Another important point the Forbes article misses is that the capacity for storing data is not nearly as important as being able to process data and derive valuable information from it. Making sense out of data is a lot harder than storing it, so the NSA's compute capacity, in terms of processor cores, and the analytics methods its data-miners use are much more interesting questions.
Non-profit investigative-journalism outfit ProPublica, for example, has asked the right questions on this and received some rather interesting answers. One of them, ironically, is that the NSA cannot search through its own employees' email communications.
Perhaps the agency's officer talking to ProPublica's Justin Elliott was simply reacting to the “high-tech-evil-spy-agency” scandal by trying to spin the story the other way (as in, “we're not really as advanced and scary as everybody thinks we are”). Perhaps she was just trying to get out of responding to a nice and narrow FOIA request from a reporter.
We don't know the answers to these questions, but at least they are questions worth asking.