There’s a multi-party arms race going on in the world of data. In one corner we have the data creators, the social media platforms and the like, which are generating zettabytes of data. Lest we forget, a zettabyte is a one followed by 21 zeros. In the other corner, we have the data storage companies, desperately trying to devise more efficient ways to hold all of this information. Then there are the computer hardware companies, building ever more powerful machines to process the data.
The latest milestone came earlier this month, when President Obama announced plans to build a supercomputer capable of one quintillion calculations per second. A quintillion is a much more manageable one followed by 18 zeros. Finally, there are the data scientists, racing to craft new techniques that can make sense of the data and harness its power.
Clearly the numbers involved in data are fast reaching the limits of the human imagination: we’re apparently generating 2.5 quintillion bytes of data a day. If each byte were a one-cent coin laid flat, the coins would cover the entire surface of the Earth between one and two times over. With the numbers effectively becoming abstract concepts, it makes much more sense to think about the practical applications of this volume of data. Specifically, what can data scientists do with more powerful machines and limitless data?
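The coin comparison is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is a rough estimate under assumptions that are not in the article: a US one-cent coin about 19.05 mm across, an Earth surface area of roughly 5.1 × 10^14 square metres, and only each coin's own circular area counted (no gaps between coins).

```python
import math

# Assumptions (not from the article): US one-cent coin diameter of 19.05 mm
# and a total Earth surface area of about 5.1e14 square metres.
COIN_DIAMETER_M = 0.01905
EARTH_SURFACE_M2 = 5.1e14

BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes, the daily figure quoted above


def coverage_ratio(n_coins: float, diameter_m: float, surface_m2: float) -> float:
    """Return how many times n_coins laid flat would cover the surface,
    counting only each coin's own circular area (gaps are ignored)."""
    coin_area_m2 = math.pi * (diameter_m / 2) ** 2
    return n_coins * coin_area_m2 / surface_m2


ratio = coverage_ratio(BYTES_PER_DAY, COIN_DIAMETER_M, EARTH_SURFACE_M2)
print(f"{BYTES_PER_DAY:.1e} coins cover the Earth {ratio:.1f} times")
```

Under these assumptions the ratio comes out at roughly 1.4. Allowing for the gaps between tightly packed coins raises the effective area per coin and pushes the figure towards 1.5 to 1.8, so a day's worth of one-cent coins would blanket the planet between one and two times.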
Raising the bar
The first noticeable advantage of increased computing power is a reduction in the time it takes to carry out data science projects. Getting results faster will allow more decisions to be made in real time. This will have a significant impact on industries such as retail, where a shop could automatically adjust its pricing strategy on the spot in response to weather data, customer demographics and footfall.
Next, the processes involved in data science will become ultra-efficient, with less time spent processing, accumulating and preparing data. This in turn will open data science up to data sets that were previously inaccessible: for instance, helping to map the human brain and combining that map with data on a participant’s emotions and lifestyle to build a picture of how the brain is affected by external factors.
Machine learning will become much more powerful, opening the door to the creation of artificial intelligence. It’s worth pausing to consider that we could unlock what is going on in a human brain and, in theory, replicate it all with mathematical models expressed in ones and zeros. As a side note, an upper estimate of the memory capacity of the human brain is 2.5 petabytes, a petabyte being a paltry one followed by fifteen zeros. Of course, measuring our consciousness with the same metric we use to measure the size of a funny cat video on YouTube is somewhat foolish.
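Putting the brain estimate next to the daily data figure shows just how paltry a few petabytes are on this scale. The sketch below simply divides the two numbers quoted in this article, using decimal SI prefixes; it compares raw byte counts only and says nothing about how a brain actually stores information.

```python
# Decimal SI prefixes used in this article, expressed in bytes.
PETABYTE = 10**15   # a one followed by fifteen zeros
EXABYTE = 10**18    # a quintillion bytes
ZETTABYTE = 10**21  # a one followed by 21 zeros

BRAIN_UPPER_ESTIMATE = 2.5 * PETABYTE  # the upper estimate quoted above
DAILY_DATA = 2.5 * EXABYTE             # 2.5 quintillion bytes generated per day

# How many brain-sized stores one day's data would fill.
brains_per_day = DAILY_DATA / BRAIN_UPPER_ESTIMATE

# The same daily figure accumulated over a year, expressed in zettabytes.
zettabytes_per_year = DAILY_DATA * 365 / ZETTABYTE

print(f"One day's data is about {brains_per_day:,.0f} brains' worth")
print(f"A year's data is about {zettabytes_per_year:.2f} zettabytes")
```

On these figures, a single day's data would fill about a thousand brain-sized stores, and a year at the same rate approaches a full zettabyte.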
There will be additional benefits to product design, especially in the field of aeronautics. Proposed designs could be simulated without the need for wind tunnels and other expensive tools. Potentially one of the most exciting advances will be the development of personalized medicine. Data science will be able to take account of an individual’s genome and lifestyle and adjust drug properties accordingly to make treatments more effective.
The analysis of big data sets has already had a revolutionary impact on the commercial sector and on scientific discovery, from assisting relief efforts after natural disasters to tailoring the customer journey on eBay. In the future we can expect to see more advanced weather forecasts, natural disaster prediction services and more accurate cancer diagnostics. With data science also credited with uncovering key Islamic State military strategies, it is set to play a bigger role in national security.
Data science is going to undergo a rapid transformation into a faster, more accurate and more efficient process. Machines will take on a wider range of tasks, spurred along by advances in machine learning and faster computers. What might take a week to calculate today could take minutes in the future. The scope of data we can handle will also increase, and a greater variety of data will yield more insights from seemingly disparate data sets.
The next time you’re grappling with the vast numbers involved in data production and storage, it’s worth remembering where this road is taking us: ultra-personalized medical care, artificial intelligence and smart cities, to name just a few. It is these thoughts that make the announcement of a new supercomputer so inspiring.
Mike Weston is the CEO of data science consultancy Profusion.