Big Data, as a strategy to generate value from large, diverse and complex data sets, is changing how organizations understand customers, deploy products and operate in their specific industries.
It has compelled all types of organizations to start using different technologies such as the Apache Hadoop framework, Not-only SQL (NoSQL) data stores and other big data management, processing and analysis solutions.
What is big data?
As of 2014, 2.5bn GB of data are created every day. During the past decade, advances in technology have helped to create continuous streams of data. This data is generated in a number of ways, including internet browsing, smartphone activity and movement, digital business processes, social media activity, and sensors in buildings, products and people.
The explosion of data in terms of volume, velocity and variety is known as big data.
- Volume refers to the amount of data generated. A decade ago, data storage for analytics was counted in terabytes (TB). Now, organizations require at least petabytes in storage.
- Velocity refers both to the throughput of data and to its latency. Throughput is the amount of data in movement (measured in GB/s or TB/s). Latency is the delay between data ingestion and data analysis (measured in milliseconds).
- Variety refers to the number of data sources and the heterogeneity of the data (structured, semi-structured or unstructured).
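The two velocity measures can be made concrete with a small sketch. The record fields, sample values and function names below are illustrative assumptions, not taken from any particular product; the point is only that throughput is a volume divided by a time window, while latency is a per-record delay.

```python
from dataclasses import dataclass


@dataclass
class Record:
    size_bytes: int      # payload size of the record
    ingested_at: float   # time (seconds) the record entered the system
    analyzed_at: float   # time (seconds) analysis of the record finished


def throughput_gb_per_s(records, window_seconds):
    """Data in movement: total volume observed divided by the window length."""
    total_bytes = sum(r.size_bytes for r in records)
    return total_bytes / 1e9 / window_seconds


def mean_latency_ms(records):
    """Average delay between ingestion and analysis, in milliseconds."""
    delays = [(r.analyzed_at - r.ingested_at) * 1000 for r in records]
    return sum(delays) / len(delays)


# Hypothetical sample: two records observed over a two-second window.
records = [
    Record(size_bytes=500_000_000, ingested_at=0.0, analyzed_at=0.25),
    Record(size_bytes=1_500_000_000, ingested_at=1.0, analyzed_at=1.75),
]

print(throughput_gb_per_s(records, window_seconds=2.0))
print(mean_latency_ms(records))
```

A system can score well on one measure and poorly on the other: a batch pipeline may move terabytes per second yet deliver results hours after ingestion.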
The need to capture, process, store and analyze data has generated a new breed of technologies, including NoSQL data stores, Massively Parallel Processing (MPP), in-memory and distributed systems.
The dual nature of big data
Since the data center is the core of a company's digital strategy, it is natural that it will be profoundly affected by big data. Although one may initially think that the impact of big data on the data center is limited to storage requirements, DCD Intelligence considers the impact to be much greater, affecting multiple facets of the data center.
Consider Formula One (F1) racing for a moment. Teams such as Lotus F1 build their own private cloud, running 50 virtual servers, to use at every race. In each lap, the team collects and analyses as much as 30MB from 250 sensors in every car.
As a result, more than 3TB was analyzed in real time at the last Monza Grand Prix. The challenge is not only to have the appropriate architecture but also to provide the bandwidth needed to work with streams of data. That gives a hint of the complex relationship between big data and the data center. DCD Intelligence has identified two main areas of impact: big data as a source of complexity in the infrastructure and maintenance needs of the data center; and big data as a driver for the optimization of the data center.
Big data as a source of complexity
Big data initiatives impact the data center infrastructure in three areas:
- increased data storage requirements;
- increased data transmission needs within and outside the data center;
- increased demand for high-density and/or schema-free (or non-relational) computing environments.
These three factors are driving the need to deploy more racks, cabling and servers and to improve the physical structure to cope with big data’s needs.
More servers that process, store and, mainly, analyze large quantities of data in secure, air-conditioned rooms will increase electricity requirements to accommodate operations and cooling. A similar situation happens with racks and cabling.
With global demand for big data, the data center infrastructure is expected to become more important than ever, although the initial focus is on big data software. However, the scarcity of resources is challenging the data center architecture. DCD Intelligence expects significant developments in data-driven organizations that are searching for efficient ways to balance the use of energy and resources with the use of big data.
Nowadays, organizations struggle with a common problem of infrastructure and operational management. Infrastructure components (such as storage, servers, racks, network devices, power and cooling systems, virtualization, cloud computing, etc.) that support business-critical applications generate thousands of alerts per day that report on the health, performance and availability of these components. The process of analyzing all these alerts and incidents to improve operational efficiency is expensive and laborious. It cannot be done effectively by hand by IT personnel.
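To illustrate why this work lends itself to automation, here is a minimal sketch that collapses a raw alert feed into a ranked summary of recurring problems. The alert fields (`component`, `kind`), the sample values and the `min_count` threshold are hypothetical; a real monitoring feed would carry far more detail.

```python
from collections import Counter


def summarize_alerts(alerts, min_count=2):
    """Group alerts by (component, kind) and keep only the recurring ones,
    ordered from most to least frequent."""
    counts = Counter((a["component"], a["kind"]) for a in alerts)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical alert feed; real feeds run to thousands of entries per day.
alerts = [
    {"component": "rack-12", "kind": "temperature"},
    {"component": "rack-12", "kind": "temperature"},
    {"component": "switch-3", "kind": "link-down"},
    {"component": "rack-12", "kind": "temperature"},
    {"component": "ups-1", "kind": "battery"},
    {"component": "switch-3", "kind": "link-down"},
]

summary = summarize_alerts(alerts)
for (component, kind), n in summary:
    print(f"{component}: {n} x {kind}")
```

Six alerts reduce to two recurring issues here; at thousands of alerts per day, the same grouping step is what makes the feed readable at all.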
DCD Intelligence is observing a progressive instrumentation of the data center. This process involves a gradual deployment of sensors throughout the data center infrastructure. The goal is to gather critical information about data center conditions such as temperature, humidity and airflow. This information comes with an increased capacity to monitor and analyze the existing technology in the data center (hardware such as racks or servers and software such as DCIM).
This instrumentation can be understood as a scenario in which there are multiple sources of information (sensors and information management systems), in which the data should be analyzed continuously (in real time or streaming) and in which a large amount of data can be generated.
Big data is emerging as the solution to analyze the lifecycle of a data center, and can bring data center optimization, operations and design to the next level.
This value is not just derived from big data technologies; it is the result of applying analytical models to improve performance.
The applications include IT operations analytics, infrastructure monitoring, virtualization monitoring, environmental monitoring (or green IT), operating systems analytics, etc.
The benefits of deploying big data analytics are multiple, including:
- Gaining operational visibility across data center infrastructure.
- Monitoring infrastructure in real time and correlating events across layers.
- Combining streaming data with historical data to detect patterns and prevent poor performance issues.
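As a rough illustration of the last point, the sketch below compares incoming sensor readings against a baseline built from historical data and flags values that deviate strongly from it. The z-score rule, the threshold of 3.0 and the temperature figures are assumptions chosen for the example, not a method described by DCD Intelligence.

```python
from statistics import mean, stdev


def find_anomalies(history, stream, threshold=3.0):
    """Flag streamed readings that lie more than `threshold` standard
    deviations away from the historical mean."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history gives no scale to measure deviations against.
        return []
    anomalies = []
    for index, value in enumerate(stream):
        z_score = (value - baseline) / spread
        if abs(z_score) > threshold:
            anomalies.append((index, value))
    return anomalies


# Hypothetical inlet temperatures in degrees Celsius.
history = [22.0, 22.5, 21.5, 22.0, 22.5, 21.5, 22.0, 22.0]
stream = [22.1, 21.9, 27.0, 22.3]

print(find_anomalies(history, stream))
```

In practice the baseline would be refreshed as new data arrives, and seasonal patterns such as daily load cycles would need a richer model than a single mean and standard deviation.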
Big data is called upon to become a change agent in the way organizations manage and optimize the data center. But are companies really aware of the impact of big data on the data center? And, more importantly, are they ready to manage this impact?
Big data may be singing its heart out but will it win a place at the final?
Josep Curto Díaz is an associate analyst with DCD Intelligence