Calling time on big data in cybersecurity

It seems obvious that the more we know, the more informed we are. But there’s a tipping point, especially in cybersecurity, between the volume of information and what should be flagged for action. As businesses have become more data-driven and automated, IT systems have become an increasingly unwieldy beast to monitor, manage and define. The ‘noise’ generated by everyday user activity has reached deafening levels.

Big data analytics has frequently been identified as the silver bullet to address this - meeting the huge scale of the challenge with an equally comprehensive monitoring system. With big data solutions, security logs and events from every corner of the network are fed into a central platform. This often leads to a mind-boggling number of events being processed - as many as 4-5 billion events per day.

While this level of detail is comprehensive, it doesn’t come without its challenges. With more data comes more noise and more false positives. Keeping the necessary activity logs accurate, managed and stored correctly requires a lot of moving parts and continuous upkeep - even if we’re using analytics and machine learning to do the bulk of the work. The cost of the infrastructure required to do this can quickly exceed the value of the network you are monitoring in the first place.

Slim pickings

Even if analytics and unsupervised machine learning can ease the load in theory, in practice important clues about an individual’s intent are often buried and overlooked. As a result, the yield of findings of interest can still be very low, with billions of events analysed and accurate alerts delivered at a rate of less than one percent. Add in the cost and complexity of managing the underlying technology that keeps machine learning algorithms running properly, as well as preparing data in the right way, and overlooking intent can become a significant problem.

It’s the actual context behind the data which is often missing. This can be information about users, devices, networks or locations, but when data is collected for a machine learning process, this context is often lacking or not linked. The gap can be partially closed using techniques such as record linkage, where individual records in a data set are matched and brought together by common identifiers. However, significant numbers of false positives and false negatives still remain.
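To make the idea of record linkage concrete, here is a minimal sketch that attaches asset context to security events through a shared identifier. The sample records, the field names (`device_id`, `owner`, `location`) and the `link_records` helper are all illustrative assumptions, not taken from any particular product.

```python
# Hypothetical sample data: the field names are assumptions for illustration.
auth_events = [
    {"user": "alice", "device_id": "LT-014", "event": "login_failed"},
    {"user": "bob", "device_id": "LT-022", "event": "login_ok"},
    {"user": "mallory", "device_id": "LT-099", "event": "login_failed"},
]
asset_inventory = [
    {"device_id": "LT-014", "owner": "alice", "location": "London"},
    {"device_id": "LT-022", "owner": "bob", "location": "Leeds"},
]


def link_records(events, inventory, key="device_id"):
    """Join each event to the inventory row that shares the same key."""
    index = {row[key]: row for row in inventory}
    linked, unlinked = [], []
    for event in events:
        context = index.get(event.get(key))
        if context is None:
            # No common identifier found: the event stays without context.
            unlinked.append(event)
        else:
            # Merge the context into the event; event fields win on conflict.
            linked.append({**context, **event})
    return linked, unlinked


linked, unlinked = link_records(auth_events, asset_inventory)
print(len(linked), "events linked;", len(unlinked), "left without context")
```

The unlinked list is the interesting part: an event from a device that is missing from the inventory cannot be given context by exact matching, which is one way linkage gaps turn into false negatives.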

Another fundamental challenge is flawed source material, which can be caused by incorrectly configured source logs, events and telemetry. Once again, although there may be millions, or even billions, of logs detailing user activity, the linkages needed to understand context are often absent.

Arguably, the value of any security tool is its yield: how many genuine threats are identified and brought to the attention of IT teams? With big data-enabled security tools, the typical large enterprise can manage, at best, a yield of five findings for every one billion log lines. So while value is being generated, it is coming at a very high cost in terms of infrastructure and data center management.
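To put that yield in perspective, the short calculation below applies the five-per-billion figure to the daily volumes of 4-5 billion events mentioned earlier. It assumes, purely for illustration, that one event corresponds to one log line; the function name is hypothetical.

```python
FINDINGS_PER_BILLION = 5  # best-case yield quoted in the text
BILLION = 1_000_000_000


def expected_findings(lines_per_day):
    """Best-case number of genuine findings for a given daily log volume."""
    return lines_per_day * FINDINGS_PER_BILLION // BILLION


# Assumption: one event equals one log line.
low = expected_findings(4 * BILLION)
high = expected_findings(5 * BILLION)
print(f"{low} to {high} genuine findings per day")  # 20 to 25
```

In other words, even in the best case, the infrastructure is processing billions of lines a day to surface a couple of dozen findings.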

This is a running theme in cybersecurity - global spending was forecast to reach $103 billion in 2019, according to IDC, but it’s not clear that businesses are feeling any safer. Too many security decisions are made for the wrong reasons with the wrong information. As hackers increasingly target people, businesses need to think differently about how they secure themselves. While data on network and user activity can provide useful insight, ultimately it is the yield of actionable information that matters most.

Shift to the edge

One way of addressing these challenges is decentralization: bringing analytics as close as possible to the data it needs to process, which eases the burden of continuously moving large datasets around. In doing so, IT teams can start to reduce the amount of data they have to store and prioritize specifically for security, often redundantly. As more processing takes place at the edge, new analytical approaches and updates can be pushed quickly across the wider network once they have been optimized by a central analytics engine.
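As a rough sketch of what processing at the edge might look like, the example below has an edge node apply detection rules locally and forward only the flagged events plus a compact summary, rather than every raw log line. The `EdgeNode` class, the rule function and the event fields are hypothetical and exist only for illustration.

```python
from collections import Counter


def failed_login_rule(event):
    """Hypothetical rule: flag failed logins for central review."""
    return event.get("event") == "login_failed"


class EdgeNode:
    """Applies rules locally so only flagged events leave the edge."""

    def __init__(self, rules):
        self.rules = list(rules)

    def update_rules(self, rules):
        # Stand-in for a central analytics engine pushing optimized rules.
        self.rules = list(rules)

    def process(self, events):
        flagged = [e for e in events if any(rule(e) for rule in self.rules)]
        # Everything else is reduced to counts instead of being shipped raw.
        summary = Counter(e.get("event", "unknown") for e in events)
        return flagged, dict(summary)


events = [
    {"host": "edge-01", "event": "login_ok"},
    {"host": "edge-01", "event": "login_failed"},
    {"host": "edge-01", "event": "file_read"},
    {"host": "edge-01", "event": "login_failed"},
]
node = EdgeNode([failed_login_rule])
flagged, summary = node.process(events)
print(f"forwarded {len(flagged)} of {len(events)} events; summary: {summary}")
```

The trade-off in a design like this is that the centre only sees what the rules choose to forward, which is why the ability to push updated rules back out to the edge matters.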

In our noisy and dynamic threat landscape, there’s a need for an equally dynamic and automated approach to security. By working in a more distributed way, rather than casting the net as wide as possible, organisations will be able to focus on stopping the next attack, rather than scrambling to recover from the last one.
