Open source analytics merchant MapR Technologies is helping data detectives at Terbium Labs search for stolen data on the Dark Web.

Terbium built its data fingerprinting technique on MapR’s distribution of Apache Hadoop, the software framework for distributed storage and processing of very large data sets. Its system, Matchlight, is a big data intelligence system that the company claims speeds up the detection of data breaches and minimizes the damage they cause.

According to the company, the average data breach takes more than 200 days to discover, giving adversaries months or years to exploit a security incident. With Matchlight, it says, stolen data can be identified in minutes.

Digital fingerprint
– Thinkstock / Maksim Kabakou

Deep dive

Terbium’s scalable, cloud-based system continuously trawls the Internet, registering digital fingerprints of the data it finds, which ranges from valuable source code to corporate documents. It searches for stolen material by comparing fingerprints of clients’ data against the fingerprints gathered across the Internet. Because the fingerprinting is one-way, the original data cannot be read back from a fingerprint, so clients need not reveal their information in order for Terbium Labs to look on their behalf.
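The general idea of one-way fingerprint matching can be illustrated with a short sketch. This is not Terbium's actual method, which the article does not describe. It assumes a simple scheme in which overlapping five-word windows of a document are hashed with SHA-256, and the function names `fingerprints` and `match_ratio` are hypothetical.

```python
import hashlib


def fingerprints(text, window=5):
    """Return a set of SHA-256 digests, one per overlapping word window.

    Assumption for illustration: a fingerprint is the hash of a run of
    `window` consecutive lowercased words. Only digests leave this
    function, never the text itself.
    """
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < window:
        shingles = [" ".join(words)]
    else:
        shingles = [
            " ".join(words[i:i + window])
            for i in range(len(words) - window + 1)
        ]
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}


def match_ratio(client_fps, crawled_fps):
    """Fraction of the client's fingerprints that also appear in crawled data."""
    if not client_fps:
        return 0.0
    return len(client_fps & crawled_fps) / len(client_fps)


# The client hashes its document locally and shares only the digests.
client = fingerprints("alpha beta gamma delta epsilon zeta")

# The crawler hashes whatever it finds in the same way.
leaked_page = fingerprints("junk alpha beta gamma delta epsilon zeta more junk")
unrelated_page = fingerprints("one two three four five six seven")

print(match_ratio(client, leaked_page))
print(match_ratio(client, unrelated_page))
```

In this sketch the monitoring side only ever sees hash digests, which is what allows a search to run without the client handing over the underlying documents. A production system would probably need more than a plain hash, for example a keyed hash, since short or predictable values can be guessed and hashed by anyone.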

With 350 billion data fingerprints already in its database, and ten to fifteen billion more added each day, Terbium needs storage that is more stable and efficient than the more Java-heavy Hadoop distributions can offer, said Danny Rogers, Terbium Labs’ CEO.

“We want to shut down the market for stolen data by reducing the time to detect a breach and thereby minimize the damage,” said Rogers. “MapR shines as the only Hadoop distribution that can reliably exceed our demanding volume, scale and speed requirements.”

The system recently identified 30,000 newly stolen credit cards and 6,000 newly compromised email addresses for sale on the Dark Web, all in a single day.