Both Facebook and LinkedIn announced massive data leaks within just a few days of each other, potentially compromising the personal information (PI) and personally identifiable information (PII) of some one billion people. That’s a lot of people, and these leaks justifiably dominated the news cycles. Yet breaches and leaks like these are a symptom of a larger disease afflicting enterprises and businesses large and small: data hoarding. Sometimes the question is not how companies are protecting personal data, but why they are retaining it in the first place.

Tens of thousands of unknown breaches

Even as we regard massive-scale breaches of mega-companies with passing outrage, then equanimity, tens of thousands of small breaches worldwide are exposing sensitive data – creating unnecessary liabilities, bad public relations and regulatory friction for tens of thousands of businesses.

These micro-breaches, as much as the Facebook and LinkedIn leaks if not more, show that “the more, the merrier” no longer applies to holding onto data, particularly when it is sensitive.

Data can provide organizations with incredible value, but it also carries risks of theft, compromise and misuse. Today’s companies need to be highly selective about the data they store, how long they keep it, and how they protect it. Here’s why data hoarding is a bad idea, and four best practices to help avoid it.

Regulatory compliance imperatives

Some three years after GDPR brought data privacy into the spotlight in the EU, newer regulations like the California Privacy Rights Act of 2020, Virginia’s recently passed Consumer Data Protection Act, and New York’s impending New York Privacy Act are also changing attitudes and actions in the US.

Under many of these laws, companies collecting personal information need to disclose the specific purpose of the data collection, gather only the data needed for those purposes, and store it securely. Moreover, all emerging data privacy regulations require companies to facilitate deletion of PII upon request. And all this assumes that companies know what they have. After years of data hoarding, however, many companies don’t know what data they have or where it’s stored – making data protection and compliance nearly impossible.

Four best practices to end data hoarding 

To put a stop to data hoarding and roll back the clock on previously hoarded data, here are four best practices:

1 Discover ALL of it

This may sound basic, but you can’t protect what you don’t know you have. Discovery has to cover everything from personal data to dark data, including regulated data of any type, in any language, in the data center or the cloud.

2 Classify it

After finding comes understanding. It is crucial to classify sensitive data – and to do so at petabyte scale, going beyond traditional pattern matching and regular expressions.
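To make the limits of bare pattern matching concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular classification product works: a regular expression finds candidate payment card numbers, and a Luhn checksum then discards candidates that merely look like card numbers. The names CANDIDATE, luhn_valid and find_card_numbers are hypothetical, and real classifiers rely on far richer context than this.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that match the pattern AND pass the checksum."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The regular expression alone would flag any 16-digit order or tracking number as sensitive; the checksum step removes most of those false positives. Classification at scale layers many such validation and context signals on top of one another.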

3 Define policies

What to keep and what to throw out? Define and apply policies for data retention, then automate workflows to act on data aging, tagging which data to keep and how long to keep it, and marking over-retained data for deletion.
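As a rough sketch of what an automated retention policy could look like, the Python below tags each record as keep, delete or review based on its age and category. Everything here is an assumption made for illustration: the category names, the retention periods in RETENTION_DAYS, and the Record structure are hypothetical, and actual retention periods depend on the regulations and contracts that apply to your business.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data category (illustrative values only).
RETENTION_DAYS = {
    "marketing_contact": 365,
    "support_ticket": 730,
    "payment_record": 2555,
}


@dataclass
class Record:
    record_id: str
    category: str
    collected_on: date


def retention_action(record: Record, today: date) -> str:
    """Return 'keep', 'delete', or 'review' for a single record."""
    limit = RETENTION_DAYS.get(record.category)
    if limit is None:
        # No policy defined for this category: escalate to a person.
        return "review"
    expires_on = record.collected_on + timedelta(days=limit)
    return "delete" if today > expires_on else "keep"


def tag_records(records: list[Record], today: date) -> dict[str, str]:
    """Map each record ID to its retention action."""
    return {r.record_id: retention_action(r, today) for r in records}
```

Records in a category with no defined policy are routed to review rather than kept or deleted by default, so that gaps in the policy surface instead of silently turning into more hoarded data.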

4 Remediate

For sensitive, critical, regulated and high-risk data, it is crucial to manage remediation workflows. Make sure you are delegating decisions to the right people, and review findings and violations across all your data sources, for both structured and unstructured data.

Putting an end to extraneous data hoarding paves the way to compliance and lower liability.