Dark data: Managing the data you can’t see

In today’s era of seemingly infinite data volume and complexity, many enterprises are unintentionally neglecting an entire category of data that is critical to their data protection and management practices. On average, more than 50% of a company’s data is “dark” – information held in data repositories with no attached or determined value. In addition to costing an average of $26 million in storage expenses per year, dark data poses significant risks to an enterprise’s security and compliance efforts, making it more important than ever to address the foundational issues that cause it.

Dark data threatens protection

Most businesses lack clarity around the data they need to protect. Because dark data is often out of sight and out of mind for many enterprises, dark data reservoirs – holding sensitive and valuable data – become an enticing target for cybercriminals and ransomware attacks. 

Additionally, nearly half of senior IT decision makers cannot confidently and accurately state the exact number of cloud services that their company is currently using, even as enterprises implement a multicloud approach with both on-premises and public cloud resources as part of their data infrastructure. If an organization fails to shine a light on dark data, especially dark data stored in the cloud, a multicloud approach can further widen the door to cyberattacks, and recovery at scale cannot be ensured.

Surviving any kind of ransomware attack requires an understanding of what and where your data is, as well as what it’s worth. The more organizations know about the data they hold, the more effective they will be in understanding how to protect it from risk and how to recover after an attack. 

Dark data threatens compliance 

Untagged and unstructured data also makes it harder to keep up with regulatory requirements that are constantly evolving. For example, the California Consumer Privacy Act (CCPA), which is currently limited in scope but will become fully operative by January 2023, will require businesses, including data brokers, to give consumers notices explaining their privacy practices.

While we don’t yet have a federal data compliance law, states are following California’s lead. With data privacy laws expanding into Virginia, Colorado, Massachusetts and New York, companies that identify and catalog their most critical information, remove information that contains no value and ensure compliance with all local regulations are best suited to proactively manage information risk and eliminate gaps in data governance.

Tactically, enterprises may implement data capture, archiving and surveillance capabilities to follow data compliance requirements. Better management of dark data will help companies comply with stringent regulations and implement retention policies across their entire data estate. 

Dark data and sustainability

What’s more, dark data plays a significant role in an enterprise’s environmental compliance – another set of increasing regulations. As enterprises work to develop sustainability programs to meet carbon reduction standards, the environmental cost of dark data must be a priority. Dark data storage was estimated to emit 6.4 million tons of carbon dioxide into the atmosphere in 2020. And the future outlook is even worse – analysts predict that dark data will grow to 91 ZB by 2025 (over four times the volume in 2020). This means dark data will continue to emit carbon into the atmosphere at alarming rates.

To protect the planet from dark data’s waste, businesses must review their data management strategies, identify valuable data and rid their data centers and clouds of unnecessary data. By properly managing dark data, there is significant opportunity for enterprises to reduce their carbon footprint, comply with industry environmental regulations and meet sustainability goals that are increasingly important to a wide range of stakeholders.

Managing and protecting dark data

It’s clear that dark data poses threats to an enterprise’s security and compliance. So how can data managers better identify, manage and protect dark data within their company?

First, data officers must develop and act from a proactive data management frame of mind, which allows organizations to gain visibility into their data, take control of data-associated risks and make informed decisions on which data to keep versus delete before a critical security event takes place. 

Two tactics help data managers establish this proactive mindset. Data mapping discovers all sources and locations of collected and stored data. Data minimization reduces the amount of data being stored and confirms that retained data is directly related to the purpose for which it was collected.
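To make the data mapping and data minimization tactics concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular product: it assumes the data lives on a local filesystem, and the names FileRecord, map_data and minimization_candidates, as well as the 365-day cutoff, are hypothetical choices made for this example. It only lists candidates for human review and deletes nothing.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    age_days: float  # days since the file was last modified


def map_data(root, now=None):
    """Data mapping: walk `root` and record every readable file found."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.stat(full)
            except OSError:
                continue  # unreadable or vanished file; skip it
            age = (now - info.st_mtime) / 86400  # seconds -> days
            records.append(FileRecord(full, info.st_size, age))
    return records


def minimization_candidates(records, max_age_days=365):
    """Data minimization: list files older than the cutoff for review."""
    return [r for r in records if r.age_days > max_age_days]
```

Age alone says nothing about whether a file is still tied to the purpose for which it was collected, so in practice a list like this would be a starting point for a retention review rather than a deletion queue.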

Second, enterprises should also use technology advancements to their advantage. Artificial intelligence (AI) and machine learning (ML) offer significant opportunities to effectively identify, manage and protect large pools of untagged, unstructured data and play a vital role in data management processes. 

The ultimate goal is to manage the information, not just the data, at the source (edge) by quickly scanning, tagging and classifying information to ensure that sensitive or risky data is properly managed and protected, regardless of where it lives. As such, transparent AI and ML policies help businesses gain full visibility into their data by identifying vulnerabilities and mitigating risks. That’s the next frontier.
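The scan, tag and classify flow can be sketched in a few lines of Python. This is a deliberately simple rule-based stand-in, not the AI or ML approach the article has in mind: the two regular expressions, the tag names and the functions tag_text and classify are all hypothetical and chosen only to show the shape of the pipeline.

```python
import re

# Hypothetical patterns for illustration only; a real classifier needs far
# more careful rules (or a trained model) and validation against real data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def tag_text(text):
    """Scan `text` and return the sorted tags whose pattern appears in it."""
    return sorted(tag for tag, rx in PATTERNS.items() if rx.search(text))


def classify(tags):
    """Map tags to a coarse label: any tag at all means 'sensitive'."""
    return "sensitive" if tags else "unclassified"


tags = tag_text("Contact: jane@example.com, SSN 123-45-6789")
print(tags, classify(tags))  # ['email', 'us_ssn_like'] sensitive
```

A model-based classifier would replace tag_text while keeping the same contract of content in and tags out, which is what lets the protection policy stay the same regardless of where the data lives.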

Properly managed dark data offers a more secure and compliant future for organizations, lowers costs and enables actions via previously untapped intelligence, opening possibilities for organizational optimization and innovation within any company. 

Ajay Bhatia is vice president and general manager, data compliance and governance at Veritas Technologies.

Originally appeared on: TheSpuzz