Data quality is like good property management: invisible when it’s working, but critical when it’s not. Without clean, accessible, and up-to-date data, dashboards, AI applications, and predictive analytics simply won’t deliver. The behind-the-scenes work - tracking, maintenance, documentation - is too often seen as boring (we disagree!), but it’s absolutely essential.

What seems like “nice-to-have” work is often the reason data goes unused - with far-reaching consequences.

 

In this article, you'll learn

  • Why so much data remains unused - and what risks this entails
  • How missing use cases, outdated systems, and redundant storage create "dark data"
  • What tracking and clean documentation have to do with data quality
  • The true cost of bad data - technical, legal, and environmental
  • Which specific measures help to make data usable, findable, and valuable
  • And why even boring data maintenance is the foundation for any AI

Much of the data companies collect today goes unused - hidden away in archives and silos. This hard-to-find, disorganized, or simply forgotten data is known as dark data.

According to Gartner, dark data refers to information collected, processed, and stored in the course of routine business activities, but never used - whether for analysis, automation, or business decisions.

Studies estimate that more than half of all company data is "dark" data. Ignoring the importance of data quality or skipping established best practices for data maintenance can lead to missed business opportunities, unnecessary costs, and legal problems.


Why is dark data problematic?

Dark data causes silent but expensive problems - technically, operationally, and legally. These problems often arise where there is a lack of clear data quality standards, regular maintenance, and sustainable tracking.

On a practical level, dark data ties up valuable resources:

  • Storage space (whether on-premise or in the cloud) costs money and has to be managed.
  • Outdated or unstructured data reduces overall data quality and can distort analyses.
  • Time is lost as employees search through scattered, messy data instead of working productively.
  • Environmental impact: storing and cooling unused data produces an estimated 6.4 million tons of CO₂ annually - more than the emissions of 80 countries.

Dark data also poses compliance risks:

  • Personal or sensitive data left forgotten on a server
  • Security gaps in outdated systems

But how does so much data end up idle in the first place?

To shed light on this question, we invite you to join us on a tour of the Haunted House of Dark Data. Each room in this haunted house represents a common issue in dealing with unused, unusable, unstructured, or poor-quality data in companies.

Do you dare to come with us?

 

The Haunted House of Dark Data

 

1. Foyer of missing use cases

Why data without a purpose has no impact

We start in the foyer - the entrance hall - of dark data. The "foyer of missing use cases" symbolizes the fact that a lot of data is collected right from the start without a specific purpose. Too often, companies gather information “just in case,” without asking: What exactly is this for?

Without a clear use case or analytical question, data is quickly forgotten. As a result, large amounts of data exist, but no one feels responsible for managing it or even drawing insights from it.

This reveals a cultural problem. Everyone talks about data-driven decision-making, and managers constantly emphasize the high value of data. In one survey, nearly all managers rated data as extremely valuable for the company's success.

But the reality lags behind: on average, only around 32% of available company data is actually used for analysis (Seagate). That means more than two-thirds remains untouched due to a lack of ideas or capacity, even though its potential is widely praised. This gap between ambition and reality is the real ghost in the foyer.

The remedy starts at the "front door": with the strategy. Even before data is collected, companies should ensure that specialist departments work closely together to define clear use cases.

Every type of data collected should have an intended use or at least be evaluated early on with one key question in mind: "Can we create added value with this data?"

Training can also help build data literacy across the company so employees learn how and for what purpose they can use data instead of letting it sit unused.

 

Watch our free webinar on data literacy

 

And last but not least, it's important to track value: For which decisions or processes has the existing data been used recently? Where are the gaps? This transforms the foyer from a transit warehouse for aimless data collection into a reception hall with a plan - data comes in because it is needed and used.

 

2. The storage cellar of archiving

Legacy data, compliance, and the cost of poor archiving

We continue down into the storage cellar of archiving. This is where countless old backups, archives, and data copies pile up like dusty file folders. Many companies regularly archive data, often for compliance reasons or to ease the load on primary systems. But once stored, these archives are rarely revisited. The cellar represents the weight of the past:

  • Legacy databases
  • Tape drives
  • Email archives
  • Long-forgotten decommissioned legacy systems

The risk: if archived data is not properly documented, eventually no one will know what it contains. At best, it "only" takes up space and costs money to store. At worst, it contains sensitive information that could resurface in a data breach. For example, a forgotten backup may contain personal customer data - a nightmare in the context of GDPR and data protection if that data cannot be deleted upon request.

In addition, storing data for too long creates data debt: systems must be kept running simply because they hold data that might still be needed. This leads to time-consuming processing and increased administrative costs.

The challenge in the storage cellar concerns the last step in the data lifecycle, yet it should be considered from the beginning: When is data finally deleted? Companies need clear retention periods and deletion policies. Not everything has to be kept forever.

Data archived for legal reasons (such as accounting documents) should be systematically deleted once the deadline expires to free up space. Modern archiving tools with indexing make it easier to stay on top of what’s stored and retrieve relevant records when needed. The cellar does not have to remain a dark dungeon - with good care, it becomes an organized archive, holding only what’s truly useful.
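Such a deletion policy can be sketched in a few lines of Python. This is a minimal sketch, not a production tool: the category names and retention periods below are purely hypothetical, and it assumes archives are organized into one folder per category. By default it only reports expired files (dry run) rather than deleting them:

```python
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical retention periods per archive category
RETENTION = {
    "invoices": timedelta(days=10 * 365),  # e.g. statutory retention for accounting documents
    "weblogs": timedelta(days=90),         # short-lived operational data
}

def expired(path: Path, category: str, now: datetime) -> bool:
    """True if the file's last modification lies outside the retention period."""
    age = now - datetime.fromtimestamp(path.stat().st_mtime)
    return age > RETENTION[category]

def sweep(root: Path, category: str, dry_run: bool = True) -> list[Path]:
    """Collect (and, if dry_run=False, delete) expired files under root/category."""
    now = datetime.now()
    hits = [p for p in (root / category).rglob("*")
            if p.is_file() and expired(p, category, now)]
    if not dry_run:
        for p in hits:
            p.unlink()
    return hits
```

Running `sweep(archive_root, "weblogs")` first in dry-run mode lets the data owner review the candidate list before anything is irreversibly removed.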

 

3. Room of future application

When the use case is always "tomorrow" - but never today

A lot of data is stored with the vague hope that it might be useful someday - à la "The crystal ball doesn't show anything concrete yet, but let's wait and see!"

Data records labeled "maybe later" pile up in the room of future application. Companies store huge amounts of log files, sensor, and tracking data in the hope of one day being able to use them for advanced analytics or machine learning. Thanks to ever cheaper storage and the promise of big data, it seemed to make sense - for a while - to dump everything into a data lake, just in case.

The problem: without a concrete plan and current use case, this data quickly turns into dead weight. In many cases, it's never touched again, and the data lake becomes a data swamp. As mentioned earlier, most companies leave the majority of the data they collect untouched - often only a fraction is ever analyzed. Why? Missing metadata, unknown formats, or simply shifting priorities.

Data that sits unused for years loses context and relevance, until no one remembers why it was collected in the first place. The room of future application represents postponed value creation: the data may be valuable, but the value never materializes because its purpose is always pushed to “later.”

The solution? Define use cases early and align them with your broader data strategy. Ideally, data should be collected to answer specific business questions from the start.
Of course, data can be stored for exploratory purposes - but then it needs regular review: Have we gained any insights from this data in the meantime? If not, it's time to decide whether to archive or delete it to conserve resources. Automation and AI can help uncover patterns or anomalies in the masses of raw data that may be useful after all. Sometimes, those ghost datasets really do turn out to be hidden treasures.

From the room of future application, a staircase leads directly to the storage of storages: with storage becoming cheaper, the temptation to keep everything "just in case" (IBM) is stronger than ever.

 

4. Storage of storages

Data sprawl, redundancies, and the loss of oversight

This room is packed floor to ceiling - shelves upon shelves of loosely organized files, representing the tendency in many companies to create more and more storage locations and copies, often without a clear overview. You'll find multiple near-identical filing cabinets here - all full of similar or duplicate files:

  • Network drives and SharePoints
  • Email attachments
  • Local storage
  • Data lakes and databases
  • Cloud backups 

This flood of data means that no one knows exactly where specific information is stored. The "storage of storages" symbolizes the uncontrolled growth of data (data sprawl) and the resulting redundancies.

These environments also generate large amounts of ROT data: redundant, obsolete, or trivial. Studies show that, in addition to dark data, a considerable proportion of stored information is simply clutter. An analysis by Veritas estimates that an average of 33% of company data is ROT, and a further 52% remains "dark".

This chaos not only drives up storage costs, it also makes it harder to find important data. When departments store their own copies—often because they don’t trust shared systems or don’t know they exist—data silos grow. This is usually driven by a desire for data sovereignty or fear of losing control. The result? Storage keeps expanding, silos deepen, and teams waste time working on the same data more than once.

Data lifecycle management is the key to creating order here. Companies need clear guidelines on where data is stored and when it should be archived or deleted. Centralized data management with shared, consolidated storage (instead of countless individual silos) helps to avoid duplicates.

Deduplication techniques (processes for identifying and removing duplicate data) and regular data inventories can be used to clean out the "storage of storages".
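As a minimal illustration of the deduplication idea, the sketch below groups files by their SHA-256 content hash; any group with more than one entry is a set of exact duplicates. (Real deduplication tools typically work at block or object level and at much larger scale, but the principle is the same.)

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256(path: Path) -> str:
    """Content hash of a file, read in 1 MiB chunks to handle large files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> list[list[Path]]:
    """Group all files under root by content hash; groups with >1 entry are duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            groups[sha256(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Even a simple inventory like this often surfaces the same report or export stored three or four times across shares.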

Ultimately, the aim is to shift from "storage is cheap, let's keep everything" to "(data) quality over quantity". Less data, better managed, means lower costs and higher efficiency.

 

5. Inaccessible data attic

Silos, ownership gaps, and lost information

In the attic of dark data, we find information that’s technically stored—but practically unreachable. Like a real attic without a key, everyone knows something’s up there, but no one’s sure what it is, who owns it, or how to access it.

Although this data is available, it's practically useless: it's poorly documented, technically isolated, or simply forgotten. Often, it sits in data silos - isolated or even encrypted data repositories owned by individual departments, or orphaned legacy systems that are not integrated into the company's overarching data architecture.

In addition to the technical isolation, there's often no one in the company who is familiar with this data or feels responsible for it. This lack of ownership in turn means that no one dares to clean up, integrate, or delete the records. The result? Unnecessary storage costs and, potentially, the loss of valuable information (like historical customer data).

The consequences can be seen in day-to-day work. Employees can't find the information they need and waste time searching in the data attic. One analysis found that knowledge workers spend an average of 1.8 hours per day searching for information they can't find. Data experts waste an average of 30% of their working time on finding and processing distributed data sets (marketlogic).

This inefficiency adds up. When data can’t be found, work gets duplicated, productivity drops, and frustration grows. A company's data discovery problem therefore generates real costs and slows down data-driven decisions.

To clear out the inaccessible data attic, companies are turning to data catalogs and metadata management. Central directories and consistent keywording (tags) make it easier to find distributed databases. However, without clearly defined responsibilities and active governance, many of these tools remain ineffective. A well-thought-out data governance strategy defines ownership and access.
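A data catalog can start small. The sketch below is a deliberately minimal, hypothetical in-memory catalog - real catalog tools add lineage, access control, and full-text search - but it shows the core idea: every "data box" gets a location, an owner, a description, and tags that make it findable.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    location: str                 # where the data lives (share, bucket, database)
    owner: str                    # accountable data owner or steward
    description: str
    tags: set[str] = field(default_factory=set)

class DataCatalog:
    """Minimal in-memory catalog: register datasets, find them by tag."""
    def __init__(self) -> None:
        self.entries: list[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def search(self, tag: str) -> list[CatalogEntry]:
        return [e for e in self.entries if tag in e.tags]
```

The crucial fields are not technical: `owner` answers "who is responsible?", which is exactly what's missing in the attic.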

This turns the gloomy attic into a well-lit storage room where nothing valuable gets lost.

 

6. Niche of non-usability

Incorrect, unstructured, and incomprehensible data

In a shadowy corner of the haunted house lie piles of data that are simply useless - like grandpa's old record collection when you no longer have a record player. This is where all the data ends up that nobody can do anything with. Some data sets are incomplete, incorrect or untrustworthy, so analysts prefer to ignore them. Others are in exotic formats or systems that current tools cannot access. And some simply lack documentation - no one knows what table X means or which unit Y was measured in.

Without context, data quickly becomes data garbage.

Low-quality data leads to wrong conclusions when used, or wastes time when it's flagged as unusable during cleanup. Many companies are sitting on large volumes of unstructured or unlabeled information.

This niche in the haunted house reveals a core problem of data quality: data can only be used to create value if it complies with current data quality standards - i.e. if it is accurate, complete, and comprehensible. Anything else will sooner or later become dark data.

To close the niche of non-usability, companies need to invest in data quality management. This includes processes such as data cleansing, standardization, and validation.

In addition, every important data set should be provided with the necessary metadata (What does the variable represent? How was it collected? How reliable is it?). This is where established data governance pays off again: for example, by appointing data stewards who are responsible for maintaining data quality in their specialist areas. With regular quality assurance, even dark corners of the data landscape can be illuminated, making previously unusable data valuable again through documentation and improvement.
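A first step toward such quality assurance can be a simple automated report. The sketch below - with hypothetical field names and thresholds - counts missing and out-of-range values in a batch of records, the kind of completeness and validity check a data steward might run regularly:

```python
def quality_report(records: list[dict], required: list[str],
                   valid_ranges: dict[str, tuple[float, float]]) -> dict:
    """Completeness and validity summary for a list of dict records.

    - missing: per required field, how many records have None or "".
    - out_of_range: per numeric field, how many values fall outside (lo, hi).
    """
    missing = {f: sum(1 for r in records if r.get(f) in (None, ""))
               for f in required}
    out_of_range = {
        f: sum(1 for r in records
               if r.get(f) is not None and not (lo <= r[f] <= hi))
        for f, (lo, hi) in valid_ranges.items()
    }
    return {"rows": len(records), "missing": missing, "out_of_range": out_of_range}
```

Running such a report on every new delivery makes quality visible as a number - and a dataset whose metrics degrade month after month is a candidate for the niche of non-usability.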

Sometimes, however, the cost of technically processing the data outweighs its actual value. And that’s when you know you’ve entered the…

 

7. Chamber of costs and technological effort

What bad data really costs - in euros, effort, and CO₂

This is where the true cost of dark data becomes visible. Every stored file, every terabyte of unused data comes with both direct and hidden expenses:

  • Cloud storage fees
  • Electricity for servers
  • Cooling in data centers
  • Backup infrastructure
  • Administration and processing costs

From a global perspective, these costs are enormous; as early as 2016, Veritas warned that annual expenditure on storing dark and ROT data would rise to 2.2 trillion pounds by 2020 (capacitymedia). For individual companies, this means millions of pounds being spent on data storage without generating any added value.

On top of that comes the administrative effort: Old, rarely used data still requires maintenance. Legacy systems must continue to be operated or migrated, backups must be checked, and migration scripts written - all this for information that nobody actively uses.

This chamber of costs reminds us that every gigabyte of useless data ultimately burdens IT budgets and can slow down innovation projects (aka: opportunity costs).

And let's not forget the environmental cost: unused data still consumes electricity and resources. Cooling large data archives requires energy and contributes to CO₂ emissions. Storing dark data is estimated to release 6.4 million tons of CO₂ into the atmosphere every year. That is more than the emissions of 80 countries combined. 

Global data volume is expected to quadruple to 175 zettabytes by 2025, which would mean around 91 zettabytes of dark data. For both cost and sustainability reasons, companies should take a hard look at what they’re storing - and why.

Overcoming these challenges calls for a strategic shift. Data governance plays a key role here: clear guidelines on who is allowed to keep which data and for how long, as well as regular checks, help to keep data volumes lean.

Companies should also track key metrics such as cost per data unit to get a feel for what their data (dis)use actually costs. Although cloud providers offer various storage options that reduce costs (e.g., tiered storage, where rarely used data is outsourced to cheaper, slower media), the most effective solution is still this: avoid collecting unnecessary data in the first place.
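Tracking cost per data unit does not require sophisticated tooling to start. The sketch below uses hypothetical per-GB monthly prices for three storage tiers (actual cloud pricing varies by provider and region) to estimate the monthly bill per dataset:

```python
# Hypothetical monthly prices in EUR per GB for three storage tiers
TIER_PRICE_EUR_PER_GB = {"hot": 0.020, "cool": 0.010, "archive": 0.002}

def monthly_cost(datasets: list[tuple[str, float, str]]) -> tuple[dict, float]:
    """datasets: (name, size_gb, tier) triples.

    Returns per-dataset monthly cost and the total, rounded to cents.
    """
    costs = {name: round(size_gb * TIER_PRICE_EUR_PER_GB[tier], 2)
             for name, size_gb, tier in datasets}
    return costs, round(sum(costs.values()), 2)
```

Even a rough calculation like this makes the trade-off concrete: moving a terabyte of rarely touched logs from the hot tier to the archive tier cuts its cost by a factor of ten - and deleting it cuts the cost to zero.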

 

Conclusion and outlook

Our tour through the haunted house of dark data has highlighted the problem areas of unused data in companies: from inaccessible attics and redundant storage to dusty archives, forgotten use cases, and hidden cost traps. The good news? It is possible to get rid of these spooky creatures.

 

What can companies do to avoid or tame dark data?

Some key approaches:

  • Establish holistic data governance: A company-wide framework of guidelines and responsibilities ensures that data is managed sensibly throughout its entire lifecycle. This includes clear responsibilities (data owners, data stewards) and rules for collecting, storing, using, and deleting data (data contracts). 
  • Use metadata management and data catalogs: Metadata and catalog tools help you keep track of what data you have, where it lives, and what it’s useful for. Think of it as labeling each “data box,” so nothing valuable gets lost in the basement.
  • Automated classification & AI support: Modern tools use AI/machine learning to automatically classify data, detect duplicates, and track down sensitive content. Such technologies can identify dark data (e.g., files that have not been opened for years) and make recommendations as to what can be deleted or archived. Machine learning can also help to find hidden patterns in unused data and make it usable.
  • Ensure data quality: Regular data quality audits and data quality tools help prevent data becoming unusable in the first place. Standardized formats, validity checks, and employee training in data handling (data literacy) increase the trustworthiness of the data.
  • Define clear use cases and measure added value: Data collection should start with a clear goal—"we’re collecting this to achieve X." Regularly check whether your data is actually being used to support decisions. If not, it may be time to archive or delete it and focus resources where they add value.

In conclusion, dark data isn't an inevitable fate. With the right strategy and modern technologies, companies can make their "dark data" visible - or consistently delete what's no longer needed. This improves data quality, lowers costs, and provides more room for innovation.

A well-illuminated data warehouse is the foundation for making the most of your information assets and enabling truly data-driven work. The ghosts of the past are banished through knowledge, structure, and governance - and the haunted house of dark data becomes a future-ready data home.

How FELD M helps you avoid dark data

Poorly maintained data isn’t just a risk, it’s a missed opportunity. At FELD M, we help you bring structure to dark data and turn it into a valuable asset. With our expertise in data governance, data architecture and data quality, we ensure that your data is accessible, reliable and optimally utilized - for well-founded decisions and a future-proof data strategy.

Find out more about our services: