Data preparation – the key to managing dark data

Sooner or later, data preparation becomes a top priority for all organisations aiming to make the most of their data. Having a strong focus on identifying, analysing and proactively acting on all data within the organisation will make the difference between laggards and leaders.

This is where data quality becomes paramount, and preparing data well is how businesses will stay ahead.

Data preparation requires considerable time and effort, and some business users, data professionals and data scientists may still see it, to some extent, as wasted time. The truth is that it’s hard to get value from your data. However, the real value of data insights can only be realised by first understanding the data through preparation, especially when it comes from multiple, unrelated sources.

Businesses need to keep in mind that data can be generated in many ways, even in the normal course of day-to-day business. Because of this, organisations must ensure they have a handle on all their data sources and clear visibility of how each dataset was obtained.

Dark data and its role in data preparation

With data breaches becoming more frequent, consumers worry more about having their personal details tracked and stored. This is why dark data is becoming a concern for most businesses.

Gartner defines dark data as the information assets an organisation collects, processes and stores during day-to-day business activities, but which are neither used nor deleted. Over time, holding onto this data can become a problem.

Firstly, storing, securing and managing large amounts of data can lead to excessive costs. What’s worse, dark data may contain information that the company was unaware of or has forgotten about. For example, a company’s dark data can contain personally identifiable information (PII) about customers or patients which was supposed to be deleted. If employees forgot about it, or never knew it was there, the company could be liable for non-compliance with privacy regulations.

So, how do you avoid collecting data you don’t want, will never use and could someday be penalised for? There is no magic solution, but data preparation is a good time to determine which data needs to be tracked and stop tracking the rest.
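As a rough illustration of that decision step, the sketch below walks a small set of records and sorts each field into "keep", "review" or "drop". Everything in it is an assumption made for illustration: the function name `audit_fields`, the sample records, the list of fields in use, and the use of a simple email pattern as a stand-in for PII detection. Real PII discovery needs far more than one regular expression.

```python
import re

# Hypothetical, deliberately naive PII check: an email-shaped string.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def audit_fields(records, fields_in_use):
    """Classify every field seen in `records` as 'keep', 'review' or 'drop'."""
    seen = {}
    for record in records:
        for field, value in record.items():
            looks_like_pii = (
                isinstance(value, str) and EMAIL_RE.fullmatch(value) is not None
            )
            # Remember whether any value in this field ever looked like PII.
            seen[field] = seen.get(field, False) or looks_like_pii

    decisions = {}
    for field, has_pii in seen.items():
        if field in fields_in_use:
            decisions[field] = "keep"      # actively used downstream
        elif has_pii:
            decisions[field] = "review"    # unused and possibly personal
        else:
            decisions[field] = "drop"      # unused: candidate to stop tracking
    return decisions


# Made-up sample data; field names are illustrative only.
records = [
    {"order_id": 1, "total": 19.99, "contact": "ann@example.com", "legacy_flag": "Y"},
    {"order_id": 2, "total": 5.00, "contact": "bob@example.com", "legacy_flag": "N"},
]
decisions = audit_fields(records, fields_in_use={"order_id", "total"})
for field, decision in decisions.items():
    print(f"{field}: {decision}")
```

Fields marked "review" are the ones most likely to carry compliance risk, because nobody uses them and they may hold personal details. Fields marked "drop" are candidates to stop collecting altogether.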

The solution

It’s not necessary to spend big money on specialised tools for data preparation. Simple query and data preparation tools are key to cleaning up your data, making sense of it and preparing it for downstream use in visualisation tools, dashboards and data science.

Simple-to-use data preparation tools ease the process, giving data scientists more time to focus on higher-value predictive analytics. They also help bring self-service data preparation to both business and IT, broadening data empowerment.

More articles by John Pocknell
