The surprisingly difficult challenge of evaluating if data is garbage or gold
Data quality feels like a topic that should be easy. Good and bad, right? It’s that simple. A binary choice that connects to the core of human decision-making and neatly ties to the ones and zeros we all love. The data in MY system is good and the data in SOMEONE ELSE’S system is bad.
But if pressed, can you really describe data quality? The Six Primary Dimensions for Data Quality Assessment by the Data Management Association (DAMA) does an excellent job of categorizing the attributes around data quality. Here’s how they break down the problem with very brief definitions:
- Complete: Absence of blank (NULL) values.
- Unique: Things are measured only once.
- Timely: Represents reality from the required point in time.
- Valid: Conforms to syntax and data types.
- Accurate: Correctly describes the real world.
- Consistent: Absence of difference on comparison.
These are excellent. They bring clarity and organization. Some systems even surface related measures as summary statistics, like the pg_stats view in PostgreSQL.
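To make a couple of these dimensions concrete, here is a minimal Python sketch (not PostgreSQL itself) that computes two statistics analogous to the `null_frac` and `n_distinct` columns of pg_stats: the fraction of missing values (completeness) and the count of distinct values (uniqueness). The sample email data is hypothetical.

```python
def column_stats(values):
    """Return (null_frac, n_distinct) for a column of values.

    None plays the role of SQL NULL; null_frac approximates completeness,
    n_distinct hints at uniqueness problems when it is lower than expected.
    """
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    null_frac = nulls / total if total else 0.0
    n_distinct = len({v for v in values if v is not None})
    return null_frac, n_distinct

# Hypothetical sample column with a blank and a duplicate
emails = ["a@example.com", None, "b@example.com", "a@example.com", None]
null_frac, n_distinct = column_stats(emails)
print(null_frac)   # 0.4 -> 40% of rows are incomplete
print(n_distinct)  # 2   -> duplicates suggest a uniqueness issue
```

A real profiler would run this per column across the whole table, but the idea is the same: completeness and uniqueness reduce to simple counts you can automate.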
But there are business questions left unanswered. Have you heard these before?
- “I can’t get at that data”
- “It’s great, but I have to do a lot of work with it before I can do anything”
- “Yeah, but have you met the people who own that data? It might be ok now…”
Try adding three categories to the DAMA structure:
- Accessible: Available for access
- Usable: Understandable, simple, relevant, and in the way you need it
- Confident: Reputation of the data and how it is managed
With this full set of data characteristics, you may find yourself facing deep existential questions you have never considered. If a data warehouse is inaccessible, is it really a warehouse? If people only use a dashboard to export data to Excel, is it really a dashboard? Do you really trust the people, process, and technology providing that data?