Some thoughts on Data Quality...

Data quality, the bane of every data analyst's existence. It's like trying to make a gourmet meal with moldy bread and spoiled milk. No matter how skilled you are, the end result is going to be less than desirable. Allow me to share with you some of the pitfalls of data quality that I have encountered on my journey to wrangle data into submission.

First and foremost, there is the issue of "garbage in, garbage out." You know, when the data you're working with is so flawed that no matter how much you try to clean it, it's still going to be a mess. It's like trying to make a five-star meal with food from a gas station convenience store. Sure, you can try to make it work, but at the end of the day, it's still going to be questionable at best.

Then there's the issue of "data duplication." You spend hours trying to merge multiple datasets, only to realize that there's more duplicate data than unique data. It's like trying to put together a puzzle where 99% of the pieces are exactly the same. Sure, you can try to make it work, but at the end of the day, it's still going to be a mess.
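
One common way to keep duplicates from piling up during a merge is to normalize the fields that identify a record before comparing them, then keep only the first occurrence. The Python below is a minimal sketch of that idea; the field names ("name", "email"), the sample records, and the normalization rule (trim whitespace, lowercase) are hypothetical, and real-world matching usually needs more careful rules.

```python
# Minimal sketch: merge several datasets and drop duplicate records.
# Field names and the normalization rule are illustrative assumptions.

def normalize_key(record):
    """Build a comparison key so trivially different copies still match."""
    return (
        record["name"].strip().lower(),
        record["email"].strip().lower(),
    )

def merge_and_deduplicate(*datasets):
    """Concatenate datasets in order, keeping the first record for each key."""
    seen = set()
    unique, duplicates = [], 0
    for dataset in datasets:
        for record in dataset:
            key = normalize_key(record)
            if key in seen:
                duplicates += 1  # same key already kept from an earlier source
            else:
                seen.add(key)
                unique.append(record)
    return unique, duplicates

# Hypothetical sample data from two sources
crm = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
billing = [
    {"name": "ada lovelace ", "email": "ADA@example.com"},
    {"name": "Grace Hopper", "email": "grace@example.com"},
]

unique, duplicates = merge_and_deduplicate(crm, billing)
print(f"{len(unique)} unique records, {duplicates} duplicate(s) dropped")
```

Reporting the duplicate count, rather than silently dropping rows, makes it obvious when a merge is producing more copies than new records.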

And let's not forget about the issue of "data formatting." You know, when the data is there, but it's in a format that's completely unreadable. It's like trying to read a novel written in Wingdings. Sure, you can try to make sense of it, but at the end of the day, it's still going to be gibberish.
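
For the formatting problem, dates are a familiar example: the same value can arrive as "2023-02-14", "14/02/2023", or "14 Feb 2023". The Python below is a minimal sketch that tries a list of assumed formats in order and flags anything it cannot read instead of guessing. The format list is a hypothetical example; note that day-first and month-first layouts are ambiguous for days up to 12, so the order of the list decides which reading wins.

```python
from datetime import datetime

# Formats assumed to appear in the raw data (illustrative, not exhaustive).
# Day-first "%d/%m/%Y" is assumed here; "%b" expects English month names
# under the default C locale.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%Y%m%d"]

def parse_date(raw):
    """Return a date for the first format that fits, or None if none do."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None  # unreadable: flag it rather than invent a value

raw_values = ["2023-02-14", "14/02/2023", "14 Feb 2023", "20230214", "Feb-ish"]
parsed = [parse_date(v) for v in raw_values]
for raw, value in zip(raw_values, parsed):
    print(raw, "->", value.isoformat() if value else "UNPARSEABLE")
```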

But perhaps the biggest pitfall of data quality is confirmation bias. You know, when you're so convinced that your data is correct that you can't see its flaws. It's like insisting that the earth is flat no matter how much evidence is presented to the contrary.

In conclusion, data quality may seem like a small issue, but it can quickly spiral out of control and turn into a nightmare. From "garbage in, garbage out" to confirmation bias, it's important to approach your data with a healthy dose of skepticism and a willingness to admit when things aren't working out. And remember, just because you have data doesn't mean it's good data.


#data #dataquality #analytics

The biggest bane of my life with data has to be address data. No two datasets ever seem to have addresses in the same format, and that's without mixing in international data.

More articles by Alex Wheatley