Data Mining & Easter Eggs

Q: Why do they call it data mining? Isn’t it knowledge mining?

A: No, it’s data mining. Knowledge is two steps up the added value chain. Once something becomes knowledge, you don’t really need data mining. So let’s first talk about the added value chain.

Data is less valuable than information. You can have bad data. It’s still data. Once you cleanse the data and organize it, you have information. Encrypted data is still data. It’s not information until you decrypt it. That’s the difference between data and information.

Good, clean, organized data = information. It’s not knowledge until it tells a story. That means a human can understand what it is. Knowledge is information in context. If I say “35 Units are in Department 7” and “27 Units are in Department 8” I have information. But it’s not knowledge until I have complete information. I don’t know the date. I don’t know how many departments there are.
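To make the step from data to information concrete, here is a minimal sketch in Python. The raw rows, their "dept N, units" layout, and the `cleanse` helper are all hypothetical, invented for illustration around the two department counts above. It is one plausible way to cleanse and organize, not a prescribed method.

```python
# Hypothetical raw data: messy strings, a duplicate, and a bad row.
raw_rows = [
    " dept 7 , 35 ",
    "dept 8,27",
    "dept 8,27",     # duplicate record
    "dept ?,abc",    # bad data: still data, but not usable
    "",              # empty row
]


def cleanse(rows):
    """Return {department: units}, skipping rows that cannot be parsed."""
    organized = {}
    for row in rows:
        parts = [part.strip() for part in row.split(",")]
        if len(parts) != 2:
            continue  # wrong shape, discard
        dept_text, units_text = parts
        dept_number = dept_text.lower().replace("dept", "", 1).strip()
        if not (dept_number.isdigit() and units_text.isdigit()):
            continue  # unparseable values, discard
        # Keying by department collapses exact duplicates.
        organized[int(dept_number)] = int(units_text)
    return organized


information = cleanse(raw_rows)
print(information)  # {7: 35, 8: 27}
```

The result is information, but still not knowledge: nothing in it says what date the counts were taken or how many departments exist.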

Intelligence is actionable knowledge with appropriate qualifications. We could get into what makes something actionable, but that has been done elsewhere.

All of these elevations in value come from humans organizing data into information, into knowledge, into intelligence. But humans have a particular way of organizing and thinking about where to find intelligence. Think of how children search on an Easter egg hunt. They don’t brute-force search every square inch for things that look like eggs. They try to emulate the thinking of the ‘Easter Bunnies’, the parents who hid them. Machines, on the other hand, search more exhaustively and find patterns that humans don’t consider relevant. A machine might notice, for example, that light-colored eggs were hidden an average of 2 inches from tall vertical surfaces. Children would not notice. They just count the eggs.

But machines can surface patterns in the DATA that create a different kind of information, knowledge and intelligence. For example, data mining an Easter egg hunt might show, based on the proximity of egg colors to the color of occluding objects, that some of the people who hid the eggs might be colorblind. Their inability to distinguish certain colors made the eggs easier for non-colorblind children to find, and therefore the hunt rewarded faster-running children without sophisticated hunting tactics. That is surfacing information in the data that humans would never have the inclination or patience to discover, which is the point of having machines do the mining rather than humans.
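The kind of pattern described above, egg shade versus distance from a vertical surface, can be sketched in a few lines of Python. Everything here is assumed for illustration: the observations are made up, and the `LIGHT_COLORS` classification and `mean_distance_by_shade` function are hypothetical names, not anything from a real hunt or a real mining tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (egg color, inches from nearest tall vertical surface).
eggs = [
    ("yellow", 1.5), ("yellow", 2.5), ("white", 2.0),
    ("navy", 14.0), ("purple", 22.0), ("navy", 30.0),
]

# Assumed classification of which colors count as "light".
LIGHT_COLORS = {"yellow", "white"}


def mean_distance_by_shade(observations):
    """Group eggs into light/dark and average their distance from a wall."""
    groups = defaultdict(list)
    for color, inches in observations:
        shade = "light" if color in LIGHT_COLORS else "dark"
        groups[shade].append(inches)
    return {shade: mean(values) for shade, values in groups.items()}


pattern = mean_distance_by_shade(eggs)
print(pattern)  # {'light': 2.0, 'dark': 22.0}
```

A child counting eggs would never compute this. The machine does it exhaustively for every attribute pair it is given, which is how an unexpected pattern gets surfaced.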

As soon as you have a machine-sensed pattern in the data, it changes how you organize the data into information. That is what adds value, and it changes the way you think about analyzing Easter egg hunts in the future.

By Michael Bowen