What is Data Entropy?
Image courtesy of https://pxhere.com/en/photo/742228

There is a common meme that LinkedIn regulars will know well. It shows a series of pictures of Lego, one with lots of bricks all mixed up, another with the bricks separated out by colour, and perhaps more with the bricks assembled into shapes.

Captions under each image will read something like 'Data, Information, Knowledge, Wisdom', or 'Data, Sorted, Arranged, Explained'.

As a parent of small kids, I can attest that taking a big tub of random Lego pieces and sorting them into groups of similar colour or type takes a lot of effort. I can also tell you that trying to build a large Lego model without doing this sorting first is going to take many times longer.

And yet, no matter how many times the Lego gets organised to make building easier, within a week or two it is all mixed up again. It's like Lego has its own kind of Entropy.

Entropy

Entropy in nature is the tendency of things to become disordered over time. Objects that are finely sculpted lose definition, things that are separate become mixed, things that are extreme become moderate.

Natural forces and processes, from the thermodynamic behaviour of gas molecules to weathering and erosion, cause ordered matter to become disordered over time.

https://pxhere.com/en/photo/659632

Hot objects give up heat to cooler things around them until all are the same temperature. Your latte doesn't go 'cold', and your frappe doesn't get 'warm'; they both just align with room temperature. Scientific theory suggests that, given enough time, everything in the whole universe will arrive at the same temperature.

https://pxhere.com/en/photo/158383

Tectonic forces push the earth upwards to create mountains. However, once the 'organising' tectonic forces subside, wind and water will eat away at that mountain, like a sandcastle on the beach, slowly turning rock to dust and eventually leaving the earth as flat as it was before the mountain first rose.

We can apply energy in a directed way to slow, or even locally reverse, entropy in controlled circumstances, e.g. by constructing erosion barriers on shorelines, or by microwaving that latte. However, without regular housekeeping, 'order' inevitably becomes 'disorder', 'sorted' becomes 'random', 'different' becomes 'same'. Entropy will ultimately have its way.

Data Entropy

Data in an organisation has its own kind of entropy too. We might start out with highly planned and structured enterprise data systems, but gradually, little by little, disorder can creep in. In the cut and thrust of the working world, solutions often need to be tactical, rather than strategic.

  • Strategic actions are like oil tankers: slow to get going and hard to turn around, but capable of moving enormous loads over vast distances.
  • Tactics are like jet-skis: short-range, fast moving, and well suited to carrying a small load a short distance, albeit often leaving a messy wake.

The attraction of solving problems quickly with tactical rather than strategic action is undeniable, and the longer-term consequences are often ignored. Examples of tactical behaviours that can lead to data disorder include:

  • making copies of core reports or databases ('data silos'), in order to make changes or customisations without the management overhead and delay that changing the master copy would require;
  • re-creating common business logic in local data silos and reports rather than the central data model, resulting in multiple separate implementations of things that should be standard;
  • creating on-going data solutions using data elements that need to be created and/or updated manually in every iteration, creating an on-going management overhead and dramatically increasing the likelihood of error;
  • failing to assess data quality, and/or to address data quality issues as they arise, meaning that over time DQ issues accumulate until people lose faith in the data;
  • failing to create and/or maintain documentation describing data systems and explaining key design features, meaning that new users have a very steep learning curve, and your systems risk becoming 'black boxes' that people are afraid to amend or enhance;
  • failing to have and/or apply consistent standards for naming, development, testing and delivery of data solutions, meaning users cannot easily understand all systems, and systems cannot talk to one another easily;
  • employing SaaS solutions that do not allow full access to the data created there, meaning important organisational data cannot be interrogated independently of the SaaS application, or integrated with other enterprise data.
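
Several of the behaviours listed above, in particular unassessed data quality and inconsistent naming, can be caught early with small automated checks. The following is a minimal sketch in Python, not a definitive implementation. The snake_case naming rule, the function names and the sample customer records are all assumptions made for illustration and do not come from any particular organisation's standards.

```python
import re
from collections import Counter

# Assumed naming standard (hypothetical): lower-case snake_case column names.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_standard_names(columns):
    """Return the column names that break the assumed snake_case standard."""
    return [c for c in columns if not SNAKE_CASE.match(c)]


def null_rate(rows, column):
    """Return the fraction of rows where the column is missing, None or blank."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) in (None, ""))
    return missing / len(rows)


def duplicate_keys(rows, key):
    """Return the key values that occur more than once."""
    counts = Counter(r.get(key) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Illustrative records only; not real data.
customers = [
    {"customer_id": 1, "email": "a@example.com", "CustName": "Ann"},
    {"customer_id": 2, "email": "", "CustName": "Bob"},
    {"customer_id": 2, "email": None, "CustName": "Bob"},
    {"customer_id": 3, "email": "c@example.com", "CustName": "Cat"},
]

columns = list(customers[0].keys())
print("Non-standard column names:", non_standard_names(columns))
print("Email null rate:", null_rate(customers, "email"))
print("Duplicate customer ids:", duplicate_keys(customers, "customer_id"))
```

Run on a schedule against each data set, checks like these would show quality issues as they arise rather than after they have accumulated. The thresholds and rules would need to be agreed between the business and the data management team.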

Some of these behaviours happen covertly within the business, away from the gaze of IT or the data management team. This is usually inadvertent, but sometimes deliberate. The business must be facilitated and enabled by IT and Data Management to do its work, and a sure sign that things are going wrong is when the business starts implementing solutions by itself to avoid dealing with these functions.

Other behaviours may be facilitated by IT and Data Management teams, under varying levels of pressure from the business to address high-priority requirements so quickly that they decide or agree to cut corners.

To mitigate the effects of 'data entropy', active management and governance are essential:

  • Organisations should establish and communicate clear codes of conduct and best practices to ensure staff can identify, and are aware of the impacts of, poor data management.
  • The business should be given access to appropriate tools, training and information to make effective use of data.
  • IT and Data Management must be given enough support and funding that a lack of resources does not become a bottleneck, limiting business use of data and forcing business users to invent their own ways of doing things.
  • When urgent changes require 'cutting corners', the business must commit to implementing the changes properly at the first opportunity after the emergency changes are applied.

Conclusion

For Lego, the natural force that tends towards disorder is children. All a parent can do is encourage their little ones to look after their toys and periodically help them tidy up and do some sorting. Hopefully we can prevent tears by ensuring precious pieces are not lost under the couch or in the belly of the vacuum cleaner.

Similarly, without on-going effort to manage data and enforce good practices, data entropy will gradually erode organisational data capability and capacity. However, if IT and business work together with common cause to 'apply energy' in the right way, data entropy can be minimised and reversed.

Questions on Data Warehousing, Data Integration, Data Quality, Business Intelligence, Data Management or Data Governance? Click Here to begin a conversation.

John Thompson is a Director with EY's Technology Consulting practice. His primary focus for many years has been the effective design, management and optimal utilisation of large analytic data systems.

This is great information to have when workshopping with educators about Computational Thinking and tactical data.

Who doesn't love a good Lego analogy? John has a talent for explaining tough topics in easy language. Thanks John.

My favorite definition is "things tend towards chaos"
