Second-Hand Data

I have been involved in establishing Data Architecture principles in several companies. Two principles that often appear on the final list are:

  • Use Golden Sources (or Use Trusted Sources, or Use Authoritative Sources, depending on the terminology used in the organisation)
  • Don’t Forward Data

These are really two sides of the same coin. If an application is not an authoritative source of a specific type (and scope) of data then the first principle should rule out any other application using it as the source of that data. Strictly speaking the second principle is redundant. However, given the extent to which data flows uncontrollably around the legacy applications of many companies, I argue that it is important to highlight the responsibility of the “potential publisher” as well as that of the consumer.

Why is it important?

Most people will be familiar with the children’s game “Chinese Whispers” (or “Broken Telephone” to give it a more politically correct name). This is where a message is passed verbally, and quietly, along a chain of children. Although the objective is to pass the message to the end of the chain without it becoming garbled along the way, the outcome is usually different. Often amusingly so.

If your business application is at the end of a chain of applications that have passed data along from an authoritative source at the start of the chain, and if that data is in any way garbled (transformed or filtered, whether intentionally or unintentionally) along the way, then the outcome is unlikely to be amusing.

Why would you do it?

If you are the consumer, what is the attraction of taking the data from a source that is not authoritative – that cannot be trusted NOT to have changed the data in some way? If you are the publisher, why would you want to take on the responsibility of providing that data to others?

If everything was logical, you wouldn’t.

So why does it happen?

In an organisation that does not have or does not enforce Architecture Principles it would be unsurprising for developers of a business application that needs a new type of data to look to obtain it from applications they already work with. I worked in a major bank where both the Collateral Management and Client Valuations functions needed daily feeds of open transactions. Collateral Management were closely aligned with Risk, so sourced the data from the risk systems. Client Valuations were aligned with Finance and sourced the data from there. Significant issues ensued when differences in the data sets resulted in clients receiving collateral calls and valuation statements based on different populations of trades and different valuations of them.

Even if the organisation has a concept of Golden / Trusted Sources, application teams may find it more expedient to source the data from an application with which they already have data feeds in place, rather than working with the “new” source. Data Architects often have to fight the good fight to encourage application developers to “do it right” rather than “do it fast”. The authoritative sources can help by ensuring that their interfaces and feeds make it as easy as possible for the data to be consumed.

Another objection I have heard is “The quality of the data in [INSERT NAME OF AUTHORITATIVE SOURCE] is not good enough”. (Even if true…) It should be easy to convince those voicing that opinion that it is in everyone’s interest to work to improve the quality in the authoritative source rather than working around it.

It should be easy… but welcome to the world of the Data Architect.

Let me pose this question to you. Does a trusted source really exist? If so, how do you quantify it? Is it the source that is most complete and doesn’t yield many errors? Could it be a market data source? I believe so, but even those providers of data make mistakes.

Keep fighting the good fight, all you Data Architects and CDOs! Architectural principles are good, but they need to be enforced. The majority of ‘data issues’ are not caused by the collection of bad data; the root cause is overwhelmingly poor solution design (i.e. taking from a non-trusted source), poor system controls (i.e. no validation on input, nor basic controls like record counts and checksums), and poor change control (i.e. making changes without thought to the impact on downstream consumers). So listen to your Data Architects!
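For readers less familiar with the “basic controls” mentioned above, here is a minimal sketch of what record-count and checksum validation on an inbound feed can look like. It assumes a hypothetical arrangement where the publisher sends the expected row count and a SHA-256 digest alongside the data; the function name and format are illustrative, not any specific product’s API.

```python
import hashlib

def validate_feed(data_lines, expected_count, expected_sha256):
    """Apply basic input controls to an inbound data feed.

    data_lines      -- the records received, as a list of strings
    expected_count  -- row count supplied by the publisher
    expected_sha256 -- hex SHA-256 digest supplied by the publisher

    Raises ValueError if either control fails; returns True otherwise.
    """
    # Control 1: record count -- catches truncated or duplicated feeds.
    if len(data_lines) != expected_count:
        raise ValueError(
            f"record count mismatch: got {len(data_lines)}, "
            f"expected {expected_count}"
        )
    # Control 2: checksum -- catches corruption or alteration in transit.
    digest = hashlib.sha256("\n".join(data_lines).encode("utf-8")).hexdigest()
    if digest != expected_sha256:
        raise ValueError("checksum mismatch: feed corrupted or altered in transit")
    return True
```

Controls like these are cheap to implement, and they make silent garbling along a chain of applications far harder: a consumer that re-publishes data it has transformed can no longer pass on the original control totals.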

Thanks Colin, a very common pain point. The driver for not doing it right normally boils down to time and/or cost, both being business drivers. This is why the architecture principles must be ingrained in both business and IT. I’ve lost count of the occasions over the years where IT has been blamed for historic data sourcing decisions when, at the time, the business was pushing for the cheapest / quickest route. Sometimes expedience can be justified (regulatory deadlines being a classic example), but even then it should be understood that it’s not a permanent fix, and that not doing the right thing will cost more in the long term regardless.

There’s the funding point too. Sometimes it’s cheaper to get the data from a non-authoritative source. But remember, it’s not! It’s cheaper for you because:

  • you haven’t documented what data you’re consuming;
  • you haven’t verified the quality;
  • you haven’t even checked whether you’re allowed to use it, or whether it’s ethical to do so.

So of course it’s cheaper – you haven’t done half the things you’re supposed to do! Do it once and do it right.
