Data lineage
What is it?
Data lineage documents the journey that data takes through an organization's systems, showing how it flows between them and how it is transformed for different uses along the way.
Let’s understand more about data lineage through a simple example.
Think of a water distribution system: it has a central location from which the water for each household originates. A map of the system gives all the details of the main water origin point, transfer locations, secondary storage points, water pumps, etc.
If someone wants a new connection, the map clearly indicates a water point with the required pressure. Now consider a scenario with multiple origin points, storage points and water pumps: the system becomes complicated, and the need for a map becomes even more critical.
The same applies to the numerous data attributes generated in an organisation. Data lineage works like a map that shows where the data comes from (source), where it flows to (consumption points and storage) and what happens to it along the way (transformation).
There are two ways in which data lineage is mapped:
· Business data lineage (also called functional data lineage) – The journey of data at a high level, without fine-grained technical details
· Technical data lineage – The journey of data with fine-grained technical details: the applications, the tables/schemas and the transformations involved
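To make the two views concrete, here is a minimal Python sketch, not a definitive implementation. It assumes lineage is recorded as a list of edges, each with a source, a target and a transformation. Every name in it (LineageEdge, upstream, business_view, and the sample applications and tables) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """One hop in the technical lineage (hypothetical structure)."""
    source: str          # "application.table.column" the data comes from
    target: str          # "application.table.column" the data flows to
    transformation: str  # what happens to the data along the way


# Illustrative sample data; these systems and tables are made up.
EDGES = [
    LineageEdge("crm.customers.email", "staging.customers.email", "lowercase and trim"),
    LineageEdge("staging.customers.email", "warehouse.dim_customer.email", "deduplicate"),
    LineageEdge("billing.invoices.amount", "warehouse.fact_sales.amount", "convert to USD"),
    LineageEdge("warehouse.dim_customer.email", "reports.churn.customer_email", "copy"),
]


def upstream(attribute, edges):
    """Technical view: every edge that feeds `attribute`, nearest hop first."""
    trail, seen, frontier = [], set(), [attribute]
    while frontier:
        current = frontier.pop()
        for edge in edges:
            if edge.target == current and edge not in seen:
                seen.add(edge)
                trail.append(edge)
                frontier.append(edge.source)
    return trail


def business_view(edges):
    """Business view: collapse column-level edges into application-level flows."""
    flows = []
    for edge in edges:
        flow = (edge.source.split(".")[0], edge.target.split(".")[0])
        if flow[0] != flow[1] and flow not in flows:
            flows.append(flow)
    return flows


if __name__ == "__main__":
    for edge in upstream("reports.churn.customer_email", EDGES):
        print(f"{edge.source} -> {edge.target} ({edge.transformation})")
    print(business_view(EDGES))
```

Calling upstream on a report column walks back to its original source and lists each transformation on the way, which is the technical view. business_view drops the table and column details and keeps only the flows between applications, which corresponds to the high-level business view.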
Approach to data lineage
Steps involved in doing the lineage:
Why is data lineage important?
To maximise the value of any business, an understanding of the market as well as the organization behind the business is vital. For either, a proper understanding of the data that represents the business must be documented. Data lineage is essential to this documentation. It helps in understanding which data requires more focus than the rest. Since it also highlights the journey that data takes, it helps in deciding what level of access should be given to which data.
Some of the benefits of data lineage are:
· Data triangulation – To understand what data to look at, we need to know where it comes from, which cannot be done without data lineage. This also leads to more accurate analytics.
· Robust data governance – Knowing the journey that data takes is a must-have for any kind of auditing. Data governance, which is essentially an audit of data, is much easier with better data lineage.
· Improved regulatory compliance – Since data lineage aids data audits, regulatory requirements that depend on data can be met more easily.