Immutable Data Warehouse
Immutable Data Warehouse has come from immutable data. Data folks are aware of it but don’t use it more often.
Let’s decode it…..
‘Immutable means not changeable.’
Now, the question is how its relevant to us in Data World. We know when we are inserting new data in a database, at the same time we also have the option to change it by using update or delete commands which means data is mutable.
Mutable feature is a MUST TO HAVE in any database at a technology level, means, any databases or data store MUST allow to update or delete. One should or should not update or delete, can be a business requirement but cannot be technology limitation.
‘By default, Data Warehouse is an Immutable datastore.’
Immutable feature is not only about, not able to update or delete data, it’s about which data should be immutable, and which should be mutable. For this, one must understand the data.
Recommended by LinkedIn
Image: https://www.tibco.com/reference-center/what-is-immutable-data
For example, CUSTOMER dataset contains many types of data e.g., name, date of birth, parents, children, address, contact details, jobs, cars etc. In this dataset, you must have judged which information is immutable and which mutable.
In above example, customer details like name, date of birth, parents, children etc. are immutable details which can’t be and should not be changeable plus should not be delectable. This make immutable data to be preserved. This behavior must be captured in metadata and should be managed by Data Governance team.
Refer to the same above example, customer details like address, contact details, jobs, cars etc. is mutable information as one customer can change his/ her address or can change contract details etc. This behavior should also be captured in metadata and should be managed by Data Governance team.
Please note, immutable and mutable behavior in above example is required in OLTP (explain as a different topic) but when we come to Data Warehouse and Data Lake, immutable behavior is a given thing for Fact tables. But let me assure you, YES we need all kind of historical data but that’s very much depends on the nature of the business. For example, Insurance business doesn’t care how many times you have changes your address or your contact details, they are more interested to keep your historical data like your health, your family, you’re eating habits etc. which is relevant for them to run their business. So, in other words, in Insurance related Data Warehouse, we might not even store irrelevant data element.
Point here is, the architecture of any Data Warehouse or Data Marts is to keep historical transactional data which is residing in Fact tables whereas data in Dimensions can be mutable based on the type of Dimension.
Cheers.