Data Vault - The basics

Madani BASHA

Published Sep 12, 2021

This article continues an earlier one, so please read that first: https://www.garudax.id/pulse/data-vault-underlying-principle-madani-basha/

The earlier article spelt out that data vault design is underpinned by a fundamental principle, viz., "Separate out stable things from things that are less stable".

We will see how the trio of Hub-Link-Satellite constructs is in accord with the above principle. Let us use the following example to illustrate. The diagram below represents a design following the "Normalised Design Technique". Let us assume that there is just one source system. We will address the more realistic scenario of multiple source systems in the follow-on articles. For now, let us keep matters simple.

The model shows CUSTOMER entities, each of which must belong to a CUSTOMER_GROUP.
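For readers who prefer DDL to diagrams, the normalised model can be sketched roughly as below, using Python's built-in sqlite3 module. The entity names come from the article; every column name (customer_id, customer_name, customer_group_id, customer_group_name) is an assumption made for illustration only.

```python
import sqlite3

# Two tables for the two entities. Column names are hypothetical;
# the article only names the entities and the "must belong to" rule.
NORMALISED_DDL = """
CREATE TABLE CUSTOMER_GROUP (
    customer_group_id   TEXT NOT NULL PRIMARY KEY,
    customer_group_name TEXT NOT NULL
);
CREATE TABLE CUSTOMER (
    customer_id       TEXT NOT NULL PRIMARY KEY,
    customer_name     TEXT NOT NULL,
    -- NOT NULL + foreign key: a CUSTOMER must belong to a CUSTOMER_GROUP
    customer_group_id TEXT NOT NULL
        REFERENCES CUSTOMER_GROUP (customer_group_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(NORMALISED_DDL)

conn.execute("INSERT INTO CUSTOMER_GROUP VALUES ('G1', 'Retail')")
conn.execute("INSERT INTO CUSTOMER VALUES ('C1', 'Acme', 'G1')")

normalised_tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Note that identifier, descriptive attributes and relationship all sit in the one CUSTOMER table. That mixture is what the vault design pulls apart.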

The vault design would be as follows:

Key points to note are:

The identifier (i.e., the PK) of the CUSTOMER entity in the normalised design is now in the H_CUSTOMER hub construct, which carries none of the attributes of the CUSTOMER. This is expected, because the vault design technique prescribes that a hub construct must contain only the business key and nothing else.
All the non-relationship attributes of the CUSTOMER are now in the S_CUSTOMER satellite construct. Note that S_CUSTOMER has customer_ts (ts = timestamp) in its identifier (i.e., the PK), which supports the change tracking needed in any data warehouse. The vault design technique prescribes that a satellite construct must track changes over time.
The relationship between CUSTOMER and CUSTOMER_GROUP has been separated out into the L_CUSTOMER_CUSTOMER_GROUP link construct. In the follow-on articles, we will explore the properties of a relationship. For now, note that the vault design technique prescribes that a link construct must not contain relationship properties. Instead, they must be separated out into a satellite construct of the link.
The CUSTOMER_GROUP entity of the normalised design has been cast into the H_CUSTOMER_GROUP hub and the S_CUSTOMER_GROUP satellite constructs. As with S_CUSTOMER, S_CUSTOMER_GROUP has customer_group_ts (ts = timestamp) in its identifier (i.e., the PK).
The relationship between CUSTOMER and CUSTOMER_GROUP of the normalised design is reflected in the vault design as the L_CUSTOMER_CUSTOMER_GROUP link table.
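The key points above can be sketched as DDL, again with Python's sqlite3 module. This is a minimal illustration, not an implementation pattern: it covers only the five constructs named in the key points, it uses business keys directly (no surrogate keys), and all column names other than customer_ts and customer_group_ts are assumptions.

```python
import sqlite3

VAULT_DDL = """
-- Hubs: business key only, nothing else.
CREATE TABLE H_CUSTOMER (
    customer_id TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE H_CUSTOMER_GROUP (
    customer_group_id TEXT NOT NULL PRIMARY KEY
);
-- Satellites: descriptive attributes, with the timestamp in the PK
-- so that each change becomes a new row.
CREATE TABLE S_CUSTOMER (
    customer_id   TEXT NOT NULL REFERENCES H_CUSTOMER (customer_id),
    customer_ts   TEXT NOT NULL,
    customer_name TEXT,                      -- hypothetical attribute
    PRIMARY KEY (customer_id, customer_ts)
);
CREATE TABLE S_CUSTOMER_GROUP (
    customer_group_id   TEXT NOT NULL
        REFERENCES H_CUSTOMER_GROUP (customer_group_id),
    customer_group_ts   TEXT NOT NULL,
    customer_group_name TEXT,                -- hypothetical attribute
    PRIMARY KEY (customer_group_id, customer_group_ts)
);
-- Link: the relationship only, no relationship properties.
CREATE TABLE L_CUSTOMER_CUSTOMER_GROUP (
    customer_id       TEXT NOT NULL REFERENCES H_CUSTOMER (customer_id),
    customer_group_id TEXT NOT NULL
        REFERENCES H_CUSTOMER_GROUP (customer_group_id),
    PRIMARY KEY (customer_id, customer_group_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(VAULT_DDL)

# A customer whose name changes: the hub row is written once,
# while the satellite gains one row per version.
conn.execute("INSERT INTO H_CUSTOMER VALUES ('C1')")
conn.execute("INSERT INTO S_CUSTOMER VALUES ('C1', '2021-01-01T00:00:00', 'Acme')")
conn.execute("INSERT INTO S_CUSTOMER VALUES ('C1', '2021-06-01T00:00:00', 'Acme Ltd')")

hub_rows = conn.execute("SELECT COUNT(*) FROM H_CUSTOMER").fetchone()[0]
sat_rows = conn.execute("SELECT COUNT(*) FROM S_CUSTOMER").fetchone()[0]
vault_tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The stable part (the business key in the hub) stays at one row, while the less stable part (the attributes in the satellite) accumulates history.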

As can be seen, the vault design technique has indeed applied the principle of "Separate things that are stable from things that are less stable", in contrast with the normalised design. This has resulted in 6 objects in the vault design as opposed to just 2 in the normalised design. Well, nothing comes free. The increased number of entities (and the SQL joins necessary to corral the data) is the price to pay for gaining the agility.
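To make the join cost concrete, here is a rough sketch of the query needed to reassemble the current "customer with its group" view from the vault constructs named in this article. The column names, the sample rows and the "latest timestamp wins" rule for picking the current satellite row are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE H_CUSTOMER (customer_id TEXT NOT NULL PRIMARY KEY);
CREATE TABLE H_CUSTOMER_GROUP (customer_group_id TEXT NOT NULL PRIMARY KEY);
CREATE TABLE S_CUSTOMER (
    customer_id TEXT NOT NULL, customer_ts TEXT NOT NULL, customer_name TEXT,
    PRIMARY KEY (customer_id, customer_ts));
CREATE TABLE S_CUSTOMER_GROUP (
    customer_group_id TEXT NOT NULL, customer_group_ts TEXT NOT NULL,
    customer_group_name TEXT,
    PRIMARY KEY (customer_group_id, customer_group_ts));
CREATE TABLE L_CUSTOMER_CUSTOMER_GROUP (
    customer_id TEXT NOT NULL, customer_group_id TEXT NOT NULL,
    PRIMARY KEY (customer_id, customer_group_id));

INSERT INTO H_CUSTOMER VALUES ('C1');
INSERT INTO H_CUSTOMER_GROUP VALUES ('G1');
INSERT INTO L_CUSTOMER_CUSTOMER_GROUP VALUES ('C1', 'G1');
INSERT INTO S_CUSTOMER VALUES ('C1', '2021-01-01T00:00:00', 'Acme');
INSERT INTO S_CUSTOMER VALUES ('C1', '2021-06-01T00:00:00', 'Acme Ltd');
INSERT INTO S_CUSTOMER_GROUP VALUES ('G1', '2021-01-01T00:00:00', 'Retail');
""")

# Four joins across five tables, where the normalised design needs one join
# across two. ISO-8601 timestamps stored as text sort chronologically,
# so MAX() picks the most recent satellite row.
CURRENT_VIEW_SQL = """
SELECT h.customer_id, sc.customer_name, sg.customer_group_name
FROM H_CUSTOMER h
JOIN S_CUSTOMER sc
  ON sc.customer_id = h.customer_id
 AND sc.customer_ts = (SELECT MAX(customer_ts) FROM S_CUSTOMER
                       WHERE customer_id = h.customer_id)
JOIN L_CUSTOMER_CUSTOMER_GROUP l
  ON l.customer_id = h.customer_id
JOIN H_CUSTOMER_GROUP g
  ON g.customer_group_id = l.customer_group_id
JOIN S_CUSTOMER_GROUP sg
  ON sg.customer_group_id = g.customer_group_id
 AND sg.customer_group_ts = (SELECT MAX(customer_group_ts) FROM S_CUSTOMER_GROUP
                             WHERE customer_group_id = g.customer_group_id)
"""

current_view = conn.execute(CURRENT_VIEW_SQL).fetchall()
```

In practice such a query would typically be wrapped in a view so that consumers do not have to repeat the joins.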

The data vault design shown in this article is intentionally different from what one would see in an implementation of a data vault design: the illustration deliberately leaves out the surrogate keys for the entities. Surrogate keys are a necessity for the implementation, but they are not needed to understand the data vault design per se. We will explain the need for surrogate keys and their use in the follow-on articles.

For now, the intent is to emphasise that

there is a fundamental principle (i.e., "Separate things that are stable from things that are less stable").
the Hub-Link-Satellite trio of constructs - being the signature characteristic of any data vault design - is the result of adherence to this principle.

Watch this space for further articles on Data Vault.
