Data Vault - The basics

Data Vault - The basics

Please refer to the earlier article, as this article is a continuation of that. https://www.garudax.id/pulse/data-vault-underlying-principle-madani-basha/

We had spelt out that the data vault design is underpinned by a fundamental principle, viz., "Separate out stable things from things that are less stable".

We will see how the trio of Hub-Link-Satellite constructs are in accord with the above principle. Let us use the following example to illustrate. The diagram below represents a design following the "Normalised Design Technique". Let us assume that there is just one source system. We will address the more realistic scenario of multiple source systems in the follow-on article/s. For now, let us keep matters simple.

No alt text provided for this image

The model shows that there are CUSTOMER-s and a CUSTOMER must belong to a CUSTOMER_GROUP.

The vault design would be as follows:

No alt text provided for this image

Key points to note are:

  1. The identifier (i.e., the PK) of the CUSTOMER entity in the normalised design is now in the H_CUSTOMER hub construct. And it has none of the attributes of the CUSTOMER. This is normal because the vault design technique prescribes that a hub construct must contain only the business key and nothing else.
  2. All the non-relationship attributes of the CUSTOMER are now in the S_CUSTOMER satellite construct. Note that the S_CUSTOMER has the customer_ts (ts = timestamp) in its identifier (i.e., the PK), which is to support the change tracking (needed in any data warehouse). The vault design technique prescribes that a satellite construct must track changes over time.
  3. The relationship between CUSTOMER and CUSTOMER_GROUP has been separated out into the L_CUSTOMER_CUSTOMER_GROUP link construct. In the follow on article/s, we will explore the properties of relationship. For now, note that the vault design technique prescribes that a link construct must not contain relationship properties. Instead they must be separated out into a satellite construct of the link.
  4. The CUSTOMER_GROUP entity of the normalised design has been cast into the H_CUSTOMER_GROUP hub and the S_CUSTOMER_GROUP satellite constructs. As with the S_CUSTOMER, S_CUSTOMER_GROUP also has customer_group_ts (ts = timestamp) in the identifier (i.e., the PK).
  5. The relationship between CUSTOMER and CUSTOMER_GROUP of the normalised design is reflected in the vault design as the L_CUSTOMER_CUSTOMER_GROUP link table.

As can be seen the vault design technique has indeed applied the principle of "Separate things that are stable from things that are less stable", as can be contrasted with the normalised design. This has resulted in 6 objects in the vault design as opposed to just 2 in the normalised design. Well .. nothing comes free. The increased number of entities (and the SQL joins necessary to corral the data) are the price to pay for gaining the agility.

The data vault design as shown in this article is intentionally different from what one would see in an implementation of a data vault design. The illustration in this article has intentionally avoided the surrogate key for the entities. The surrogate keys are a necessity for the implementation, but not needed to understand the data vault design per se. We will address and explain the need for surrogate keys and their use in the follow-on article/s.

For now the intent is to emphasise the fact that

  1. there is a fundamental principle (i.e., "Separate things that are stable from things that are less table").
  2. the Hub-Link-Satellite trio of constructs - being the signature characteristic of any data vault design - is the result of adherence to this principle.

Watch the space for further articles on Data Vault.

To view or add a comment, sign in

More articles by Madani BASHA

  • Excel - Bane to Boon? Do colouring!

    Anyone out there who does not know Excel or Spreadsheets? I will seriously be worried about that person. 😊…

    2 Comments
  • Data Vault - The underlying principle

    What is Data Vault? The best source for this is the inventor himself, Dan Linstedt. "definition: the data vault is a…

  • Universal Data Architecture

    Data and everything about data are squarely on the centre stage of both business and technology platforms and forums…

    3 Comments
  • Data Lineage challenge: A holistic solution

    Data lineage has always been a requirement and a challenge. Nothing new but emphasis has surely grown.

    9 Comments
  • Data Design / Model for Hierarchies

    This is a follow on to my previous article. https://www.

  • Customer or Party?

    Traditionally, people interacting with an organisation were classified as employees, customers, suppliers etc. Most…

    2 Comments
  • Business Data Model or Canonical Data Model?

    Data modelling has been around for well over 4 decades. The seminal paper by Peter Chen in 1976 in the first volume of…

    2 Comments
  • Surrogate Identifier: Must have a corresponding natural identifier

    This is a follow on article to this one https://www.linkedin.

  • There is no identifier for this data!

    Occasionally one comes across people - typically from environments where data is managed using a file system as opposed…

    5 Comments

Others also viewed

Explore content categories