HOOK and Reference Data

HOOK and Reference Data

In my last article I showed that with Data Vault, reference data is not such a simple concept to manage. In this follow up post, I will show how easily reference data is handled in HOOK.

HOOK vs. Data Vault

To remind you, I put together the following Data Vault model. I’m not suggesting this is the way it should be modelled, but it is a way it could be modelled.

No alt text provided for this image

The equivalent HOOK model would look like this:

No alt text provided for this image

You will notice that there is no equivalent currency Hook. The recommended approach is that all reference data be bundled together under a single, all-encompassing, reference Hook. How does this look like in the Bag views?

Bag Definitions

Each of the three Bags is defined as follows.

No alt text provided for this image

For the Transaction bag, there are a couple of things to note. There are two business terms at play, namely Account and Reference. These two terms MUST be defined in the business glossary. Account is used twice and, therefore, must be qualified for debit and credit accounts. Also, note that both hook keys use the same key set (10).

Currency is treated as reference data and is represented by the generic Reference business term. The hook key field is qualified to distinguish the type of reference data being referred to. Also, notice that currency has its own key set (123).

In the Exchange Rate bag currency is referenced twice and as such must be qualified. Notice that there are two qualifiers, the first to describe the key set the second to describe its role.

Reference Key Sets

For consistency ‘Reference’ is an implied Business Term. All reference type data sit underneath this umbrella term. To distinguish between different types of reference data we can simply attach a key set. Typically an organisation will use hundreds even thousands of reference tables. Each of them will require a key set to be defined in a similar way that business terms are defined, but they are not business terms themselves. You can think of them as ‘weak’ business terms.

Naming Conventions

The naming convention I’ve used for Hook Keys is consistent:

  • HK_<business term>{__Qualifier}

The HK_ prefix indicates it is a hook key, followed by the business term name and then zero, one or more qualifiers separated by double underscores ('__'). But I am wondering whether reference hook keys could be named differently:

  • RK_<key set>{__Qualifier}

The RK_ prefix indicates a reference (hook) key. So rather than:

  • HK_Ref__Currency__From

It could be:

  • RK_Currency__From

I’m open to alternative naming conventions, but there should be a level of consistency across all implementations. I hope to document these standards soon.

Closing Thoughts

I hope you’ll agree that this approach is easier than its Data Vault equivalent; there is no need to place reference data in a different type of table structure. Reference data is placed in a bag and treated like any other type of data.

The beauty of the HOOK approach is that even if we do get the model wrong, for example, we decide to promote a particular reference set to a core business concept, then we can do so without the need to reload or restructure the data. We simply drop the relevant bag views and re-create them.

Again, many thanks to the kind folk at ellie.ai for granting me access to their cloud-based business glossary and data modelling tool, which I’ve used to produce the diagrams in this article.



Thank you so much, I've been waiting for this.

Like
Reply

To view or add a comment, sign in

More articles by Andrew Foad

  • The Great Big Data Vault Lie 3: The Mystery of the Missing Metrics

    While I’m writing this article, I’m angry. Angry and frustrated.

    15 Comments
  • The Great Big Data Vault Lie 2

    Data Vault is scalable, or so we’re told by its creator, Dan Linstedt, in the book he co-authored with Michael…

    50 Comments
  • The Great Big Data Vault Lie

    I have recently been running half-day “Introduction to Hook” sessions for a client. When explaining that the Hook…

    54 Comments
  • The map is NOT the territory.

    I was bemoaning that ‘Nobody gets it!’ when discussing the recent feedback from my last article with my business…

    18 Comments
  • Square Pegs and Round Holes

    Why isn’t data modelling the No.1 go-to skill required by everybody in the data industry? I believe the answer has…

    28 Comments
  • "You have the right to remain silent..."

    There is a phrase in the 'Miranda' rights that is typically read when arresting suspects in the UK and Australia that I…

    2 Comments
  • Capability vs. Features

    As many of you know, I've been critical of the fact that a high percentage of data projects seem to fail, which was one…

    3 Comments
  • Is Data Modelling Dead?

    I’ve been using data modelling techniques for most of my 35-year career, and I consider it an essential skill for…

    48 Comments
  • Data Vault is Garbage (change my mind)

    If you are unfamiliar with the Hans Christian Andersen story of “The Emperor’s New Clothes”, ChatGPT summarises it…

    7 Comments
  • Introducing HOOK. Part 1 - Logical Architecture

    This article is the first in a short series on the Hook approach to Data Warehousing. As I said in my introduction…

Others also viewed

Explore content categories