HOOK and Reference Data
In my last article I showed that with Data Vault, reference data is not such a simple concept to manage. In this follow up post, I will show how easily reference data is handled in HOOK.
HOOK vs. Data Vault
To remind you, I put together the following Data Vault model. I’m not suggesting this is the way it should be modelled, but it is a way it could be modelled.
The equivalent HOOK model would look like this:
You will notice that there is no equivalent currency Hook. The recommended approach is that all reference data be bundled together under a single, all-encompassing, reference Hook. How does this look like in the Bag views?
Bag Definitions
Each of the three Bags is defined as follows.
For the Transaction bag, there are a couple of things to note. There are two business terms at play, namely Account and Reference. These two terms MUST be defined in the business glossary. Account is used twice and, therefore, must be qualified for debit and credit accounts. Also, note that both hook keys use the same key set (10).
Currency is treated as reference data and is represented by the generic Reference business term. The hook key field is qualified to distinguish the type of reference data being referred to. Also, notice that currency has its own key set (123).
In the Exchange Rate bag currency is referenced twice and as such must be qualified. Notice that there are two qualifiers, the first to describe the key set the second to describe its role.
Reference Key Sets
For consistency ‘Reference’ is an implied Business Term. All reference type data sit underneath this umbrella term. To distinguish between different types of reference data we can simply attach a key set. Typically an organisation will use hundreds even thousands of reference tables. Each of them will require a key set to be defined in a similar way that business terms are defined, but they are not business terms themselves. You can think of them as ‘weak’ business terms.
Recommended by LinkedIn
Naming Conventions
The naming convention I’ve used for Hook Keys is consistent:
The HK_ prefix indicates it is a hook key, followed by the business term name and then zero, one or more qualifiers separated by double underscores ('__'). But I am wondering whether reference hook keys could be named differently:
The RK_ prefix indicates a reference (hook) key. So rather than:
It could be:
I’m open to alternative naming conventions, but there should be a level of consistency across all implementations. I hope to document these standards soon.
Closing Thoughts
I hope you’ll agree that this approach is easier than its Data Vault equivalent; there is no need to place reference data in a different type of table structure. Reference data is placed in a bag and treated like any other type of data.
The beauty of the HOOK approach is that even if we do get the model wrong, for example, we decide to promote a particular reference set to a core business concept, then we can do so without the need to reload or restructure the data. We simply drop the relevant bag views and re-create them.
Again, many thanks to the kind folk at ellie.ai for granting me access to their cloud-based business glossary and data modelling tool, which I’ve used to produce the diagrams in this article.
Thank you so much, I've been waiting for this.