Everything simple is false. Everything complex is unusable
So said Ambroise-Paul-Toussaint-Jules Valéry. The British statistician George Box, who said "All models are wrong, but some are useful", was making the same point.
This is the first of two posts setting out some concepts. These concepts are applicable to whatever business you are in and wherever you are in terms of your data governance maturity. Data governance maturity is something we'll dive into at a later date.
Whatever we do in data governance, it needs to be economic, efficient and effective.
At the earliest stages of this process, it's important to consider scope and agree terms of reference. Not doing this will limit success. Try to avoid the common mistake of overlooking what you should be doing in the rush to get to how you should do it.
For almost all data governance initiatives, it's important to partition the work. Doing this will allow you to prioritise what work you do first as well as 'maximising the volume of work not done'. It will also help with stakeholder engagement and enable communication in terms of benefits and capabilities.
However you eventually do partition the work, you should keep the following model in mind.
Sensitive data is data which needs safeguarding because it would damage the interests of the business if it fell into the wrong hands. It has associated costs, risks and a corresponding value.
Redundant data is data which has no further value to the business. It has associated costs and risks but no corresponding value.
Other data is data which doesn't have a particular risk attached to it. It has associated costs and a corresponding value.
An important characteristic of this model is that it focuses on what the data is, not where it is or what format it is in. That's a theme we'll be revisiting regularly. Another characteristic of this model is that it is simple. We've put an abstraction layer between us and our data governance objectives. This makes it a lot easier to engage stakeholders.
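For readers who like to see a model written down, here is a minimal sketch of the three categories in Python. It is an illustration only, not a prescribed implementation. The names `Category`, `Profile`, `PROFILES` and `categorise` are hypothetical, as are the two yes/no questions used as inputs. The rule that data with no further value counts as redundant even if it would otherwise need safeguarding is also my assumption; the model itself doesn't state a precedence.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SENSITIVE = "sensitive"
    REDUNDANT = "redundant"
    OTHER = "other"


@dataclass(frozen=True)
class Profile:
    """Which of cost, risk and value a category carries."""
    has_cost: bool
    has_risk: bool
    has_value: bool


# Taken directly from the three definitions in the model.
PROFILES = {
    Category.SENSITIVE: Profile(has_cost=True, has_risk=True, has_value=True),
    Category.REDUNDANT: Profile(has_cost=True, has_risk=True, has_value=False),
    Category.OTHER: Profile(has_cost=True, has_risk=False, has_value=True),
}


def categorise(needs_safeguarding: bool, has_further_value: bool) -> Category:
    """Place a data set in one of the three categories.

    Assumption: 'no further value' is checked first, so data that is
    both sensitive and valueless is treated as redundant.
    """
    if not has_further_value:
        return Category.REDUNDANT
    if needs_safeguarding:
        return Category.SENSITIVE
    return Category.OTHER
```

Notice that `categorise` asks nothing about where the data is stored or what format it is in. It only asks what the data is, which is the point of the model.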
In the next post, I'll build on what is set out here and provide a simple model comprising inputs, processes and outputs which will address pretty much all data governance workloads.
Talking points:
- If the model above isn't complete, what's missing and why does it matter?
- Do you see benefit in focusing on data based on what it is, not where it is? Conversely, do you see problems in focusing on where the data is stored rather than what the data is?
- How important is stakeholder engagement in your data governance project? Why?
- What does 'maximising the volume of work not done' mean to you?