Explaining the path to Data Governance
Over the past several years I've been involved in many data governance, MDM and data quality projects at the leadership level. In the last 18 months in particular, I have for the most part been right in the middle of troubled projects, helping to correct their course.
Invariably the impetus for these projects was some type of tool purchase from one of the major vendors. Usually much time had been spent discussing infrastructure, servers, platforms and tool implementation in POCs, for the most part a bunch of techno crap. In every case senior management had already recognized the blatantly obvious data quality issues they had, but lacking a methodology or an approach they were driven to rely on the selection of the tool and the consulting services of the vendors to kick-start the process.
After much experience, I honestly believe that in the end the technology doesn’t matter. Pick whichever one you want.
To achieve any success whatsoever, the emphasis has to be completely business driven from the top down, in concert with some level of technical cooperation from your existing staff, using your existing tools and not necessarily implementing any new tool.
Let’s examine the typical layers of data governance.
As you can see, the top of the pyramid is the data governance organization: people in newly assigned roles with expected deliverables and, in the beginning, no way of accomplishing them.
Next is the middle of the pyramid, which is a master data management application. It is also new, and in the beginning it is simply a technical infrastructure and data design structure, in essence an empty vessel with no understanding of the context and content of your business specifics. This is where you apply precision engineering to the concepts from above.
The bottom, in essence the infrastructure at the base of your pyramid, is a set of data quality capabilities. Most likely many of them already exist in your organization, but as singular applications or one-off purposes, not cohesively cleansing, standardizing, profiling or correcting data, and almost certainly organized by application rather than by common enterprise definitions.
This is where I find most organizations literally trying to build the top, the bottom and the middle of the pyramid simultaneously, with no focus or common contextual structure to guide them, just "get the infrastructure and tools up," as if that were an actual business accomplishment.
This will take many months and great expense and will yield no measurable business benefit, while simultaneously creating many risks.
I cannot stress enough that a business needs to find a way to construct the necessary local information application architecture to meld business objectives with data profiling results by mapping both to common business terms.
It could actually be described as a normal business intelligence project: building a dimensional data mart that integrates profiling results, metadata and business dimensions. See Gartner article.
For a different approach: what is needed at the top level is actually already there. It consists of recording your business metrics, the measurements by which you drive your business. These metrics need to be captured, written down and decomposed, or simply broken down into the list of attributes that are used to drive your business.
Then a simple exercise of organizing the attributes into categories (Product, Geography, Finance, etc.) and cross-referencing them will give you a business matrix. The business matrix gives you the dimensional structure, that is, the relationship of metrics to intersecting attributes. It can then be mapped to existing business processes and serve as the blueprint to drive from the top, the business level, into the data governance level, through the master data management level and directly into the data quality level. This simply brings focus to the various tool, application and multidimensional entity modeling efforts, which in most cases are going on independently.
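A minimal sketch of such a business matrix, assuming nothing more than plain Python dictionaries. The metric names, attribute names and categories below are hypothetical placeholders for illustration, not taken from any real business.

```python
from collections import defaultdict

# Hypothetical metrics, each decomposed into the attributes used to analyze it.
metric_attributes = {
    "revenue": ["product", "region", "fiscal_period"],
    "customer_churn": ["customer_segment", "region"],
}

# Hypothetical assignment of each attribute to a category.
attribute_category = {
    "product": "Product",
    "region": "Geography",
    "fiscal_period": "Finance",
    "customer_segment": "Customer",
}


def build_business_matrix(metric_attributes, attribute_category):
    """Cross-reference metrics against categories: metric -> category -> attributes."""
    matrix = {}
    for metric, attributes in metric_attributes.items():
        row = defaultdict(list)
        for attribute in attributes:
            row[attribute_category[attribute]].append(attribute)
        matrix[metric] = dict(row)
    return matrix


def shared_attributes(metric_attributes):
    """Attributes used by more than one metric, i.e. candidates for common definitions."""
    usage = defaultdict(list)
    for metric, attributes in metric_attributes.items():
        for attribute in attributes:
            usage[attribute].append(metric)
    return {attr: metrics for attr, metrics in usage.items() if len(metrics) > 1}


business_matrix = build_business_matrix(metric_attributes, attribute_category)
shared = shared_attributes(metric_attributes)
```

The attributes that show up under more than one metric (here, `region`) are the natural first candidates for a common business definition.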
Simply stop and pick a key metric, perhaps revenue, associate the attributes that are required to analyze and understand it, define common terms for those attributes (this is the "business glossary") and record them. Doing this, even without a preselected metadata tool or data dictionary, is a major step forward.
Next we will skip the middle and go to the bottom, the foundation of the pyramid. What is needed here is a tactical approach: focus your data profiling and data matching applications and/or your existing staff on building what is known as a metadata mart. You'll notice that at the bottom of the pyramid, in addition to the normal data quality processes, I mention record linkage. This is critical. It isn't just about using fuzzy matching to catch duplicate customers; it's really about understanding the logic and methodology for linking records from multiple systems, which is also explained in detail at my blog and many other sites.
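As a rough illustration of linking records across two systems without a vendor product, here is a sketch using only Python's standard library `difflib`. The record layout, the blocking on postal code and the 0.85 threshold are all assumptions made for the example; a real linkage methodology would be tuned to your own data.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Standardize a name: lowercase, drop periods and commas, collapse whitespace."""
    return " ".join(value.lower().replace(".", "").replace(",", "").split())


def similarity(a, b):
    """Similarity of two normalized strings, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair records from two systems that share a postal code and have similar names."""
    links = []
    for a in left:
        for b in right:
            # Blocking step: only compare records with the same postal code.
            if a["postal_code"] != b["postal_code"]:
                continue
            score = similarity(a["name"], b["name"])
            if score >= threshold:
                links.append((a["id"], b["id"], round(score, 2)))
    return links


# Hypothetical records from two source systems.
crm = [
    {"id": "C1", "name": "Acme Corp.", "postal_code": "30301"},
    {"id": "C2", "name": "Globex Inc", "postal_code": "10001"},
]
billing = [
    {"id": "B7", "name": "ACME Corp", "postal_code": "30301"},
    {"id": "B9", "name": "Initech", "postal_code": "10001"},
]

links = link_records(crm, billing)
```

The point of the sketch is the sequence (standardize, block, compare, threshold), which is the linking logic itself and is independent of whichever tool eventually runs it.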
There are numerous websites containing specifics on applying these techniques in a consistent way without purchasing vendor products, and in some cases there are government standards for measurement that help produce consistent results.
There are readily available references for building a small analytical data mart that brings together the profiling and domain information for analyzing a few key selected metrics and their attributes for the initial iteration of your data governance process.
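One possible shape for such a small mart, sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, column names and sample numbers are hypothetical and only illustrate the idea of a profiling fact table joined to business-term and source-column dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimensions (business term, source column) and one fact table of profiling results.
conn.executescript("""
CREATE TABLE dim_business_term (
    term_id INTEGER PRIMARY KEY, term_name TEXT, category TEXT);
CREATE TABLE dim_source_column (
    column_id INTEGER PRIMARY KEY, system_name TEXT, column_name TEXT);
CREATE TABLE fact_profile (
    term_id INTEGER REFERENCES dim_business_term(term_id),
    column_id INTEGER REFERENCES dim_source_column(column_id),
    row_count INTEGER, null_count INTEGER, distinct_count INTEGER);
""")

# Hypothetical sample content for one business term found in two systems.
conn.execute("INSERT INTO dim_business_term VALUES (1, 'region', 'Geography')")
conn.executemany(
    "INSERT INTO dim_source_column VALUES (?, ?, ?)",
    [(1, "crm", "cust_region"), (2, "billing", "rgn_cd")],
)
conn.executemany(
    "INSERT INTO fact_profile VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1000, 50, 12), (1, 2, 800, 0, 4)],
)

# Null rate per source column for each business term, worst first.
null_rates = conn.execute("""
SELECT t.term_name, s.system_name, s.column_name,
       1.0 * f.null_count / f.row_count AS null_rate
FROM fact_profile f
JOIN dim_business_term t ON t.term_id = f.term_id
JOIN dim_source_column s ON s.column_id = f.column_id
ORDER BY null_rate DESC
""").fetchall()
```

Even a mart this small lets the business compare how the same term behaves in different systems before any tool decision is made.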
Next we will discuss the middle, the master data management section, which in essence is the point where you bring the people, processes and their defined metrics and attributes designed in the data governance section at the top and connect them, with precision, in the master data management model via the data quality processes at the bottom.
Keep in mind that the list of metrics defined by the business at the top of the pyramid will form the basis for the master data management logical model. At this point, if you’ve done your data profiling correctly and identified your list of common values for each business term, it becomes a mapping exercise of business terms to source column names, creating a nexus of business terms to individual source data fields.
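The mapping exercise can be sketched in a few lines of Python. The system names, column names and sample values here are invented for illustration; the idea is only that each business term points at its source columns and that profiling each column yields its list of common values.

```python
from collections import Counter

# Hypothetical sample values pulled from two source columns.
source_values = {
    "crm.cust_region": ["EAST", "East", "west", "WEST ", None],
    "billing.rgn_cd": ["east", "east", "north"],
}

# The nexus: business term -> source columns that carry it (hypothetical mapping).
term_to_sources = {
    "region": ["crm.cust_region", "billing.rgn_cd"],
}


def profile_column(values):
    """Count nulls and tally standardized (trimmed, lowercased) values."""
    cleaned = [v.strip().lower() for v in values if v is not None]
    return {
        "null_count": sum(v is None for v in values),
        "value_counts": Counter(cleaned),
    }


def profile_term(term):
    """Profile every source column mapped to a business term."""
    return {col: profile_column(source_values[col]) for col in term_to_sources[term]}


def common_values(term):
    """Values that appear in every source column mapped to the term."""
    profiles = profile_term(term)
    value_sets = [set(p["value_counts"]) for p in profiles.values()]
    return set.intersection(*value_sets)


region_profile = profile_term("region")
region_common = common_values("region")
```

Values that do not appear in every source (here `west` and `north`) are exactly the discrepancies the business needs to resolve when defining the term.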
By defining the focus at the top on specific business metrics, organizing them into metrics and their associated attributes for analysis, building them into a model, even at the tactical level, and accomplishing the appropriate data quality processes (standardizing, profiling and record linkage), you will begin the iterative process that is required for a successful implementation.
Notice I have not discussed any individual tools, because I do not feel they are of paramount importance, nor are any standard models. The issue is to engage senior management, middle management and management in a circular, iterative process of slowly constructing a data governance model that actually fits the needs of your business model.
It will be relatively straightforward to pick the appropriate tools once all levels involved in the process can be a part of the decision-making and evaluation, basing the decision on facts rather than conjecture or preconceived, simplistic pictures from the sales organization.
If you’d like more detail please see my blog for many specific articles and techniques regarding this approach.
Ira Warren Whiteside