Data Governance

When we hear the word governance, several scenarios dance in our minds. We think of government and politics, right vs. wrong, conservative vs. liberal, strict vs. lenient. With data governance, however, we need to apply a different mindset. The act of governing data is to ensure that the origin, validity, completeness, and quality of data are maintained to the highest standard, like a metronome keeping a constant beat. We need to build trust in our data, and governing data means building rumble strips and guardrails around our data sets so that we do not drift from our established and published sources of truth.

It's not uncommon today for data to be centralized, and in many organizations it takes the form of data lakes. A data lake is conceptually a prudent strategy: it minimizes duplication of data and serves as a central distribution point for data consumers. Several distribution mechanisms can then be built with the data lake as the source. This can save time, money, hardware, software, and human resources, and, most importantly, the lake can perform the role of a single source of truth.

However, without good governance, data lakes can turn into dumping grounds for toxic data sets. Just as our rivers, if neglected and unmonitored, can get polluted quickly, data lakes need strong oversight and governance on a regular basis. I would go further and state that every established data set must be accompanied by an independent governing body.

Let's start with a strong metadata model. Any and every data set should, as a matter of course, carry data about the data. Every entity and attribute should be documented and kept up to date at regular intervals, as data is dynamic, constantly changing and evolving. Documentation must include the source or origin of the data, the frequency of intake, and the definition of each and every attribute, along with its datatype and frequency of update.
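As a minimal sketch, such a metadata model can be expressed as plain data structures. The field names below are illustrative assumptions on my part; a real data catalog would carry far more:

```python
from dataclasses import dataclass, field

@dataclass
class AttributeMeta:
    """Documentation for one attribute: definition, type, and update cadence."""
    name: str
    definition: str
    datatype: str
    update_frequency: str  # e.g. "daily", "on insert"

@dataclass
class DatasetMeta:
    """Data about the data: origin, intake cadence, per-attribute docs."""
    name: str
    source: str              # authorized system of origin
    intake_frequency: str    # how often new data arrives
    attributes: list[AttributeMeta] = field(default_factory=list)

    def undocumented(self, columns: list[str]) -> list[str]:
        """Flag columns present in the data but missing from the metadata."""
        documented = {a.name for a in self.attributes}
        return [c for c in columns if c not in documented]
```

A model like this makes the "kept up to date" requirement checkable: any column that appears in the data set but not in the metadata is immediately visible to the governing body.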

The next step is to clearly document the mechanisms for data intake and distribution. Let's start with intake. Data (and information) must be sourced from established and authorized data sources. Each data set must have a credible owner who is an expert in the data set and for whom the information falls within their area of expertise. Processes must be established for when data ownership changes, with workflows in place for an orderly transition of the data set to the new owner. Every data set must follow an established lifecycle from creation to destruction, one that adheres to corporate policies on data retention and eventual offloading.
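One way to make that lifecycle and those ownership transitions enforceable rather than aspirational is a small state machine with an audit trail. The stages and transitions below are an illustrative assumption, not a prescribed standard:

```python
from enum import Enum

class Stage(Enum):
    CREATED = "created"
    ACTIVE = "active"
    RETIRED = "retired"      # retained per policy, no new intake
    DESTROYED = "destroyed"  # offloaded per corporate retention policy

# Allowed lifecycle transitions, creation through destruction.
ALLOWED = {
    Stage.CREATED: {Stage.ACTIVE},
    Stage.ACTIVE: {Stage.RETIRED},
    Stage.RETIRED: {Stage.DESTROYED},
    Stage.DESTROYED: set(),
}

class Dataset:
    def __init__(self, name: str, owner: str):
        self.name, self.owner, self.stage = name, owner, Stage.CREATED
        self.audit_log = []  # every change is visible to the governing body

    def transition(self, new_stage: Stage) -> None:
        """Advance the lifecycle; illegal jumps are rejected outright."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(f"illegal transition {self.stage} -> {new_stage}")
        self.audit_log.append(("stage", self.stage.value, new_stage.value))
        self.stage = new_stage

    def transfer_ownership(self, new_owner: str) -> None:
        """An orderly, recorded hand-off rather than a silent edit."""
        self.audit_log.append(("owner", self.owner, new_owner))
        self.owner = new_owner
```

The point of the explicit transition table is that a data set can never skip straight from active use to destruction without passing through the retention stage the policy requires.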

Next we move to data distribution. The quick and painless way is to distribute data via native database access. While this is fast and efficient, it is the least desirable. The data layer must be abstracted away from the end user so that the data provider retains the ability to update and upgrade the data model without breaking consumers. In addition, direct database access introduces control issues around service IDs and access rights. A preferred method, in my opinion, is to build a strong and flexible API layer that provides programmable access to the data set from third- and fourth-generation languages. The API layer can be an HTTP call that returns HTML, JSON, XML, etc. for programmatic access. A separate API must be built for the metadata to accompany the data API. The combination of the meta API and the data API forms a well-established and complete method of distributing data for programmatic consumption.
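As a sketch of this pattern, the record shape, field names, and content types below are illustrative assumptions, not any particular product's API. The data endpoint honors the caller's requested format, and a companion metadata endpoint rides alongside it:

```python
import json
from xml.sax.saxutils import escape

# Illustrative record and metadata; a real service would read these from the lake.
RECORD = {"trade_id": "T-100", "notional": 5000000}
METADATA = {"trade_id": {"datatype": "string", "definition": "Unique trade key"},
            "notional": {"datatype": "integer", "definition": "Face value in USD"}}

def render(record: dict, accept: str) -> str:
    """Serialize a record per the caller's requested content type."""
    if accept == "application/json":
        return json.dumps(record)
    if accept == "application/xml":
        fields = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in record.items())
        return f"<record>{fields}</record>"
    # Default: a minimal HTML table for browser-based consumers.
    rows = "".join(f"<tr><td>{k}</td><td>{escape(str(v))}</td></tr>"
                   for k, v in record.items())
    return f"<table>{rows}</table>"

def metadata_endpoint() -> str:
    """The companion meta API: data about the data, served as JSON."""
    return json.dumps(METADATA)
```

Because consumers only ever see these rendered views, the provider is free to rework the underlying tables, exactly the abstraction that native database access gives away.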

Enterprises must establish strong data governance teams that are fiercely independent, staffed with members who are diligent and hold exacting standards and ethics. These teams must create, establish, validate, and measure key metrics on data attributes and data sets on a regular basis. Thresholds must be established and enforced. KPMs (key performance metrics) on data drift must be reported on a weekly basis, and remediation efforts must be put in place and enforced. And finally, where appropriate, self-identified audit issues (SIAI) must be raised to shine a light on areas that need remediation and additional oversight.
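The weekly drift report described above can be sketched as a small check: a null-rate metric and a mean-drift metric compared against enforced thresholds. The specific metrics and threshold values here are illustrative assumptions; a governance team would choose its own:

```python
from statistics import mean

def null_rate(values: list) -> float:
    """Fraction of missing values in an attribute -- a basic quality metric."""
    return sum(v is None for v in values) / len(values)

def drift(baseline: list[float], current: list[float]) -> float:
    """Relative shift of the mean versus the published baseline."""
    base = mean(baseline)
    return abs(mean(current) - base) / abs(base)

def weekly_report(baseline, current, null_threshold=0.05, drift_threshold=0.10):
    """Return threshold breaches; breaches feed remediation and the SIAI process."""
    issues = []
    clean = [v for v in current if v is not None]
    if null_rate(current) > null_threshold:
        issues.append("null-rate threshold breached")
    if drift(baseline, clean) > drift_threshold:
        issues.append("mean drift threshold breached")
    return issues
```

Run weekly against each governed attribute, an empty report means the data set is holding to its published baseline; anything else is a remediation item.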

Well put, Hector. In line with the old adage "garbage in, garbage out."

Nicely done, Hector. Without data governance, the credibility of data is always in question.
