How to value data quality...?

We established in a previous post that although data has little or no intrinsic value, the actual business value enabled or endangered by data quality can be determined in a relatively straightforward manner: it requires putting existing data quality rules in a clear business context.

Actually, this process is neither difficult nor complex if one knows exactly what to do. The purpose of this post is to provide the keys to creating value-based data quality reporting, starting with a simple case and walking through the various complexities.

Part 1: valuing a single rule

Let’s start by valuing the data quality of a relatively straightforward rule that is a classic case in many companies: each product must have a net and a gross weight.

Let’s now put this rule into a business context, which amounts to answering the following questions:

  • What would be the impact of non-compliance?
  • Where would the impact be felt?
  • How can the value endangered by non-compliance be measured?

In our case, the answer is straightforward: products without weights cannot be shipped to customers, meaning that the impact would be felt mainly in transportation and logistics.

Regarding value, we need to understand that we are looking at it from a governance perspective, which seeks to detect and eliminate the roadblocks preventing actual value generation, while enabling the prioritization of defect resolution. For that reason, we are not interested in the “cost to fix”, which fails to represent the risk posed by the data defect to the value chain. Besides, we don't want to “pollute” the governance report with defective product data that is not directly related to business activities.

For the sake of this exercise, we will assume that the value driver is the actual value of the product order book: each product data record is attributed the sum of all open sales order lines related to it (in a reference currency), e.g.:

Suppose the company has 7 products in its catalogue; we can then assign each of them a sales order value:
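As a minimal sketch of this value driver in Python (the product names and amounts below are purely hypothetical, chosen only so that they remain consistent with the percentages quoted later in this post):

```python
from collections import defaultdict

# Hypothetical open sales order lines: (product, amount in kCHF).
open_order_lines = [
    ("A", 250), ("A", 150),  # product A: 400 in total
    ("B", 370),
    ("C", 70),
    ("D", 40),
    ("F", 120),
    # Products E and G have no open order lines: their value is 0.
]

# Value driver: each product record is attributed the sum of all
# open sales order lines related to it.
product_value = defaultdict(float)
for product, amount in open_order_lines:
    product_value[product] += amount

for product in "ABCDEFG":
    print(product, product_value.get(product, 0.0))
```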

Running the data quality rule according to the classic, non-value-based approach returns the following result:

[Table: weight-rule results per product, showing the pass/fail status of each of the 7 products]

The data quality level is really poor, with a majority of products missing weights. Yet this result does not mean much to the business, except perhaps that there is an alarming situation in data management (with reason).

To get the business facts, we associate each product with its corresponding value, and the true business picture emerges:
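Here is a minimal sketch of the two KPIs side by side, reusing the same hypothetical figures as above and assuming the weight rule fails for products C, D, E and G (numbers chosen to reproduce the 43% and 11% quoted below):

```python
# Hypothetical order-book values (kCHF) and weight-rule results.
product_value = {"A": 400, "B": 370, "C": 70, "D": 40,
                 "E": 0, "F": 120, "G": 0}
weight_rule_pass = {"A": True, "B": True, "C": False, "D": False,
                    "E": False, "F": True, "G": False}

# Classic, record-based KPI: share of records passing the rule.
dq_level = sum(weight_rule_pass.values()) / len(weight_rule_pass)

# Value-based KPI: share of the order book endangered by the defects.
total_value = sum(product_value.values())
endangered = sum(v for p, v in product_value.items()
                 if not weight_rule_pass[p])

print(f"Data quality level: {dq_level:.0%}")                  # 43%
print(f"Endangered value:   {endangered / total_value:.0%}")  # 11%
```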

The contrast is striking: although data quality is only 43%, the 57% of defective records impact just 11% of the total order book value. This type of case is quite common: usually only a fraction of the data has real business usage at a given time. We can see that the rule is now fully contextualized:

  • The Key Performance Indicator is based on actual business value (there is no need to discuss data quality levels at all)
  • The value corresponds to the economic reality of the company
  • The priorities are clearly defined: products C and D must be resolved (C first, as it impacts more value). More importantly, products E and G, although defective, do not require immediate attention from a business perspective. This enables data management to better focus the correction efforts!
  • By checking this business rule at sales order time, we already know that the shipment will be impossible: this approach provides a forward view of the data quality impact and therefore enables proactive resolution!

Contextualizing a business rule is as simple as associating a “value” dimension with any data quality report, a capability that most, if not all, data quality solutions provide “out of the box”. The complexity lies in the aggregation of value across several data quality rules.

Part 2: valuing several related business rules together

Let’s add a new rule: “Each product must have a GS1 label”, which returns the following results:

[Table: GS1-label rule results per product]

Using the same method as discussed in Part 1, we get the following value impact report:

The result is obvious. Now, let’s combine the two rules. From a data quality perspective, a record is correct only when it passes ALL the related data quality rules:

To consolidate the result using value, we simply use the result of the combined rules instead of the results of the individual rules:
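A sketch of this consolidation, reusing the hypothetical figures from Part 1 and assuming the GS1 rule fails only for product F; the combination is a strict logical AND:

```python
# Hypothetical values (kCHF) and rule results, as in Part 1.
product_value = {"A": 400, "B": 370, "C": 70, "D": 40,
                 "E": 0, "F": 120, "G": 0}
weight_rule_pass = {"A": True, "B": True, "C": False, "D": False,
                    "E": False, "F": True, "G": False}
gs1_rule_pass = {"A": True, "B": True, "C": True, "D": True,
                 "E": True, "F": False, "G": True}

# A record is correct only when it passes ALL related rules.
combined_pass = {p: weight_rule_pass[p] and gs1_rule_pass[p]
                 for p in product_value}

total = sum(product_value.values())
endangered = sum(v for p, v in product_value.items() if not combined_pass[p])
defect_rate = sum(not ok for ok in combined_pass.values()) / len(combined_pass)

print(f"Defective records: {defect_rate:.0%}")         # 71%
print(f"Endangered value:  {endangered / total:.0%}")  # 23%

# Resolution priorities: defective products, highest endangered value first.
priorities = sorted((p for p, ok in combined_pass.items() if not ok),
                    key=lambda p: product_value[p], reverse=True)
print(priorities)  # ['F', 'C', 'D', 'E', 'G']
```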

We see that although 71% of the data is defective, the overall business impact represents 23% of the total sales order value. The resolution priorities are now clear:

  1. Fix the GS1 label for product F
  2. Fix the weights for product C
  3. Fix the weights for product D
  4. Resolve the rest of the data issues.

This simple combination of rules applies when the following conditions are met:

  • All rules are related to the same records
  • All rules use the same value indicator (belonging to the same value chain).

Many data quality solutions are not designed to combine data quality results in the way presented above. This issue can be circumvented either by selecting a more sophisticated tool or by performing the consolidation via dedicated reporting solutions.

Part 3: Adding complexity

Case 1: multiple value chains

A rule may support several value chains: the examples above can be seen from an open sales order perspective (an operational order-to-cash concern) or from a sales forecast perspective (product portfolio management).

In this case, each value chain is considered separately. There must be no “value merging”, as it would strip the value of its business meaning (the operational order-to-cash value is likely already included within the sales forecasts, so adding them together would make no sense).

In the end, the report may look like this:
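For example, a sketch of such a two-chain report (all figures hypothetical); note that each chain is computed and reported separately, and the two totals are never added together:

```python
# Hypothetical values per product, seen from two distinct value chains.
order_book = {"C": 70, "D": 40, "E": 0, "G": 0}        # open sales orders, kCHF
sales_forecast = {"C": 500, "D": 0, "E": 250, "G": 0}  # committed revenue, kCHF

weight_rule_fail = ["C", "D", "E", "G"]  # defective products

# Each value chain is consolidated separately: no "value merging".
for chain, values in [("order to cash", order_book),
                      ("sales forecast", sales_forecast)]:
    endangered = sum(values.get(p, 0) for p in weight_rule_fail)
    print(f"{chain}: {endangered} kCHF endangered")
```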

The differences in the value results are due to the different perspectives and priorities of the various business areas: in our case, product D interests the operational order-to-cash manager, as there is a recorded sales order, but the sales director has little concern for it, since it is not a product for which he has committed any revenue. The support of multiple value chains at single-rule level is achievable by most data quality solutions on the market.

Case 2: value visibility by organization/geography

The approach is similar; the main difference is the integration of additional dimensions into the reports, e.g.:
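For instance, a minimal sketch where a hypothetical “organization” dimension is carried alongside each defect, so the endangered value can be reported per organization:

```python
from collections import defaultdict

# Hypothetical defective records: (product, organization, endangered kCHF).
defects = [
    ("C", "EMEA", 50), ("C", "APAC", 20),
    ("D", "EMEA", 40),
    ("F", "AMER", 120),
]

# Roll up the endangered value along the extra dimension.
by_org = defaultdict(float)
for product, org, value in defects:
    by_org[org] += value

for org in sorted(by_org):
    print(f"{org}: {by_org[org]} kCHF endangered")
```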

Case 3: super-aggregation

Super-aggregation is the consolidation of a value indicator across various rules that may have different structures, e.g.:

  • Products
  • Product/markets
  • Product/organization
  • Product/customers
  • Etc.

Super-aggregation may be extremely complex, but from a practical perspective it can be resolved simply by using the greatest common denominator (the coarsest granularity the rules share) between the aggregated rules. For instance, rules related to Products, Product/markets and Product/organization can be aggregated at product level.

In a similar manner, rules related to Product/markets, Product/organization/market and Product/market/customer can be aggregated at product/market level.
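A minimal sketch of this idea with hypothetical rule results: rows of different granularities are reduced to the product level (the coarsest grain they share) under a strict “good only if ALL rows pass” condition:

```python
from collections import defaultdict

# Hypothetical results for two rules of different granularities.
rule_product = {("C",): False, ("D",): False, ("F",): True}
rule_product_market = {("C", "EU"): True, ("C", "US"): False,
                       ("F", "EU"): False, ("F", "US"): True}

# Super-aggregation at product level: a product is good only if
# every row of every rule concerning it passes.
product_ok = defaultdict(lambda: True)
for key, ok in {**rule_product, **rule_product_market}.items():
    product_ok[key[0]] &= ok

print(dict(product_ok))  # {'C': False, 'D': False, 'F': False}
```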

We can also perform aggregation (always within the same value chain) across seemingly unrelated objects, e.g. Customer/markets and Product/markets. In this example, aggregation is possible at market level if the relationship between customers and products is known (i.e. which customer buys which product).
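A sketch of this cross-object case, with hypothetical customers, products, markets and values; the customer-to-product relationship determines which product values a customer defect endangers:

```python
# Hypothetical rule results on two unrelated objects, both carrying a market.
customer_ok = {("Cust1", "EU"): False, ("Cust2", "US"): True}
product_ok = {("P1", "EU"): True, ("P2", "US"): False}
buys = {("Cust1", "P1"), ("Cust2", "P2")}      # which customer buys which product
value = {("P1", "EU"): 100, ("P2", "US"): 80}  # order book per product/market, kCHF

# Aggregate at market level: a product/market value is endangered if the
# product row fails OR any customer buying that product fails in that market.
endangered_by_market = {}
for (prod, market), v in value.items():
    bad_product = not product_ok[(prod, market)]
    bad_customer = any(not customer_ok[(cust, market)]
                       for cust, p in buys
                       if p == prod and (cust, market) in customer_ok)
    if bad_product or bad_customer:
        endangered_by_market[market] = endangered_by_market.get(market, 0) + v

print(endangered_by_market)  # {'EU': 100, 'US': 80}
```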

Very few data quality solutions natively support this type of aggregation, and those that do require very rigorous modelling of the reports. However, consolidation via standard OLAP tools is possible, on condition that the aggregation of value is performed, as in Part 2 of this post, based on a strict “good if ALL rules pass” condition (most modern OLAP solutions support this capability).

I will not detail the methods any further; they are relatively common for people familiar with OLAP or BI-type reporting. The key point is that such reports are feasible.

Case 4: multi-level value chains

Multi-level value chains cover the cases where the value driver is composed of complex elements. If we use the sales order as an example:

  • A defect impacting the customer endangers the entire value of the sales order
  • A defect impacting a sales order line (e.g. a product) endangers the value of the sales order line only.

The same value is enabled or endangered by the corresponding rules, but at different levels. Performing aggregation across rules at various value levels is possible, but definitely not easy.
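A minimal sketch of the two levels on a hypothetical sales order: a header defect endangers the whole order value, while a line defect endangers only the value of that line:

```python
# Hypothetical sales order with header-level and line-level rule results.
order = {
    "header_ok": True,  # e.g. customer-related rules on the order header
    "lines": [
        {"product": "C", "value": 70, "ok": False},  # missing weights
        {"product": "A", "value": 400, "ok": True},
    ],
}

total = sum(line["value"] for line in order["lines"])
if not order["header_ok"]:
    endangered = total  # the entire order value is at risk
else:
    endangered = sum(line["value"] for line in order["lines"]
                     if not line["ok"])  # only the defective lines

print(f"Endangered: {endangered} of {total} kCHF")  # 70 of 470
```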

In the real world, most if not all value chains are complex and multi-layered. Today, no data quality solution integrates a value chain modelling capability that enables value to be consolidated at multiple levels. The consequence is that rules must be set either at sales order line level, at sales order header level, or both, but using clearly separate rules and distinct aggregation reports.

This capability is a clear gap in today’s data governance application landscape, and I’m hoping that a data quality software vendor will consider it in the near future.

Conclusion

Contextualizing business data rules and creating value-based data quality reports is possible, even by using current data quality solutions in association with solid reporting tools, or through dedicated solutions.

The jump from pure data quality to value-based data quality reporting is more a matter of culture than of technique, as the methods are straightforward and enabled by today’s technical solutions.

The main question companies should consider is whether they want to integrate data quality into their value chains (data as an enabler) or stick to “traditional” methods, knowing that the step towards value-driven data management is relatively small.

Thierry Délez, Quantum.Et.Datum, October 2015

 
