Data First

Process automation, systems integration, and AI analytics are top priorities for IT leaders, expected to deliver major productivity gains. Yet studies from analysts such as Forrester and Gartner consistently report high failure rates, often caused by data inconsistency and poor data integration readiness.

In two disconnected systems, a data field that exists in both and originally represented the same thing can drift far apart over time, both in how the business uses it and in the values it holds. A locally maintained Excel file may spell client data differently than the ERP does. At data entry, fields that matter but are not enforced as mandatory may be left blank. Values may be reused inconsistently, or new codes introduced that only a small group of users understands. Even in organizations with well-defined data ownership, no single person may oversee how data usage evolves across systems.
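To make the drift concrete, here is a minimal Python sketch. The records, field names, and category codes are invented for illustration; it simply compares the distinct values a nominally shared field takes in two sources:

```python
# Surface value drift in a field that two systems nominally share.
# The records below are invented examples standing in for exports
# from an ERP and a locally maintained Excel file.
erp_rows = [
    {"client": "Acme Ltd", "category": "KEY"},
    {"client": "Beta GmbH", "category": "STD"},
]
local_rows = [
    {"client": "ACME Limited", "category": "Key Account"},  # same client, different spelling and code
    {"client": "Beta GmbH", "category": ""},                # non-mandatory field left blank
]

def distinct_values(rows, field):
    """Distinct non-empty values of one field, trimmed for comparison."""
    return {r[field].strip() for r in rows if r[field].strip()}

only_in_erp = distinct_values(erp_rows, "category") - distinct_values(local_rows, "category")
only_in_local = distinct_values(local_rows, "category") - distinct_values(erp_rows, "category")
print("Category codes only in ERP:  ", sorted(only_in_erp))   # codes the local file never uses
print("Category codes only locally: ", sorted(only_in_local)) # codes unknown to the ERP
```

Even a throwaway comparison like this, run early, turns a vague "our data is fine" into a concrete list of mismatched codes to discuss with the data owners.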

That’s why, at the launch of integration or analytics projects, data discovery may produce an effort estimate that surprises stakeholders and the project owner with extra cost and time. In my experience, sponsors and data owners (who lack visibility into data outside their own scope) frequently overestimate the readiness and quality of company data. Traditional data ownership alone isn't enough: a business unit may not be aware that another department introduced its own client categorization in local data tables, partly overlapping with theirs but still different.

To avoid such unexpected project obstacles, I see two effective strategies:

1. Maintain documentation of all data models. First, capture in which systems each field is used and how, along with its description and value range. Then extend change management processes to cover these fields. This takes significant effort to start and discipline to maintain, but it breaks the illusion that the organization already has total oversight and control over its data. With continuous maintenance, the data documentation is always available, project after project.

2. Plan a data discovery phase. Before integration development begins, run a sub-project to identify trusted sources, clean inconsistencies, and validate data quality across the systems. Don't rely on the possibly false assumption that the main project will easily build the integration just because every department reports its data is good. Preferably identify the "single source of truth" in this phase, to avoid unnecessary work on information that will ultimately not be part of the integration.
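As a sketch of how the first strategy can feed the second, the data dictionary can be kept machine-readable, so that a discovery phase can check real values against the documented ranges. Everything below, including the structure, field names, owner, and codes, is a hypothetical illustration, not a standard:

```python
# A minimal, machine-readable data dictionary entry (illustrative only).
data_dictionary = {
    "client_category": {
        "systems": ["ERP", "CRM", "local Excel exports"],
        "description": "Commercial classification of a client",
        "allowed_values": ["KEY", "STD", "PROSPECT"],
        "mandatory": True,
        "owner": "Sales Operations",
    },
}

def validate(field, value):
    """Check one value against the documented rules for its field."""
    spec = data_dictionary[field]
    if spec["mandatory"] and not value:
        return "missing mandatory value"
    if value and value not in spec["allowed_values"]:
        return f"undocumented code: {value!r}"
    return "ok"

print(validate("client_category", "KEY"))  # a documented code passes
print(validate("client_category", ""))     # blank mandatory field is flagged
print(validate("client_category", "VIP"))  # locally invented code is flagged
```

The point is not the code itself but the discipline: once the allowed values and ownership are written down in one place, a discovery sub-project can mechanically flag every blank mandatory field and every locally invented code before the main integration build starts.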

While building the full data integration remains in the scope of the main project, such preparation helps avoid delays and unpleasant surprises, and increases the chance of overall success.

How do you manage data integrity in your projects? What lessons have you learned from cross-system data discovery?
