In Data We Trust
Data insights, data analytics, business intelligence, intelligent automation, and the training of AI models have all come about because of the increasing availability of data across enterprises and industry. The expectation is that data can help an organization improve operations, provide actionable insights, enable AI, drive innovation, manage risk, and ensure compliance. However, research conducted by IBM in 2021 shows that 60% of business executives don’t always trust their company’s data. As a result, more than 33% do not base most of their decisions on data. Additionally, in 84% of organizations, data analytics projects were delayed because the data was not in the right format, while in 82%, the data used was of such poor quality that analytics projects needed to be reworked. When this happens, the benefits of using data for business do not materialize and trust is diminished.
To understand how to improve trust in an organization, we must first have a clear definition. Trust can be defined as a mental state comprising:
(1) expectancy - the trustor expects a specific behavior from the trustee (such as providing valid information or effectively performing cooperative actions);
(2) belief - the trustor believes that the expected behavior occurs, based on the evidence of the trustee’s competence, integrity, and goodwill;
(3) willingness to take risk - the trustor is willing to take a risk based on that belief.
Trust, with respect to data, depends on alignment: the data organization's belief that it is delivering data fit for insights, operations, and innovation must match the expectations of the consumers who use that data toward the same goals. Any misalignment leads to a lack of trust. For those using the data, trust comes from “the ability to have high confidence that the data that is generated, processed, stored, or transmitted by computers and computer-connected devices has a process, provenance, and correctness that is understood”. In other words, trust requires an understanding of how data is generated, processed, stored, and transmitted, along with its provenance and correctness.
Trust in data is also based on the believability of data. Data believability is defined as “the extent to which data are accepted or regarded as true, real, and credible”. This contextual assessment stands in contrast to other data quality (DQ) dimensions, which seek to assess data based on intrinsic measures alone.
Trust in data is gained by delivering a comprehensive view of quality data that is governed, secure, and ready for analysis. With an understanding of what trust is, we can look at how trust is obtained. Creating trust in data has two major components: 1) the governance of data, such as lineage tracking, data quality, data standards, compliance, and policy enforcement, and 2) the availability of data, through self-service mechanisms, to the people who need to use it or the AI models that require it, with visibility into the governance identified above.
Governing data provides reliability and precision for the source data and the analysis performed on it. Governance can also enable lineage tracking and policy enforcement, both of which are required to manage data and provide a solid data foundation. Certifications, compliance, distributed GCR, and business controls show evidence of competency in how data is stored and processed. Mediating and exposing the flows of data, including lineage, gives consumers of data an understanding of where the data was sourced and what transformations have or have not happened, as well as control over where it flows.
Developing trust in data can be further realized through improvements in three areas. The first is process improvements, which include better data cleansing and data quality management around the sourcing, capturing, and collecting of data. These process improvements are primarily provided through data governance and lead to a transparent view of the data, so that users (trustors) understand what has happened to the data before they receive it and can determine whether what they received matches what they need.
The second area for improvement is the integration of data. Data that is left disconnected in its own organizational silo is a major cause of data distrust among users. Siloed data is often challenging to combine with other siloed data sources. Integration builds trust by making useful, combinable data available. Governance aids this by cleansing, standardizing, and de-duplicating the data through the integration process.
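As a rough illustration of cleansing, standardizing, and de-duplicating during integration, the sketch below merges customer records from two silos. The record layout, the `crm` and `billing` sources, and the choice of email as the matching key are all assumptions made for this example, not a prescribed approach.

```python
def standardize(record):
    """Normalize the fields used for matching (hypothetical record layout)."""
    return {
        # Collapse repeated whitespace and use consistent capitalization.
        "name": " ".join(record["name"].split()).title(),
        # Emails are compared case-insensitively, so store them lowercased.
        "email": record["email"].strip().lower(),
    }


def integrate(*sources):
    """Merge records from several silos, keeping the first record per email."""
    merged = {}
    for source in sources:
        for record in source:
            clean = standardize(record)
            if not clean["email"]:
                continue  # Cleansing: skip records with no usable key.
            # De-duplication: later records with the same email are ignored.
            merged.setdefault(clean["email"], clean)
    return list(merged.values())


# Two illustrative silos holding overlapping, inconsistently formatted data.
crm = [{"name": "  ada   lovelace ", "email": "Ada@Example.com "}]
billing = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
    {"name": "Unknown", "email": "  "},
]

customers = integrate(crm, billing)
```

Keeping the first record seen is only one possible survivorship rule; a real integration process would choose its rule based on which source is considered authoritative.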
The final area for improvement is the adoption of automation. Automation removes areas of potential risk and improves the timeliness of data, the movement of data, and the logging of actions. In short, automation is a cornerstone of speeding up processes and providing evidence for compliance.
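To make the logging of actions concrete, here is a minimal sketch of an automated pipeline that records a lineage entry for every transformation it runs. The helper names (`run_step`, `fingerprint`, `lineage_log`) and the sample rows are hypothetical and exist only for this illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

# In-memory lineage log; a real system would persist this somewhere durable.
lineage_log = []


def fingerprint(rows):
    """Short, stable hash of a dataset so steps can be chained and audited."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_step(name, func, rows):
    """Apply one transformation and record what went in and what came out."""
    result = func(rows)
    lineage_log.append({
        "step": name,
        "at": datetime.now(timezone.utc).isoformat(),
        "rows_in": len(rows),
        "rows_out": len(result),
        "input_hash": fingerprint(rows),
        "output_hash": fingerprint(result),
    })
    return result


def drop_missing_amounts(rows):
    return [r for r in rows if r.get("amount") is not None]


def convert_cents_to_dollars(rows):
    return [{**r, "amount": r["amount"] / 100} for r in rows]


raw = [{"id": 1, "amount": 1250}, {"id": 2, "amount": None}]
cleaned = run_step("drop_missing_amounts", drop_missing_amounts, raw)
final = run_step("convert_cents_to_dollars", convert_cents_to_dollars, cleaned)
```

Because each entry stores both an input and an output hash, a consumer can follow the chain from the raw data to the final result and see which steps changed the row count.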
Ultimately, trust in data comes down to three things: 1) the transparency, reliability, and precision of the source data and the analysis performed on it (Governance), 2) the protection, control, and compliance of the data to satisfy both customer and legal expectations (Security), and 3) the mediation of the flows of data to control lineage. Succeeding in these three areas builds trust among people who want to use data for benefit by giving them clear information and evidence about how the data was collected, transformed, protected, governed, and made available. When the users’ expectations and beliefs are met, they can look past the data itself, focus on the outcomes, and take the risk of making decisions based on their trust in those outcomes.