The paradox of data

Most organizations today understand the value of data and how it can improve every facet of their operations. Data can add value not only to the process that produces it, but also to related areas of the business.

Data, unlike most resources, does not diminish in value with repeated use; the same data can serve many purposes. The better the data an enterprise creates, the more value it can extract in the form of visibility, real-time analytics, and more accurate training data for AI and ML. However, how much value can be extracted depends on how well the data is shared internally while complying with the law.

But like any resource, data carries risks that must be managed, particularly in regulated industries. Controls help to mitigate those risks, so organizations with strong controls around their data are exposed to less risk than those without them.

This presents a paradox: the more freely data can be shared across the enterprise, the more value it can add for stakeholders, but also the greater the potential risk to the organization. To unlock the value of data, we must resolve this paradox by making data easy to share across the organization while maintaining appropriate control over it.

One way to address this paradox is a two-pronged approach. First, define ‘Data Products’, designed by people who understand the data, its permissible uses, and its limitations. Second, implement a ‘Data Mesh’ architecture, which aligns our data technology with those data products.

This combined approach:

  1. Empowers data product owners to make management and use decisions for their data
  2. Enforces those decisions by sharing data, rather than copying it
  3. Provides clear visibility of where data is being shared across the enterprise

Aligning our Data Architecture to our Data Product Strategy

Data products are broad but cohesive groups of related data drawn from the systems that support business operations. We store the data for each data product in its own product-specific data lake, ideally with each lake having its own cloud-based storage layer.

For example, sales and marketing data are closely related, so together they could form the basis of one data product lake.

The services that consume data are hosted in consumer application domains. These consumer applications are physically separated both from each other and from the data lakes. When a data consumer needs data from one or more of the data lakes, we use cloud services to make the lake data visible to the data consumers, and provide other cloud services to query the data directly from the lakes. The data product-specific lakes that hold data, and the application domains that consume lake data, are interconnected to form the data mesh.
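To make the moving parts concrete, here is a minimal in-memory sketch of the mesh described above. It assumes a lake can be modelled as a dictionary of tables and that a "share" is simply permission for a consumer domain to query a lake table in place; the class names `ProductLake` and `DataMesh` are hypothetical stand-ins for real cloud storage and query services.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProductLake:
    """One data product's lake (hypothetical; in-memory tables stand in for cloud storage)."""
    name: str
    owners: set
    tables: dict = field(default_factory=dict)  # table name -> list of row dicts


@dataclass
class DataMesh:
    """Interconnects product lakes and consumer domains without copying data."""
    lakes: dict = field(default_factory=dict)   # lake name -> ProductLake
    shares: dict = field(default_factory=dict)  # (lake, table) -> set of consumer domains

    def add_lake(self, lake: ProductLake) -> None:
        self.lakes[lake.name] = lake

    def share(self, lake_name: str, table: str, consumer: str) -> None:
        # Make a lake table visible to a consumer domain; no rows are moved.
        self.shares.setdefault((lake_name, table), set()).add(consumer)

    def query(self, consumer: str, lake_name: str, table: str,
              where: Callable[[dict], bool] = lambda row: True) -> list:
        # Consumers query the lake directly, and only tables shared with them.
        if consumer not in self.shares.get((lake_name, table), set()):
            raise PermissionError(f"{consumer} has no share on {lake_name}.{table}")
        return [row for row in self.lakes[lake_name].tables[table] if where(row)]
```

The consumer domain never holds its own copy of the rows; it only holds a share, which is what keeps the lakes and the consumers physically separate while still connected.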

Empower the right people to make control decisions

A data mesh architecture allows each data product lake to be managed by a team of data product owners who understand the data in their domain, and who can make risk-based decisions regarding the management of their data.

When a consumer application needs data from a product lake, the team that owns the consumer application locates the data they need in an enterprise-wide data catalog. The entries in the catalog are maintained by the processes that move data to the lakes, so the catalog always reflects what data is currently in the lakes.
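The key property here is that the catalog is updated by the same processes that load the lakes. A minimal sketch, assuming a hypothetical `ingest` job that writes rows to an in-memory stand-in for lake storage and registers the table and its columns in a hypothetical `EnterpriseCatalog` in the same step (with a matching `drop` for removals):

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatalogEntry:
    lake: str
    table: str
    columns: tuple
    updated_at: datetime


class EnterpriseCatalog:
    """Hypothetical enterprise-wide catalog: holds metadata only, never data rows."""

    def __init__(self) -> None:
        self._entries: dict = {}  # (lake, table) -> CatalogEntry

    def register(self, lake: str, table: str, columns) -> None:
        self._entries[(lake, table)] = CatalogEntry(
            lake, table, tuple(columns), datetime.now(timezone.utc))

    def deregister(self, lake: str, table: str) -> None:
        self._entries.pop((lake, table), None)

    def search(self, keyword: str) -> list:
        # Consumer teams discover data by table or column name.
        kw = keyword.lower()
        return [e for e in self._entries.values()
                if kw in e.table.lower() or any(kw in c.lower() for c in e.columns)]


def ingest(lake: str, table: str, rows: list, storage: dict,
           catalog: EnterpriseCatalog) -> None:
    # The loading process writes the data and updates the catalog together,
    # so the catalog reflects what is currently in the lakes.
    storage.setdefault(lake, {})[table] = list(rows)
    catalog.register(lake, table, sorted({col for row in rows for col in row}))


def drop(lake: str, table: str, storage: dict, catalog: EnterpriseCatalog) -> None:
    # Removing data from a lake removes its catalog entry in the same step.
    storage.get(lake, {}).pop(table, None)
    catalog.deregister(lake, table)
```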

The consumption team then requests the data through the catalog. Because each lake is curated by a team that understands the data in its domain, requests go straight to the right decision-makers, who can make rapid, authoritative decisions, so the consumption team’s wait time is minimized.
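The request path can be sketched as a small workflow in which each request is routed only to the owners of the lake it targets. The `AccessWorkflow` class, its lake-to-owners mapping, and the pending/approved/rejected states are illustrative assumptions, not a specific product’s API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    consumer: str
    lake: str
    table: str
    purpose: str
    status: Status = Status.PENDING
    decided_by: Optional[str] = None


class AccessWorkflow:
    """Hypothetical request workflow: only a lake's owners can decide on its data."""

    def __init__(self, lake_owners: dict) -> None:
        self.lake_owners = lake_owners  # lake name -> set of owner ids
        self.requests: list = []

    def submit(self, consumer: str, lake: str, table: str, purpose: str) -> AccessRequest:
        req = AccessRequest(consumer, lake, table, purpose)
        self.requests.append(req)
        return req

    def inbox(self, owner: str) -> list:
        # Each owner sees only pending requests for lakes they are responsible for.
        return [r for r in self.requests
                if r.status is Status.PENDING and owner in self.lake_owners.get(r.lake, set())]

    def decide(self, req: AccessRequest, owner: str, approve: bool) -> None:
        if owner not in self.lake_owners.get(req.lake, set()):
            raise PermissionError(f"{owner} does not own lake {req.lake!r}")
        req.status = Status.APPROVED if approve else Status.REJECTED
        req.decided_by = owner
```

Routing by ownership is what keeps the wait short: a request never sits in a generic queue waiting for someone who lacks the context to decide.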

Enforce control decisions through in-place consumption

The data mesh allows us to share data from the product lakes, rather than copying it to the consumer applications that will use it. In addition to keeping the storage bill down, sharing minimizes discrepancies in the data between the system that produced the data and the system that consumes it. That helps to ensure that the data being consumed for analytics, AI/ML, and reporting is up-to-date and accurate.
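The difference between sharing and copying is easiest to see side by side. This sketch assumes a hypothetical in-memory `Lake` and a `SharedView` handle that resolves the table at query time; `copy_out` represents the traditional extract copied into the consumer’s own store.

```python
import copy


class Lake:
    """Hypothetical in-memory stand-in for a product lake."""

    def __init__(self) -> None:
        self.tables: dict = {}

    def write(self, table: str, rows: list) -> None:
        # The producing system replaces the table contents with its latest data.
        self.tables[table] = rows


class SharedView:
    """Handle granted to a consumer; it stores no rows, only where to find them."""

    def __init__(self, lake: Lake, table: str) -> None:
        self._lake = lake
        self._table = table

    def rows(self) -> tuple:
        # Resolve the table at query time, so the consumer always sees current data.
        return tuple(self._lake.tables[self._table])


def copy_out(lake: Lake, table: str) -> list:
    # The traditional alternative: a point-in-time extract owned by the consumer.
    return copy.deepcopy(lake.tables[table])
```

If the producer rewrites a table after both handles are created, the copied extract still holds the old values while `SharedView.rows()` returns the new ones, which is exactly the discrepancy that in-place consumption avoids.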

Provide cross-enterprise visibility of data consumption

Historically, data exchanges between systems were either point-to-point or via message queues. Because there was no central, automated record of all data flows, data product owners couldn’t easily see when their data was flowing between systems. A good data mesh architecture addresses this visibility challenge with a cloud-based Mesh Catalog that records which data is visible to which consumers. AWS Glue Data Catalog or a similar cloud-based data cataloging service could be used for this.

This catalog does not hold any data, but it does know which lakes are sharing data with which data consumers. This offers a single point of visibility into data flows across the enterprise, and gives data product owners confidence that they know where their data is being used.
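A mesh catalog of this kind only needs to record share relationships, not data. A minimal sketch, with the hypothetical `MeshCatalog` class standing in for a managed cataloging service, that answers the two questions raised above: who is consuming a given lake, and which lake tables feed a given consumer.

```python
from collections import defaultdict


class MeshCatalog:
    """Hypothetical mesh catalog: records share relationships, stores no data rows."""

    def __init__(self) -> None:
        self._shares = defaultdict(set)  # (lake, table) -> set of consumer domains

    def record_share(self, lake: str, table: str, consumer: str) -> None:
        self._shares[(lake, table)].add(consumer)

    def revoke_share(self, lake: str, table: str, consumer: str) -> None:
        self._shares[(lake, table)].discard(consumer)

    def consumers_of(self, lake: str) -> set:
        # Owner's view: every consumer currently reading any table in this lake.
        return {c for (name, _), consumers in self._shares.items()
                if name == lake for c in consumers}

    def sources_of(self, consumer: str) -> list:
        # Consumer's view: every (lake, table) currently shared with this consumer.
        return sorted(key for key, consumers in self._shares.items() if consumer in consumers)
```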

Conclusion: Data Mesh in Action

Here’s an example to illustrate how the Data Mesh architecture will enable our business.

Consider a team producing firmwide reports. Traditionally, it would extract and join data from multiple systems across multiple data domains to produce those reports.

With the Data Mesh architecture, the data product owners for those data domains make their data available in lakes. The enterprise data catalog allows reporting teams to find the lake-based data and request that it be made available to their reporting application. The mesh catalog allows the data flows from the lakes to the reporting application to be audited, so it’s clear where the data in the reports originates.
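The reporting scenario can be sketched end to end. This assumes a hypothetical `AuditedMesh` that checks grants and logs every in-place read; the `sales` and `client` lakes, their tables, and the report itself are illustrative, not taken from any real system.

```python
from dataclasses import dataclass, field


@dataclass
class AuditedMesh:
    """Hypothetical mesh that enforces grants and logs every in-place read."""
    lakes: dict   # lake -> table -> list of row dicts
    grants: set   # {(consumer, lake, table)}
    audit_log: list = field(default_factory=list)

    def read(self, consumer: str, lake: str, table: str) -> list:
        if (consumer, lake, table) not in self.grants:
            raise PermissionError(f"{consumer} has no grant on {lake}.{table}")
        self.audit_log.append((consumer, lake, table))
        return self.lakes[lake][table]


def build_report(mesh: AuditedMesh, consumer: str) -> dict:
    # Join data from two domain lakes in place: order totals per customer name.
    orders = mesh.read(consumer, "sales", "orders")
    customers = mesh.read(consumer, "client", "customers")
    names = {c["customer_id"]: c["name"] for c in customers}
    totals: dict = {}
    for order in orders:
        name = names.get(order["customer_id"], "unknown")
        totals[name] = totals.get(name, 0) + order["amount"]
    return totals


def report_sources(mesh: AuditedMesh, consumer: str) -> list:
    # Audit view: which lake tables the consumer's report actually drew on.
    return sorted({(lake, table) for c, lake, table in mesh.audit_log if c == consumer})
```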

One development that could give this space a major boost is blockchain technology, which promises easier and safer storage of data, allowing data, “the new oil”, to really start firing up the engines of progress.
