Data gravity in banks and how it affects data management strategies

#dataarchitecture #Gravity #Banks #architecture #Cloud #Strategy #datagravity #datamesh #datahub #datafabric

“Financial institutions are business entities engaged in dealing with financial and monetary transactions!”

It is a perfunctory definition, right? But what is the first thing that comes to a data architect's mind from this basic definition? That these institutions must have some really large “bodies of data”, with a lot of activity going on inside them.

Banks and other financial institutions (FIs) store large volumes of data in their enterprise data repositories to support their business: to get to know their customers, to support and optimize the services provided to those customers, and to adhere to regulatory requirements.

As a result, banks build “big data” platforms to support various business areas such as:

  • Produce regulatory reports.
  • Monitor customer transaction patterns.
  • Support customer segmentation.
  • Optimize the customer journey.
  • Enable advanced analytics and ML.
  • Support multichannel services.
  • Implement risk management processes and predict exposure and value at risk.
  • Optimize product offerings.
  • Analyze customer feedback and market scoring.
  • Conduct business development research.

What is data gravity, and where does it come from?

As the volume of data within these big data platforms grows, the platforms start to exhibit the phenomenon of high data gravity:

We can think of a large body of data - such as a data lake or an enterprise data warehouse - as a heavy object travelling through the information management space: the larger the body of data becomes, the greater its “gravitational attraction force”. More consumer groups are attracted to this large data source, more systems are integrated with it, more services rely on it, more data is pumped into it from other systems, and more teams, projects and funding get involved.

Data gravity is the name given to this observed “attractive force” of big data platforms. The term was coined by Dave McCrory (https://www.datacenterdynamics.com/en/profile/davemccrory/), borrowing the analogy of gravity from physics, where an object with a lot of mass or energy exerts a strong attractive force.

Applying the same concept to the data domain, McCrory considered a body of data to be “heavy” if it is large in volume or has a great deal of activity running on it.
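
For reference, the physical law the analogy borrows from is Newton's law of universal gravitation, in which the attractive force between two bodies grows with their masses and falls off with the square of the distance between them:

\[
F = G \, \frac{m_1 \, m_2}{r^2}
\]

In the data gravity analogy, the “mass” corresponds to the volume of accumulated data and the activity on it, and the “force” is the pull the platform exerts on the applications, services and teams that want to be close to that data.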

Why should financial institutions be concerned about data gravity?

Digital transformation and process automation within banks and FIs are creating enormous volumes of data; typically, these institutions hold massive amounts of data in their big data platforms, data lakes or data warehouses.

The conventional architectural approach has been to extract data from these massive analytical data sources into a serving repository that is more accessible to consumer systems. (Consumer systems can be analytical, such as BI or ML platforms, or operational systems that rely on regular refreshes from these data sources for core operations, such as CRM, CBS or LMS systems.)

As data volumes and data consumption grow, this conventional approach becomes cumbersome. It becomes more and more difficult to keep moving massive volumes of data between the enterprise’s repositories, and relying on heavy bodies of data becomes problematic for several reasons:

  • Data accumulates faster and faster: growth can come from ingesting more data sources or from interaction with the existing data (interacting with data produces more data!), for example interaction logs, work files, security checks, audit trails, and so on.
  • Difficult to manage: the large volume of data introduces a whole set of data management issues; ingesting, processing, securing and storing the data get harder, data modelling becomes impractical, and data management practices that require actual data movement become infeasible.
  • Difficult to explore: big data platforms tend to accumulate a diverse collection of data components, with very little governance and almost no conformance to an enterprise data model of any kind; as the platform grows larger and larger, understanding of the foundational data degrades.
  • Difficult to utilize: onboarding more consumers and new use cases becomes significantly more complex, to the point where some consumers will look for data elsewhere, encouraging point-to-point data integration and jeopardizing the consistency and trustworthiness of the data integration flow.
  • Difficult to change: responding to changes in business requirements often requires changing the underlying data components, and on heavy data platforms such changes are very hard to apply because of the cascading dependencies of their consumers.
  • Difficult to move: the following diagram illustrates the expected time needed to physically move data between platforms based on the bandwidth of the transfer channel. Factoring in the overhead of data extraction and loading and the probability of transfer disruption gives an indication of how hard it is to move data out of a “heavy” platform, even over high-speed networking (a back-of-the-envelope illustration follows this list).

[Diagram: estimated time to transfer large datasets over networks of different bandwidths]

Source: https://learn.microsoft.com/en-us/azure/storage/common/storage-solution-large-dataset-low-network

  • Expensive to scale: the massive amounts of data generated increase the requirements for additional capacity and services to utilize it. Continuous escalation of the demand for storage space, computational power and I/O bandwidth is inevitable, and the demand and cost are not expected to grow “linearly” with data volume, as promised by most big data management platforms, but at a higher order once we factor in all the difficulties of data management at immense capacity.
  • Demonstrates high latency: it is a straightforward conclusion that querying a massive data repository that sustains a high workload is expected to be slow.
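
As a rough illustration of the “difficult to move” point above, the sketch below estimates the raw transfer time for datasets of different sizes over different link speeds. The 70% link efficiency and the example sizes are assumptions chosen purely for illustration, and the estimate ignores extraction/loading overhead and transfer disruptions, which make the real picture worse.

```python
# Rough estimate of the raw network transfer time for large datasets.
# The ~70% effective link efficiency is an assumption for illustration;
# extraction/loading overhead and retries are deliberately ignored.

def transfer_days(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    dataset_bits = dataset_tb * 1e12 * 8           # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency   # usable bits per second
    return dataset_bits / effective_bps / 86_400   # seconds -> days

for size_tb in (10, 100, 1_000):                   # 10 TB, 100 TB, 1 PB
    for link_gbps in (1, 10):                      # 1 Gbps and 10 Gbps links
        print(f"{size_tb:>5} TB over {link_gbps:>2} Gbps: "
              f"~{transfer_days(size_tb, link_gbps):.1f} days")
```

Even before any real-world overhead, moving a petabyte over a 1 Gbps link takes on the order of months, which is why physically relocating a heavy data platform is rarely as simple as “copy it across”.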

Is it possible to quantify data gravity?

Although data gravity is observed as a qualitative characteristic of data repositories, efforts have been made to find a formula that quantifies it.

The most standardized effort was led by Dave McCrory himself in cooperation with Digital Realty; their approach is based on identifying the following contributing factors:

  • Data Mass: the data that is accumulated (stored or held).
  • Data Activity: the data that is in motion (creation, interactions).
  • Bandwidth: the total aggregate bandwidth available to the location in question.
  • Latency: the average latency between this location and all other locations.

McCrory came up with the following formula for quantifying data gravity, naming the calculated outcome the “Data Gravity Index”. It is not an absolute value, but it can be used to compare the change in data gravity between two points in time, or to compare two data platforms.

[Image: the Data Gravity Index formula]

Source: https://go2.digitalrealty.com/
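
Since the formula image is not reproduced here, the sketch below implements one commonly cited form of the index (data mass multiplied by data activity and by bandwidth, divided by the square of latency), purely as an illustration; the exact terms and weightings in Digital Realty’s published formula may differ, and the result is only meaningful when comparing two platforms or two points in time.

```python
def data_gravity_index(data_mass_gb: float,
                       data_activity_gb: float,
                       bandwidth_gbps: float,
                       latency_ms: float) -> float:
    """Illustrative approximation of a Data Gravity Index.

    Assumes the commonly cited form (mass x activity x bandwidth) / latency^2;
    the published formula may weight the factors differently. The output is a
    relative score, not an absolute value.
    """
    return (data_mass_gb * data_activity_gb * bandwidth_gbps) / (latency_ms ** 2)

# Compare the same (hypothetical) platform at two points in time:
# gravity grows quickly as both stored data and activity increase.
last_year = data_gravity_index(500_000, 50_000, 100, 20)
this_year = data_gravity_index(800_000, 120_000, 100, 20)
print(f"Data gravity grew roughly {this_year / last_year:.1f}x year over year")
```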

What can banks and financial institutions expect as their big data gets “heavier”?

  • Deficiencies in data management practices: the data management practice exists to support the business, and when it comes to business requirements, it is all about feasibility and time to value. With heavy and inflexible data platforms, applying the appropriate governance practices and following architecture and data management best practices becomes harder, leading to activities that get labelled “quick and dirty”, “temporary workarounds”, “tactical solutions”, and the list of labels goes on. The accumulation of such practices decreases the quality and efficiency of the data management environment and can compromise the reliability and quality of the data management process and its underlying platforms.
  • Failure to apply cloud migration strategies: adopting a cloud migration strategy helps identify and execute the fastest, lowest-cost, least disruptive transition from on-premises to cloud. But when the enterprise data repositories possess high gravity, the road to the cloud becomes littered with failed and delayed migrations, and banks grow very hesitant to execute the migration according to plan, struggling to justify the effort and challenges of moving these heavy data platforms to the cloud.

How do we tackle data gravity-related issues?

Let's leave this point to the next post!
