Data gravity in banks and how it affects data management strategies
#dataarchitecture #Gravity #Banks #architecture #Cloud #Strategy #datagravity #datamesh #datahub #datafabric
“Financial institutions are business entities that deal in financial and monetary transactions!”
It is a perfunctory definition, right? But what is the first thing that comes to a data architect's mind from this basic definition? That these institutions must hold some really large “bodies of data”, with a lot of activity going on inside them.
Banks and other financial institutions (FIs) store large volumes of data in their enterprise data repositories to support their business: to get to know their customers, to support and optimize the services provided to those customers, and to comply with regulatory requirements.
As a result, banks build “big data” platforms to support a wide range of business areas.
What is data gravity, and where does it come from?
As the volume of data within these big data platforms grows large enough, they begin to exhibit the phenomenon of high data gravity.
We can think of a large body of data - such as a data lake or an enterprise data warehouse - as a heavy object travelling through the information management space: the larger the body of data becomes, the greater its “gravitational attraction force”. More consumer groups are attracted to this large data source, more systems are integrated with it, more services rely on it, more data is pumped into it from other systems, and more teams, projects, and funding gather around it.
Data gravity is the name for this observed “attractive force” of big data platforms. The term was coined by Dave McCrory (https://www.datacenterdynamics.com/en/profile/davemccrory/), borrowing the analogy of gravity from physics, where a body exerts a strong attractive force if it has a great deal of mass or energy.
Applying the same concept to the data domain, McCrory considers a body of data to be “heavy” if it is large in volume or has a great deal of activity running against it.
Why should financial institutions be concerned about data gravity?
Digital transformation and process automation within banks and FIs are creating enormous volumes of data; typically, these institutions hold massive amounts of data in their big data platforms, data lakes, or data warehouses.
The conventional architectural approach has been to extract data from these massive analytical data sources into a serving repository that is more accessible to consumer systems. (Consumer systems can be analytical, such as BI or ML platforms, or operational systems that rely on regular refreshes from these data sources for core operations, such as CRM, CBS, or LMS.)
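To make that pattern concrete, here is a minimal Python sketch of the extract-and-serve flow described above; the connection strings, table names, and columns are purely illustrative assumptions, not any specific bank's schema:

```python
# Minimal sketch of the conventional "extract and serve" pattern: pull a slice of
# data out of the heavy analytical platform and publish it to a lighter serving
# repository that consumer systems (BI, CRM, etc.) can query with low latency.
# All connection strings, table names, and columns below are illustrative assumptions.
import pandas as pd
from sqlalchemy import create_engine

warehouse = create_engine("postgresql://user:pass@dwh-host:5432/edw")            # hypothetical EDW
serving_store = create_engine("postgresql://user:pass@serve-host:5432/serving")  # hypothetical serving DB

# Extract yesterday's transaction snapshot from the analytical source...
df = pd.read_sql(
    "SELECT customer_id, txn_date, amount "
    "FROM fact_transactions WHERE txn_date = CURRENT_DATE - 1",
    warehouse,
)

# ...and load it into the serving repository for downstream consumers to refresh from.
df.to_sql("customer_txn_daily", serving_store, if_exists="replace", index=False)
```

Every consumer that needs its own refreshed copy adds another pipeline like this one, and that data movement is exactly what becomes painful as the platform gets heavier.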
As data and data consumption grow, this conventional approach becomes cumbersome. It becomes more and more difficult to keep moving massive volumes of data between the enterprise's repositories, and relying on such heavy bodies of data grows increasingly problematic.
Source: https://learn.microsoft.com/en-us/azure/storage/common/storage-solution-large-dataset-low-network
Is it possible to quantify data gravity?
Although data gravity is observed as a qualitative characteristic of data repositories, efforts have been made to express it with a quantitative formula.
The most standardized effort was led by Dave McCrory himself in cooperation with Digital Realty; their approach is based on identifying the factors that contribute to data gravity and combining them into a single index.
McCrory came up with a formula for quantifying data gravity, whose calculated outcome is named the Data Gravity Index. It is not an absolute value, but it can be used to compare the change in data gravity between two points in time, or to compare two data platforms.
Source: https://go2.digitalrealty.com/
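As a purely illustrative sketch (not the published Digital Realty formula), the idea of a relative index can be expressed in a few lines of Python: combine the kinds of factors commonly cited for data gravity (data mass, data activity, bandwidth, latency) into one score and compare snapshots. The function, factor definitions, and numbers below are assumptions for illustration only.

```python
# Illustrative sketch of a simplified data-gravity-style score. The published
# Data Gravity Index methodology may define and weight its factors differently;
# refer to the Digital Realty source above for the original.

def data_gravity_score(data_mass_gb: float,
                       activity_per_sec: float,
                       bandwidth_gbps: float,
                       latency_ms: float) -> float:
    """Relative score: more mass, activity, and bandwidth raise it; higher latency lowers it."""
    return (data_mass_gb * activity_per_sec * bandwidth_gbps) / (latency_ms ** 2)

# The absolute number means little; the ratio between two snapshots of the same
# platform (or between two platforms) is what is useful. Numbers are made up.
last_year = data_gravity_score(200_000, 5_000, 10, 20)
this_year = data_gravity_score(450_000, 12_000, 10, 20)
print(f"Data gravity grew roughly {this_year / last_year:.1f}x year over year")
```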
What can banks and financial institutions expect as their big data gets “heavier”?
How do we tackle data gravity-related issues?
Let's leave this point to the next post!