Data Volume - Simple Conversation

Volume:

I'm not going to go into a lot of detail here, since this is not the forum for a white paper on the topic. However, I would like to surface some personal thoughts in hopes of sparking a conversation.

How much data are we talking about?

The volume aspect of data is simply a measurement of how many mega-, giga-, tera-, peta-, or zettabytes comprise the data artifact(s) in total. Volume can measure new data being acquired via a feed process, the total footprint of existing data already within a system, or an entire ecosystem. For example, a new data feed might be 100GB, the footprint of the system it lands in 4TB, and the surrounding ecosystem 2PB.

Focusing on the storage aspect of data: with volume comes cost. Even with the mantra that "storage is cheap", and with compression algorithms shrinking the storage footprint relative to the raw data size, storing data in large amounts still costs real dollars. With the increase in deployments of systems that leverage IoT, telemetry, logging, video, imaging, and other large data services, global data is gathered in fantastic volumes every second.

To illustrate the expected growth in data, I found this chart provided by Statista:

[Chart: Total data volume worldwide 2010-2025 | Statista]

The key point of the chart is that by 2025, "global data creation is projected to grow to more than 180 zettabytes". Reviewing the information behind the chart, the projected average annual data growth rate holds at around ~28% from 2015 to 2025. Starting at 2020 (64.2ZB) and projecting through 2025 (181ZB), the amount of data being created would nearly triple in just five short years. Even at these incredible growth projections, I believe the totals are still on the low side.
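To sanity-check that arithmetic, here is a quick back-of-envelope calculation (a minimal sketch in Python, using only the two Statista data points cited above; note the ~28% figure in the paragraph spans the longer 2015-2025 window):

```python
# Quick check of the growth figures above, using the Statista data
# points cited in this article (2020: 64.2 ZB, 2025: 181 ZB).
start_zb, end_zb = 64.2, 181.0      # zettabytes of new data created per year
years = 2025 - 2020

growth_multiple = end_zb / start_zb          # ~2.8x over five years
cagr = growth_multiple ** (1 / years) - 1    # ~23%/yr for the 2020-2025 span

print(f"Growth multiple 2020-2025: {growth_multiple:.2f}x")   # 2.82x
print(f"Implied annual growth rate: {cagr:.1%}")              # 23.0%
```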

Using the numbers provided by Statista for the year 2020, the potential storage cost of just the new data created would be ~$1B, assuming all of it were stored somewhere. As more and more companies use data in more interesting ways to feed their data-driven insights, the amount of data retained over multiple years will only increase.
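For the curious, that estimate works out roughly as follows. This is a hedged sketch: the unit price below is simply backed out of the ~$1B figure above and is illustrative, not a quoted market rate, since real prices vary by orders of magnitude across tape, disk, and replicated cloud tiers.

```python
# Illustrative only: total storage cost scales linearly with the assumed
# price per terabyte, and real prices vary by orders of magnitude across
# tape, disk, and replicated cloud object storage.
ZB_TO_TB = 1_000_000_000      # 1 zettabyte = 10^9 terabytes

new_data_zb = 64.2            # Statista figure: new data created in 2020
usd_per_tb = 0.015            # hypothetical blended rate backed out of the
                              # ~$1B estimate above; swap in your own number

total_cost = new_data_zb * ZB_TO_TB * usd_per_tb
print(f"Storing {new_data_zb} ZB at ${usd_per_tb}/TB ~ ${total_cost:,.0f}")
```

Whatever unit price you plug in, the takeaway is the same: at zettabyte scale, even tiny per-terabyte costs add up to real money.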

Unlike a system's transitory processing layer, where memory and CPU are the focal points, the data layer persists over time. Data continues to be created, gathered, analyzed, visualized, and, most importantly, stored; yet this lifeblood of business still goes mostly unmanaged. As data is stored, for various reasons, the new layers acquired each day are added to the existing volumes, continuing the growth. Worse, each layer of data tends to become disconnected from the older ones, even as they are piled on top of each other over time.

Even as data is stored in an ecosystem, adding to the bloat of the data environment, the metadata that would provide context, ownership, purpose, and/or definition over time is mostly absent. As annual worldwide data creation grows toward that monstrous 181+ zettabytes, how much of it will be stored and provide real value, and how much will be stored just in case? How much of the data will be usable even after six months, let alone a year or more, and how much will simply be lost because it is no longer understood? How much will be combined in incorrect ways because knowledge of its lineage walked out the door with a departing employee? With this massive investment in data volume, where is the investment in persisted metadata that would give it clear value over time?

I know I started the conversation with a focus on data volume, but the point I'd like to leave you with is the need for governance and metadata that allow for proper data management in this massive data ecosystem. Data is a persisted resource, and as volumes increase there is a real need to clearly identify the data, its properties, and its characteristics so that it can be properly managed and secured through automation, potentially saving businesses millions of dollars in storage costs annually. That is the metadata which would allow businesses to use and govern their data beyond just today, or this month.
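To make the idea concrete, here is a minimal sketch of what a persisted metadata record might look like. The field names are my own illustrative assumptions, not any particular catalog product's schema:

```python
# A minimal sketch of the persisted metadata this article argues for.
# Field names are illustrative assumptions, not any vendor's schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetMetadata:
    name: str                 # stable identifier for the data artifact
    owner: str                # accountable team or person
    purpose: str              # why the data is being retained
    lineage: list[str]        # upstream sources it was derived from
    classification: str       # e.g. "public", "internal", "restricted"
    retention_until: date     # when automation may archive or delete it

feed = DatasetMetadata(
    name="telemetry_feed_v2",
    owner="platform-data-team",
    purpose="device health dashboards",
    lineage=["raw_device_events"],
    classification="internal",
    retention_until=date(2026, 1, 1),
)
print(feed)
```

With records like this persisted alongside the data itself, retention and access policies become something automation can enforce rather than tribal knowledge.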
