Is Big Data the Same as a Large Amount of Data?

Big Data is the new buzzword in the industry. But what actually is Big Data? As the name suggests, most people think Big Data simply means large datasets, possibly terabytes or petabytes of data. And that's it.

But that's not what Big Data is. It is commonly characterized by the "3 Vs": Volume, Velocity and Variety of data.

Volume: Of course, large datasets. Includes both structured and unstructured data.

Velocity: Fast-moving, ever-changing data. Think streaming.

Variety: Includes text, pictures, videos, chats uploaded to social media, images from cameras, IoT devices, tweets, mobile devices and RFID readers.

Additionally, the insight extracted from this data is part of what constitutes "Big Data".

It’s not just about how much or what kind of data you have but what you do with that data.

Having said that, a large amount of data is still very important: the more data we have, the more representative the sample, which in turn helps in building better models and better insights.

Some use cases of Big Data:

  1. Understanding and targeting customers: understand customers and their behaviors/preferences and create predictive models
  2. Fraud detection: based on pattern recognition
  3. Understanding and optimizing business processes: retailers are able to optimize their stock based on predictions generated from social media data, web search trends and weather forecasts

Now that we have understood that Big Data is not just about large volumes of data but about how to put that data to use, we need to talk about how to process it.

Batch processing is an efficient way to generate insights when working with a high volume of data. Processing time can range from minutes to hours, and the operations performed can be complex.
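As a minimal sketch of the batch style, the snippet below aggregates an entire (made-up) dataset in one pass. The records and the `batch_totals` function are illustrative assumptions, not from any particular system; in a real pipeline the same logic would run as a scheduled job over a data lake or warehouse.

```python
from collections import defaultdict

# Hypothetical sales records; in practice these would be read in bulk
# from a data lake or warehouse on a schedule (e.g. a nightly job).
records = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 7.5},
]

def batch_totals(records):
    """Classic batch aggregation: one pass over the whole dataset."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

print(batch_totals(records))
```

The defining trait here is that the full dataset is available up front, so the job can take as long as it needs and perform arbitrarily complex aggregations.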

When your most important consideration is extracting near-real-time insights from massive amounts of data, you need stream processing. Here we are talking about large volumes of data arriving at high velocity. Operations need to be simpler, with response times measured in seconds. Streaming data is processed even before it ends up in a data warehouse.

A few use cases of stream processing:

  1. A music streaming service looks at user-listening data to automatically improve its user recommendations.
  2. Network monitoring
  3. Intelligence and surveillance

There are two approaches to stream processing:

  1. Native stream processing: every event is processed as it arrives, resulting in the lowest possible latency. But processing every incoming event individually is also computationally expensive.
  2. Micro-batch processing: incoming events are grouped into batches, either by arrival time or once a batch reaches a certain size. This reduces the computational cost of processing but can introduce latency.
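The micro-batch approach above can be sketched in a few lines of plain Python. The function name and parameters (`max_size`, `max_wait_s`) are illustrative, not from any particular framework: events are buffered until the batch reaches a size limit or a time window expires, whichever comes first.

```python
import time

def micro_batch(stream, max_size=100, max_wait_s=1.0):
    """Group events from `stream` into batches, flushing a batch when
    it reaches max_size events or max_wait_s seconds have elapsed."""
    batch, deadline = [], time.monotonic() + max_wait_s
    for event in stream:
        batch.append(event)
        if len(batch) >= max_size or time.monotonic() >= deadline:
            yield batch
            batch, deadline = [], time.monotonic() + max_wait_s
    if batch:  # flush whatever remains when the stream ends
        yield batch
```

Tuning `max_size` and `max_wait_s` is exactly the cost/latency trade-off described above: larger batches amortize per-event overhead, shorter waits bound how stale an insight can be.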

"Big Data" is rapidly changing the world and empowering AI at scale.
