Is Big Data the Same as a Large Amount of Data?
Big Data is the new buzzword in the industry. But what actually is Big Data? As the name suggests, most people assume it simply means large datasets, possibly terabytes or petabytes of data, and nothing more.
But that’s not all Big Data is. It is commonly characterized by the “3Vs”: Volume, Velocity, and Variety of data.
- Volume: Large datasets, of course, including both structured and unstructured data
- Velocity: Fast-moving, ever-changing data. Think streaming
- Variety: Text, pictures, and videos uploaded to social media, chat messages, tweets, images from cameras, and readings from IoT devices, mobile devices, and RFID readers
Beyond the 3Vs, what really makes data “Big Data” is the insight it provides. It’s not just about how much or what kind of data you have, but what you do with that data.
Having said that, a large amount of data is still very important: the more data we have, the more representative our sample is, which in turn leads to better models and better insights.
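The intuition that more data gives a more representative sample can be made concrete with the standard error of the mean, which shrinks with the square root of the sample size. A minimal sketch, assuming a hypothetical population standard deviation of 10:

```python
import math

# Hypothetical population standard deviation (an assumption for illustration).
sigma = 10.0

# Standard error of the mean: sigma / sqrt(n).
# A 100x larger sample makes the estimate 10x more precise.
for n in (100, 10_000, 1_000_000):
    se = sigma / math.sqrt(n)
    print(f"sample size {n:>9,}: standard error = {se:.3f}")
```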
Some use cases of Big Data:
- Understanding and Targeting Customers: understand customer behavior and preferences, and build predictive models
- Fraud Detection: based on pattern recognition
- Understanding and Optimizing Business Processes: retailers are able to optimize their stock based on predictions generated from social media data, web search trends, and weather forecasts
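As a toy illustration of the pattern-recognition idea behind fraud detection, one simple pattern is "a transaction far from the typical amount". A minimal sketch using z-scores, with made-up transaction amounts and an illustrative threshold (not a production fraud model):

```python
import statistics

def flag_anomalies(amounts, threshold=2.0):
    """Flag amounts whose z-score exceeds the threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# Hypothetical card transactions: everyday purchases plus one outlier.
txns = [12.0, 10.0, 11.0, 13.0, 9.0, 10.0, 12.0, 11.0, 9500.0]
print(flag_anomalies(txns))  # flags the 9500.0 transaction
```

Real fraud systems learn far richer patterns (merchant, location, timing), but the principle is the same: model normal behavior and flag deviations from it.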
Now that we have established that Big Data is not just about large volumes of data but about putting that data to use, let's talk about how to process it.
Batch processing is an efficient way to generate insights when working with a high volume of data. Processing time can range from minutes to hours, and the operations can be complex.
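A minimal sketch of the batch idea in Python (the event-log format here is made up): process a large volume of records one bounded chunk at a time and aggregate the results, trading latency for throughput and memory efficiency.

```python
from collections import Counter
from itertools import islice

def batch_process(lines, batch_size=1000):
    """Aggregate event counts one batch at a time, so memory use
    stays bounded no matter how large the input is."""
    totals = Counter()
    it = iter(lines)
    # Pull at most batch_size lines per iteration (an empty list ends the loop).
    while batch := list(islice(it, batch_size)):
        # Assumed log format: the first whitespace-separated field is the event type.
        totals.update(line.split()[0] for line in batch)
    return totals

print(batch_process(["click /home", "view /a", "click /b"], batch_size=2))
```

In a real pipeline `lines` would be a lazy reader over files in distributed storage, but the shape is the same: read a bounded chunk, update an aggregate, repeat.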
When your most important consideration is extracting near real-time insights from massive amounts of data, you need stream processing. Here we are dealing with large volumes of data arriving at high velocity. Operations need to be simpler, with response times of seconds, and streaming data is processed even before it lands in a data warehouse.
A few use cases of stream processing:
- A music streaming service looks at user-listening data to automatically improve its user recommendations.
- Network monitoring
- Intelligence and surveillance
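Use cases like these often boil down to maintaining running aggregates over an unbounded stream. A minimal sketch of the idea, updating a mean one event at a time without storing the stream (the latency values below are made up):

```python
class RunningMean:
    """Incrementally update a mean as each event arrives,
    without keeping the full stream in memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, value):
        self.count += 1
        # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean

monitor = RunningMean()
for latency_ms in [10.0, 20.0, 30.0]:  # hypothetical network latencies
    monitor.add(latency_ms)
print(monitor.mean)  # 20.0
```

The same incremental pattern extends to counts, sums, and sliding-window statistics, which is what makes second-level response times feasible on high-velocity data.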
There are two approaches to stream processing:
- Native stream processing: every event is processed as it arrives, resulting in the lowest possible latency. But processing every incoming event individually is also computationally very expensive
- Micro-batch processing: incoming events are grouped into batches, either by arrival time or once a batch reaches a certain size. This reduces the computational cost of processing but can introduce latency
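The micro-batch idea can be sketched in a few lines of Python: buffer incoming events and flush a batch when it reaches a size limit or when enough time has passed (names and limits here are illustrative, not any particular framework's API):

```python
import time

def micro_batches(events, max_size=3, max_wait=1.0):
    """Group a stream of events into batches, flushing when the batch
    reaches max_size or max_wait seconds have elapsed."""
    batch, started = [], time.monotonic()
    for event in events:
        batch.append(event)
        if len(batch) >= max_size or time.monotonic() - started >= max_wait:
            yield batch
            batch, started = [], time.monotonic()
    if batch:  # flush whatever is left when the stream ends
        yield batch

# Size-triggered flushes: two full batches, then a final partial one.
print(list(micro_batches(range(7))))  # [[0, 1, 2], [3, 4, 5], [6]]
```

A real engine would read from a message queue and flush on a timer even when no events arrive; this sketch only checks the clock when an event comes in.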
“Big Data” is rapidly changing the world and powering AI at scale.