__Getting started with Big Data__
This article is for readers who want to learn the basics of Big Data: what it is, why the current IT industry needs it, what challenges it brings, and how companies overcome those challenges.
So let's start with a very simple question: what is big data?
In a way, big data is exactly what it sounds like: a lot of data. Since the advent of the Internet, we've been producing data in staggering amounts. It's been estimated that in all the time leading up to the year 2003, only 5 exabytes of data were generated, which is equal to 5 billion gigabytes. But from 2003 to 2012, the amount reached around 2.7 zettabytes (or 2,700 exabytes, or 2.7 trillion gigabytes) [source: Intel]. According to Berkeley researchers, we are now producing roughly 5 quintillion bytes (or around 4.3 exabytes) of data every two days.
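To make these unit conversions concrete, here is a small Python sketch that reproduces the arithmetic. The constant and function names are my own and only illustrative. One assumption to flag: 5 quintillion bytes is exactly 5 exabytes in decimal units, so the "around 4.3 exabytes" figure seems to work out only if an exabyte is counted in binary units (2^60 bytes). That is my reading of the number, not something the sources state.

```python
# Unit sizes in bytes. Decimal (SI) units step by powers of 1000,
# binary units step by powers of 1024.
GB = 10**9    # gigabyte
EB = 10**18   # exabyte
ZB = 10**21   # zettabyte
EIB = 2**60   # exbibyte, the binary counterpart of the exabyte


def convert(amount, from_unit, to_unit):
    """Convert an amount expressed in from_unit into to_unit."""
    return amount * from_unit / to_unit


# Data generated up to 2003: 5 exabytes, expressed in gigabytes.
pre_2003_in_gb = convert(5, EB, GB)

# Data generated from 2003 to 2012: 2.7 zettabytes, in exabytes and gigabytes.
by_2012_in_eb = convert(2.7, ZB, EB)
by_2012_in_gb = convert(2.7, ZB, GB)

# 5 quintillion bytes (5 * 10**18) expressed in binary exabytes (assumption).
two_days_in_eib = convert(5, EB, EIB)

print(f"Up to 2003:     {pre_2003_in_gb:,.0f} GB")
print(f"2003 to 2012:   {by_2012_in_eb:,.0f} EB = {by_2012_in_gb:,.0f} GB")
print(f"Every two days: {two_days_in_eib:.2f} EiB")
```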
Take Facebook, which we all know: it processes about 2.5 billion pieces of content and more than 500 terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour.
But "a lot of data" does not mean that it has to be millions of gigabytes or zettabytes. Any amount of data that cannot be managed and processed by traditional means of data management falls under the category of Big Data. If an organization has no way to manage even a few gigabytes of data, say 10 or 20 GB, with its existing traditional tools such as relational databases, then that amount of data is also big data for that organization. Generally, though, the term Big Data refers to a large volume of data with an enormous variety.
Now we come to the need for Big Data.
In very general terms, data is managed and processed in order to draw actionable insights from it and grow the business. Let's talk about some use cases:
A real example of a company that uses big data analytics to drive customer retention is Coca-Cola. In 2015, Coca-Cola strengthened its data strategy by building a digital-led loyalty program. Coca-Cola's director of data strategy was interviewed by ADMA's managing editor, and the interview made it clear that big data analytics is a strong driver of customer retention at Coca-Cola. To read the whole interview, refer to the link: https://www.adma.com.au/resources/how-coca-cola-uses-data-to-supercharge-its-superbrand-status
Another is Netflix, a good example of a big brand that uses big data analytics for targeted advertising. With over 100 million subscribers, the company collects a huge amount of data, which is key to the industry status Netflix boasts. If you are a subscriber, you are familiar with how it suggests the next movie you should watch. Basically, this is done using your past search and watch data, which gives Netflix insight into what interests each subscriber most.
Big data analytics can help improve almost every business operation. This includes matching customer expectations, changing the company's product line and, of course, ensuring that marketing campaigns are powerful. Big data analytics also helps in risk management, and there are many other scenarios where companies benefit greatly from the use of Big Data.
Challenges posed by Big Data.
Let's talk about the three Vs of big data: Volume, Velocity and Variety. These three Vs are not the only challenges, but they cover most of what there is to say about the challenges of big data.
- Volume: Big data is any set of data that is so large that the organization that owns it faces challenges related to storing or processing it. In reality, trends like ecommerce, mobility, social media and the Internet of Things (IoT) are generating so much information that nearly every organization probably meets this criterion.
- Velocity: If an organization is generating new data at a rapid pace and needs to respond in real time, it has the velocity associated with big data. Most organizations that are involved in ecommerce, social media or IoT satisfy this criterion for big data.
- Variety: If the data resides in many different formats, it has the variety associated with big data. For example, big data stores typically include email messages, word processing documents, images, video and presentations, as well as data that resides in structured relational databases.
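The three criteria above can be sketched as a small Python check. Everything in it is hypothetical and only for illustration: the `DatasetProfile` fields, the `big_data_traits` function, and especially the thresholds for "rapid pace" and "many formats" are my own assumptions, not industry definitions.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProfile:
    """Hypothetical description of an organization's data."""
    size_gb: float              # how much data the organization holds
    manageable_gb: float        # how much its existing tools can store and process
    new_data_gb_per_day: float  # how quickly new data arrives
    needs_real_time: bool       # must it respond as the data arrives?
    formats: set = field(default_factory=set)  # e.g. {"email", "video"}


def big_data_traits(profile, rapid_gb_per_day=100.0, many_formats=3):
    """Return which of the three Vs apply (thresholds are assumed values)."""
    traits = []
    # Volume: more data than the organization can store or process.
    if profile.size_gb > profile.manageable_gb:
        traits.append("volume")
    # Velocity: data arrives quickly AND needs a real-time response.
    if profile.needs_real_time and profile.new_data_gb_per_day >= rapid_gb_per_day:
        traits.append("velocity")
    # Variety: the data resides in many different formats.
    if len(profile.formats) >= many_formats:
        traits.append("variety")
    return traits


# A small online shop that can only manage 5 GB with its current tools.
shop = DatasetProfile(
    size_gb=20,
    manageable_gb=5,
    new_data_gb_per_day=150,
    needs_real_time=True,
    formats={"email", "image", "video", "relational"},
)
shop_traits = big_data_traits(shop)
print(shop_traits)
```

Note that the volume check compares the data against what the organization can handle rather than against a fixed size, so even 20 GB counts as big data for this shop.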
Overcoming these challenges is a big discussion in itself, so we are going to talk about it later. For now, I want to tell you about a very interesting framework that is used by many big IT firms, such as Facebook: Hadoop. Hadoop is a distributed data storage and management framework. If you want, you can read about it on your own, or I will talk about it later as well.
Thanks for reading my article on Introduction to Big Data.