Big Data: Introduction, Problems and Their Solutions :-
Nowadays almost everything is connected to the internet, and much of our life has become social. Every day we upload our things to social media platforms like Facebook, Twitter, Tinder, Instagram and many more.
We are living in a world where we are surrounded by data. Whatever technology we use, it deals with data.
We scroll Instagram, look at posts and comment on them, or we open Facebook and upload a photo, but we never think about where our photos go and who manages them. Which companies are managing our comments? Where does the huge amount of data generated every day get stored? What is the solution for this?
The answers to all these questions lead us to the concept of Big Data. So let us first see what Big Data is.
What is Big Data ?
Big data refers to larger, more complex data sets, especially from new data sources. Put simply, Big Data is still data, but of a huge size: the term describes a collection of data that is huge in volume and keeps growing exponentially with time.
Take Facebook as an example. According to data revealed by Facebook:
- It collects 500+ terabytes of data every day.
- It receives 2.7 billion likes and 300 million photos per day.
- It scans 105 terabytes of data every 30 minutes.
So the data collected by these companies every day is huge. To deal with such huge data, the companies require huge storage.
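To get a feel for these figures, here is a small back-of-the-envelope sketch that converts them into per-second and per-day rates. It only uses the numbers quoted above; the choice of decimal units (1 TB = 1,000 GB) is my assumption.

```python
# Figures quoted above (as reported by Facebook).
TB_COLLECTED_PER_DAY = 500        # "500+ terabytes" collected daily
TB_SCANNED_PER_30_MIN = 105       # scanned every 30 minutes
PHOTOS_PER_DAY = 300_000_000

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: decimal units, 1 TB = 1,000 GB.
gb_collected_per_second = TB_COLLECTED_PER_DAY * 1_000 / SECONDS_PER_DAY
tb_scanned_per_day = TB_SCANNED_PER_30_MIN * 48   # 48 half-hours in a day
photos_per_second = PHOTOS_PER_DAY / SECONDS_PER_DAY

print(f"Collected: about {gb_collected_per_second:.1f} GB per second")
print(f"Scanned:   {tb_scanned_per_day} TB per day")
print(f"Photos:    about {photos_per_second:.0f} per second")
```

In other words, the quoted figures work out to several gigabytes of new data every second.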
Why Is Big Data Important?
The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analysis, you can accomplish business-related tasks such as:
- Determining root causes of failures, issues and defects in near-real time.
- Generating coupons at the point of sale based on the customer’s buying habits.
- Recalculating entire risk portfolios in minutes.
- Detecting fraudulent behavior before it affects your organization.
What are the challenges that come with Big Data? :-
* Volume :-
The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
* Velocity :-
Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
* Variety :-
Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.
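The velocity point above (acting on data as it streams through memory instead of first writing it to disk) can be sketched roughly as follows. This is only a minimal illustration: the sensor readings and the function name rolling_average are made up for the example.

```python
from collections import deque


def rolling_average(readings, window_size=3):
    """Yield the average of the most recent `window_size` readings."""
    window = deque(maxlen=window_size)   # old readings fall out automatically
    for value in readings:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical temperature readings arriving one at a time.
sensor_stream = iter([20.0, 22.0, 27.0, 31.0])
averages = list(rolling_average(sensor_stream))
print(averages)
```

Only the last few readings are kept in memory at any moment, so the same idea keeps working however long the stream runs.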
Big Data Use Cases :-
* Product Development
* Predictive Maintenance
* Customer Experience
* Machine Learning
* Operational Efficiency
How Big Data Works :-
Big data gives you new insights that open up new opportunities and business models. Getting started involves three key actions:
* Integrate :-
Big data brings together data from many disparate sources and applications. Traditional data integration mechanisms, such as ETL (extract, transform, and load) generally aren’t up to the task. It requires new strategies and technologies to analyze big data sets at terabyte, or even petabyte, scale.
During integration, you need to bring in the data, process it, and make sure it’s formatted and available in a form that your business analysts can get started with.
* Manage :-
Big data requires storage. Your storage solution can be in the cloud, on premises, or both. You can store your data in any form you want and bring your desired processing requirements and necessary process engines to those data sets on an on-demand basis. Many people choose their storage solution according to where their data is currently residing. The cloud is gradually gaining popularity because it supports your current compute requirements and enables you to spin up resources as needed.
* Analyze :-
Your investment in big data pays off when you analyze and act on your data. Get new clarity with a visual analysis of your varied data sets. Explore the data further to make new discoveries. Share your findings with others. Build data models with machine learning and artificial intelligence. Put your data to work.
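The three actions above can be sketched on a toy scale. This is just an illustration under my own assumptions: the two input sources, their field names and the in-memory list standing in for real storage are all hypothetical, and real big data pipelines use far heavier tools.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical raw inputs from two different sources.
CSV_ORDERS = "customer,amount\nasha,120\nravi,80\n"
JSON_ORDERS = '[{"user": "asha", "total": "40.5"}, {"user": "meera", "total": "60"}]'


def integrate(csv_text, json_text):
    """Integrate: bring both sources into one common record format."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"customer": row["customer"], "amount": float(row["amount"])})
    for item in json.loads(json_text):
        records.append({"customer": item["user"], "amount": float(item["total"])})
    return records


def analyze(records):
    """Analyze: total spend per customer."""
    totals = Counter()
    for record in records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)


# Manage: a plain in-memory list stands in for a real storage layer here.
store = integrate(CSV_ORDERS, JSON_ORDERS)
spend = analyze(store)
print(spend)
```

Once both sources share one format, the analysis step does not need to know where each record came from.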
Let us try to imagine the amount of data these companies receive every day.
We upload images and photographs to Facebook. This statement does not boggle the mind until you realize that Facebook has more users than China has people, and each of those users uploads photos. Facebook is storing roughly 250 billion images. Now just think about 250 billion images. In 2016, Facebook had 2.5 trillion posts.
The next big company is Google :-
Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. Google also currently processes over 20 petabytes of data per day.
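The per-day and per-year figures follow from the per-second rate by simple multiplication. A quick sketch, assuming a 365-day year and taking exactly 40,000 searches per second as the baseline:

```python
SEARCHES_PER_SECOND = 40_000          # baseline figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365                   # assumption: a non-leap year

searches_per_day = SEARCHES_PER_SECOND * SECONDS_PER_DAY
searches_per_year = searches_per_day * DAYS_PER_YEAR

print(f"{searches_per_day:,} searches per day")
print(f"{searches_per_year:,} searches per year")
```

At exactly 40,000 per second this gives roughly 3.46 billion searches per day and about 1.26 trillion per year, so the "over 3.5 billion" figure corresponds to a rate slightly above 40,000 per second.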
All of this is data, and it needs to be stored on hard disks. These numbers are so big that we can hardly imagine them. This is essentially the volume dimension of Big Data.
What are the solutions for solving the problem of Big Data ?
The best solution available nowadays, and the one almost all companies use, is Distributed Storage.
A Distributed Storage is an infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
For example, let us say we have 500 GB of data but limited resources to store it. One might think: let us buy a single 500 GB storage device and put all the data on it. This does store the data, but reading and writing everything through one device takes more time, which creates an I/O bottleneck. A better solution is to divide the 500 GB into 5 parts of 100 GB each and store them in 5 different storage centers. The data can then be stored efficiently, which addresses the volume problem, and because the parts are written in parallel it is stored in less time, which addresses the velocity problem.
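The time saving in this example can be estimated with a little arithmetic. The sketch below assumes that every disk writes at 100 MB/s (a number I picked purely for illustration) and that all 5 parts are written at the same time; it ignores network and coordination overhead.

```python
DATA_GB = 500
NODES = 5
# Assumption: every disk writes at 100 MB/s, and 1 GB = 1,000 MB.
WRITE_SPEED_MB_PER_S = 100


def write_time_seconds(size_gb, speed_mb_per_s=WRITE_SPEED_MB_PER_S):
    """Time to write `size_gb` gigabytes to one disk."""
    return size_gb * 1_000 / speed_mb_per_s


single_disk = write_time_seconds(DATA_GB)
per_node = write_time_seconds(DATA_GB / NODES)   # nodes write in parallel

print(f"One 500 GB disk:      {single_disk:.0f} seconds")
print(f"Five 100 GB nodes:    {per_node:.0f} seconds")
```

Under these assumptions the write takes one fifth of the time, because each node only has to handle its own 100 GB.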
In the Big Data world, the 5 storage centers across which we distribute our data are known as Slave Nodes, and the node from which we distribute the data to the slave nodes is known as the Master Node. All these nodes combine to form an infrastructure called a Cluster. In the Big Data world it is known as a Distributed Storage Cluster.
So now we have an idea of what Big Data is and how MNCs like Google, Facebook, etc. solve its challenges.
For any further queries, mail me at :-
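To make the master and slave roles concrete, here is a toy sketch of such a cluster in plain Python. The class names, the round-robin placement and the tiny block size are all my own choices for illustration; real distributed storage systems also replicate blocks and handle node failures, which this sketch leaves out.

```python
class SlaveNode:
    """Stores the blocks handed to it by the master."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block index -> bytes

    def store(self, index, block):
        self.blocks[index] = block


class MasterNode:
    """Splits data into blocks and remembers which slave holds each one."""

    def __init__(self, slaves, block_size):
        self.slaves = slaves
        self.block_size = block_size
        self.block_map = {}       # block index -> slave that holds it

    def write(self, data):
        for index, start in enumerate(range(0, len(data), self.block_size)):
            slave = self.slaves[index % len(self.slaves)]   # round-robin placement
            slave.store(index, data[start:start + self.block_size])
            self.block_map[index] = slave

    def read(self):
        # Fetch the blocks back in order and join them into the original data.
        return b"".join(self.block_map[i].blocks[i] for i in sorted(self.block_map))


slaves = [SlaveNode(f"slave-{n}") for n in range(1, 6)]
master = MasterNode(slaves, block_size=4)
master.write(b"big data needs many disks")

for slave in slaves:
    print(slave.name, slave.blocks)
print(master.read())
```

The master never stores the data itself; it only keeps the map of which block lives on which slave, and that map is what lets it put the data back together.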
piyushsinghsanchit@gmail.com
Thank you :-