Big Data
Big Data is a term used to describe a collection of data that is huge in volume and growing exponentially with time. In short, such data is so large and complex that no traditional data-management tool can store or process it efficiently.
Nowadays, almost all large multinational companies, such as Google and Facebook, deal with massive amounts of data every day.
According to a Facebook report, the company handles more than 500 terabytes (TB) of data and about 2.5 billion pieces of content every day. Almost 2.7 billion Like actions take place and more than 300 million photos are uploaded per day, amounting to approximately 105 TB of data each hour.
Twitter: 12 Terabytes Per Day
One wouldn't think that 140-character messages would add up to large stores of data, but it turns out that the Twitter community generates more than 12 terabytes of data per day. That equals 84 terabytes per week and 4368 terabytes, or roughly 4.3 petabytes, per year.
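These figures are simple multiplications from the 12 TB/day rate given above; a quick back-of-the-envelope check (the 52-week year and binary TB-to-PB conversion are assumptions that reproduce the numbers in the text):

```python
# Back-of-the-envelope check of the Twitter data-volume figures above.
TB_PER_DAY = 12                    # rate quoted in the text

per_week = TB_PER_DAY * 7          # 84 TB per week
per_year = per_week * 52           # 4368 TB per year (assuming 52 weeks)
per_year_pb = per_year / 1024      # ~4.3 PB (binary conversion assumed)

print(per_week, per_year, round(per_year_pb, 1))
```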
If you are thinking of a single huge storage device or hard disk, yes, it is possible, but consider the cost of building such a device, and also how long it would take to fetch data from it: accessing the data would be very slow.
Distributed data processing is a computer-networking method in which multiple computers across different locations share computer-processing capability. This is in contrast to a single, centralized server managing and providing processing capability to all connected systems.
Advantages of Distributed Data Processing
Computers that make up the distributed data-processing network are located at different sites but are interconnected by means of wireless or satellite links.
- Lower cost
- Reliability
- Improved performance and reduced processing time
- Flexibility
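The performance advantage comes from splitting one large job into pieces that run at the same time. A minimal sketch of that idea, using Python's `multiprocessing.Pool` on one machine as a stand-in for separate computers (the chunks and the word-count task are purely illustrative):

```python
# Illustrative sketch: splitting a computation across worker processes,
# much as a distributed system splits work across machines.
from multiprocessing import Pool

def word_count(chunk):
    """Count words in one chunk of text (one node's share of the data)."""
    return len(chunk.split())

if __name__ == "__main__":
    # Pretend each string is the portion of a large dataset held by one node.
    chunks = [
        "big data is huge in volume",
        "and growing exponentially with time",
        "distributed processing shares the work",
        "across many interconnected computers",
    ]
    with Pool(processes=4) as pool:
        # Each chunk is processed in parallel by a separate worker.
        partial_counts = pool.map(word_count, chunks)
    # Combining the partial results gives the answer for the whole dataset.
    print(sum(partial_counts))
```

Each worker handles only its own chunk, so the elapsed time is governed by the largest chunk rather than the whole dataset, which is the source of the reduced processing time listed above.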
Apache Hadoop was developed with the goal of having an inexpensive, redundant data store that would enable organizations to leverage Big Data Analytics economically and increase the profitability of the business.
A Hadoop architectural design needs to account for several factors in terms of networking, computing power, and storage. Hadoop provides a reliable, scalable, flexible, and distributed Big Data computing framework.
Master/slave is a model of asymmetric communication or control where one device or process (the "master") controls one or more other devices or processes (the "slaves") and serves as their communication hub.
Suppose we have to store 800 terabytes of data, but no single machine has that much space. We then apply the concept of a distributed system and spread the data across hundreds or thousands of slave devices; this arrangement is known as a master/slave distributed system.
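The idea above can be sketched in a few lines: a master splits the dataset into fixed-size blocks and assigns each block to a slave node. This is only a toy illustration of the principle, not Hadoop's actual placement policy; the 100 TB block size and the node names are made up for the example.

```python
# Minimal sketch of master/slave data distribution: the master splits a
# dataset into fixed-size blocks and assigns each block to a slave node
# round-robin. Sizes and node names are illustrative, not Hadoop defaults.
def distribute(total_tb, block_tb, slaves):
    """Return a mapping of slave name -> list of assigned block ids."""
    placement = {s: [] for s in slaves}
    num_blocks = -(-total_tb // block_tb)   # ceiling division
    for block_id in range(num_blocks):
        slave = slaves[block_id % len(slaves)]
        placement[slave].append(block_id)
    return placement

slaves = [f"slave-{i}" for i in range(4)]
plan = distribute(total_tb=800, block_tb=100, slaves=slaves)
for node, blocks in plan.items():
    print(node, blocks)
```

With 800 TB cut into 100 TB blocks across four slaves, each slave ends up holding two blocks; the master itself stores no data, it only records which slave holds which block, mirroring the communication-hub role described above.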