Big Data
Big Data is a term used to describe a collection of data that is huge in volume and growing exponentially with time. In short, such data is so large and complex that no traditional data-management tool can store or process it efficiently.
Nowadays, almost all large multinational companies, such as Google and Facebook, deal with massive amounts of data every day.
According to a Facebook report, the company handles more than 500 terabytes (TB) of data and about 2.5 billion pieces of content every day. Almost 2.7 billion Like actions take place and more than 300 million photos are uploaded per day, amounting to approximately 105 TB of data each hour.
Twitter: 12 Terabytes Per Day
One wouldn't think that 140-character messages would add up to large stores of data, but it turns out that the Twitter community generates more than 12 terabytes of data per day. That equals 84 terabytes per week and 4368 terabytes, or roughly 4.3 petabytes, per year.
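These figures are simple multiplications from the 12 TB/day rate given above; a quick back-of-the-envelope check (the 52-week year and binary TB-to-PB conversion are assumptions that reproduce the numbers in the text):

```python
# Back-of-the-envelope check of the Twitter data-volume figures above.
TB_PER_DAY = 12                    # rate quoted in the text

per_week = TB_PER_DAY * 7          # 84 TB per week
per_year = per_week * 52           # 4368 TB per year (assuming 52 weeks)
per_year_pb = per_year / 1024      # ~4.3 PB (binary conversion assumed)

print(per_week, per_year, round(per_year_pb, 1))
```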
If you are thinking of a single huge storage device or hard disk, yes, it is possible, but consider the cost of building such a device, and also how long it would take to fetch data from it: accessing the data would be very slow.
Distributed data processing is a computer-networking method in which multiple computers across different locations share computer-processing capability. This is in contrast to a single, centralized server managing and providing processing capability to all connected systems.
Advantages of Distributed Data Processing
Computers that make up the distributed data-processing network are located at different sites but are interconnected by means of wireless or satellite links.
- Lower cost
- Reliability
- Improved performance and reduced processing time
- Flexibility
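The performance advantage comes from splitting one large job into pieces that run at the same time. A minimal sketch of that idea, using Python's `multiprocessing.Pool` on one machine as a stand-in for separate computers (the chunks and the word-count task are purely illustrative):

```python
# Illustrative sketch: splitting a computation across worker processes,
# much as a distributed system splits work across machines.
from multiprocessing import Pool

def word_count(chunk):
    """Count words in one chunk of text (one node's share of the data)."""
    return len(chunk.split())

if __name__ == "__main__":
    # Pretend each string is the portion of a large dataset held by one node.
    chunks = [
        "big data is huge in volume",
        "and growing exponentially with time",
        "distributed processing shares the work",
        "across many interconnected computers",
    ]
    with Pool(processes=4) as pool:
        # Each chunk is processed in parallel by a separate worker.
        partial_counts = pool.map(word_count, chunks)
    # Combining the partial results gives the answer for the whole dataset.
    print(sum(partial_counts))
```

Each worker handles only its own chunk, so the elapsed time is governed by the largest chunk rather than the whole dataset, which is the source of the reduced processing time listed above.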
Apache Hadoop was developed with the goal of having an inexpensive, redundant data store that would enable organizations to leverage Big Data Analytics economically and increase the profitability of the business.
A Hadoop architectural design needs to account for several factors in terms of networking, computing power, and storage. Hadoop provides a reliable, scalable, flexible, and distributed Big Data computing framework.
Master/slave is a model of asymmetric communication or control where one device or process (the "master") controls one or more other devices or processes (the "slaves") and serves as their communication hub.
Suppose we have to store 800 terabytes of data, but no single machine has that much space. We then apply the concept of a distributed system and spread the data across hundreds or thousands of slave devices; this arrangement is known as a master/slave distributed system.
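The idea above can be sketched in a few lines: a master splits the dataset into fixed-size blocks and assigns each block to a slave node. This is only a toy illustration of the principle, not Hadoop's actual placement policy; the 100 TB block size and the node names are made up for the example.

```python
# Minimal sketch of master/slave data distribution: the master splits a
# dataset into fixed-size blocks and assigns each block to a slave node
# round-robin. Sizes and node names are illustrative, not Hadoop defaults.
def distribute(total_tb, block_tb, slaves):
    """Return a mapping of slave name -> list of assigned block ids."""
    placement = {s: [] for s in slaves}
    num_blocks = -(-total_tb // block_tb)   # ceiling division
    for block_id in range(num_blocks):
        slave = slaves[block_id % len(slaves)]
        placement[slave].append(block_id)
    return placement

slaves = [f"slave-{i}" for i in range(4)]
plan = distribute(total_tb=800, block_tb=100, slaves=slaves)
for node, blocks in plan.items():
    print(node, blocks)
```

With 800 TB cut into 100 TB blocks across four slaves, each slave ends up holding two blocks; the master itself stores no data, it only records which slave holds which block, mirroring the communication-hub role described above.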