INTRODUCTION TO BIGDATA

BIGDATA

Bigdata refers to the huge amounts of data that a company needs to store on its servers for future reference.

Social media corporations such as Facebook need to store all of their users' data, such as images and videos, on their servers so that users and their friends can access it.

STORAGE

There are two types of storage hardware: enterprise storage and commodity storage.

1. Commodity storage

Commodity storage is hardware that is relatively inexpensive, widely available, and more or less interchangeable with other hardware of its type. To be interchangeable, commodity hardware is usually broadly compatible and can function on a plug-and-play basis with other commodity hardware products. In this context, a commodity item is a low-end but functional product without distinctive features. A commodity computer, for example, is a standard-issue PC that has no outstanding features and is widely available for purchase.

2. Enterprise storage

Enterprise storage is a centralized repository for business information that provides common data management, protection, and data sharing functions through connections to computer systems. Enterprise storage is typically a single unit with a very large storage capacity.

CHALLENGES FACED IN BIGDATA

One of the challenges faced in Bigdata is that corporations such as Facebook have an incoming data rate of about 600 TB per day. They therefore need massive enterprise storage devices, which are very expensive and have drawbacks such as limited volume and slow I/O speed.
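To get a feel for this scale, the arithmetic can be sketched as follows. The 600 TB per day figure comes from the text above. The 16 TB drive size is an assumed, illustrative value that the article does not state, and decimal units (1 PB = 1000 TB) are used. Replication and overhead are ignored.

```python
import math

DAILY_INGEST_TB = 600    # incoming data per day, from the text above
DRIVE_CAPACITY_TB = 16   # assumed capacity of one drive (illustrative only)


def yearly_ingest_pb(daily_tb: float, days: int = 365) -> float:
    """Return the data accumulated over `days`, in petabytes (1 PB = 1000 TB)."""
    return daily_tb * days / 1000


def drives_needed_per_day(daily_tb: float, drive_tb: float) -> int:
    """Return how many whole drives one day's data fills, ignoring replication."""
    return math.ceil(daily_tb / drive_tb)


if __name__ == "__main__":
    print(f"Per year: {yearly_ingest_pb(DAILY_INGEST_TB):.0f} PB")
    print(f"Drives per day: {drives_needed_per_day(DAILY_INGEST_TB, DRIVE_CAPACITY_TB)}")
```

Under these assumptions, a single day of incoming data already fills dozens of drives, which is why a single storage device with a fixed capacity becomes a bottleneck.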

To overcome the problems of volume and I/O speed, we can use distributed storage.

DISTRIBUTED STORAGE

Instead of buying enterprise storage, which is expensive and has limited volume and slow I/O speed, we can build a cluster with a master-slave topology out of commodity storage devices, which are cheap.

ADVANTAGES OF DISTRIBUTED STORAGE


Setting up distributed storage is much cheaper than buying enterprise storage, because distributed storage is built from commodity storage.

I/O speed depends on the number of nodes in the cluster: the more nodes there are, the more reads and writes can proceed in parallel, so the faster the overall I/O.

If we run out of storage, we can add another slave node to the cluster to increase its volume.
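The capacity and I/O points above can be illustrated with a small model. This is a simplified sketch, not a benchmark: it assumes every node has the same capacity and throughput, that data is spread evenly across nodes, and that the network is never the bottleneck. The `Cluster` class and the numbers used (4 TB and 150 MB/s per node) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of a cluster built from identical commodity nodes."""
    nodes: int
    capacity_tb_per_node: float       # assumed, identical for every node
    throughput_mb_s_per_node: float   # assumed MB/s, identical for every node

    def total_capacity_tb(self) -> float:
        """Total volume is the sum of what each node contributes."""
        return self.nodes * self.capacity_tb_per_node

    def aggregate_throughput_mb_s(self) -> float:
        """Idealized: all nodes serve data in parallel."""
        return self.nodes * self.throughput_mb_s_per_node

    def seconds_to_read(self, size_gb: float) -> float:
        """Time to read size_gb spread evenly over all nodes (1 GB = 1000 MB)."""
        return size_gb * 1000 / self.aggregate_throughput_mb_s()

    def add_node(self) -> None:
        """Grow the cluster by one node."""
        self.nodes += 1


if __name__ == "__main__":
    cluster = Cluster(nodes=4, capacity_tb_per_node=4, throughput_mb_s_per_node=150)
    print(cluster.total_capacity_tb(), cluster.seconds_to_read(600))
    cluster.add_node()
    print(cluster.total_capacity_tb(), cluster.seconds_to_read(600))
```

In this idealized model, adding a node raises both the total volume and the aggregate throughput. Real clusters scale less cleanly because of network limits, replication, and uneven data placement.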


SOFTWARE TO IMPLEMENT DISTRIBUTED STORAGE

Several software systems can be used to implement distributed storage, such as Hadoop, GlusterFS, and Ceph. Of these, Hadoop is one of the most widely used.


HADOOP

Hadoop is free, open-source software developed under the Apache Software Foundation to implement distributed storage using a master-slave topology. In Hadoop, the master node is called the NameNode (NN) and the slave nodes are called DataNodes (DN). The DataNodes contribute their storage to the cluster, which the NameNode presents and manages as a single large pool. Data moves between the nodes through the HDFS (Hadoop Distributed File System) protocol. When a large file needs to be stored, it is split into blocks, and these blocks are stored on different DataNodes.
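The idea of splitting a file into blocks and spreading them across data nodes can be sketched in a few lines. This is a toy illustration, not Hadoop's API or its actual placement policy: the function names, the round-robin placement, the node names, and the tiny block size are all assumptions made for readability, and replication is left out.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, data_nodes: List[str]) -> Dict[int, str]:
    """Assign block indexes to data nodes round-robin (hypothetical policy).

    The returned mapping plays the role of the master's metadata:
    it records which node holds which block.
    """
    if not data_nodes:
        raise ValueError("at least one data node is required")
    return {i: data_nodes[i % len(data_nodes)] for i in range(num_blocks)}


if __name__ == "__main__":
    file_data = b"x" * 10  # a 10-byte stand-in for a large file
    blocks = split_into_blocks(file_data, block_size=4)
    placement = place_blocks(len(blocks), ["dn1", "dn2"])
    print([len(b) for b in blocks])
    print(placement)
```

Because the blocks of one file end up on different nodes, they can be read back in parallel, and the master only needs to hold the small block-to-node mapping rather than the data itself.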







