Big Data Problems

The numbers are increasing day by day: the number of people, the number of connections, and the amount of data. Since I'm referring here to the MNCs, let's go through some of the difficulties these companies face.

Each user possesses some data, and that data keeps increasing day by day.

This doesn't mean these companies can simply tell a user, "Hey, delete your previous data and store something new instead."




That is not the right way for these companies to look at handling big data.





Now consider one more scenario. Let's say users are typing some text files, and there is not just a single user but millions of them. Assume that every user now sends whatever they have written to the servers of these big companies. Will the computing units in these companies' data centres be able to handle storing such a huge amount of data?

But first, why am I talking about CPU and RAM here? Why would we even need them just for storing data 🤔?


Here is a fact most of us don't realise: as users of an operating system, when we write a text file we normally write only a few words, fewer than 5,000 at once. Processing that much data is no big deal for our computers.



But if you somehow write lakhs of words and then try to save the text file, you will definitely feel some lag. Also, if you can monitor the RAM on your PC, give it a try and you will find your answer. This is proof that we need not just storage units but high-end computing units as well.
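You can reproduce this experiment with a few lines of plain Python (the filename below is made up): build a text of several lakh words in memory, then write it out, and watch the time it takes and your RAM monitor while it runs.

```python
import time

# Hold ~5 lakh words in RAM at once -- this alone costs a few MB of memory.
words = "hello " * 500_000

# Time the save; on a slow disk or a low-RAM machine this is where the lag shows.
start = time.time()
with open("big_note.txt", "w") as f:
    f.write(words)
elapsed = time.time() - start

print(f"wrote {len(words)} characters in {elapsed:.2f} s")
```

Try raising the multiplier to 50 million and the lag becomes impossible to miss.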

Conclusion !

So all these points make it clear that these companies' data centres need an increase in storage capacity along with advances in their computing units.

So let's look at some figures and judge how much storage capacity they really need.


Here is some data, based on updates as of last year, on the amount generated per day:

* Twitter - 500 million tweets are sent

* Facebook - 4 petabytes of data are created

* WhatsApp - 65 billion messages are sent

* Google Search - 3.5 billion searches are made

* Emails - 294 billion emails are sent

Okay, we get it: we really need some big hard disks, lots of RAM, and CPUs too.

But how big can this data be? Do hard disks of such huge sizes even exist?

Actually, for this we have lots of big companies like IBM, Dell EMC, and many more who are ready with such huge storage devices.


Looking at their offerings, one can see that we really have a lot of options.

Optimisation is still going on, but in the meantime a solution came to developers' minds: what if they could create a cluster of these machines 💡?

Here starts the journey of the big data management world.

In this process, they wrote some programs and planned a master and slave node setup, where each slave node has its own RAM, CPU, and storage, and shares all of these resources with the master node.

How do we achieve this type of setup?

A piece of software is developed that is responsible for distributing the incoming data from users around the world to the various slave nodes, creating a setup that stores data across many machines. This methodology is known as a Distributed Storage Cluster.
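A minimal sketch of this idea in plain Python (this is an illustration, not Hadoop itself; the class and node names are made up): a master splits incoming data into fixed-size blocks, spreads them across slave nodes round-robin, and remembers where each block went.

```python
class Master:
    """Toy master node: splits files into blocks and places them on slaves."""

    def __init__(self, slaves, block_size=4):
        self.slaves = {name: [] for name in slaves}  # slave name -> stored blocks
        self.block_size = block_size
        self.index = {}  # file name -> list of (slave, block number)

    def store(self, filename, data):
        """Split data into fixed-size blocks and distribute them round-robin."""
        blocks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        names = list(self.slaves)
        placement = []
        for n, block in enumerate(blocks):
            slave = names[n % len(names)]  # round-robin placement
            self.slaves[slave].append(block)
            placement.append((slave, n))
        self.index[filename] = placement
        return placement


master = Master(["slave1", "slave2", "slave3"])
placement = master.store("notes.txt", "hello big data world")
# Each slave now holds only a part of the file; the master's index
# records how to reassemble it.
```

A real cluster adds replication (each block copied to several slaves) so that losing one machine loses no data, but the placement idea is the same.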

What issues can it solve?


This approach addresses the major issues we face in the Big Data world.

The topic was Hadoop, right? So where is it?

Hadoop is a software framework from Apache. It allows for the distributed processing of large data sets across clusters of computers using simple programming models.

We can use it to overcome big data issues: its Distributed Storage Cluster approach sets up the master and slave nodes for us, and hence overcomes the Big Data problems.
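To give a small, concrete taste: in Hadoop, the `fs.defaultFS` property in `core-site.xml` is what tells every node in the cluster where the master (the NameNode) lives. Roughly, the file looks like this (the hostname below is a placeholder):

```xml
<!-- core-site.xml: points all nodes at the master (NameNode).
     "master.example.com" is a placeholder hostname. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.example.com:9000</value>
  </property>
</configuration>
```

With this in place on every machine, the slave nodes (DataNodes) register with the master and start sharing their storage with the cluster.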



