Big Data
Introduction:
In today’s era, numerous social apps are being developed, and the data they generate grows massively every day. On social media platforms, millions of users connect daily, and information is shared whenever someone uses a social media platform or any other website. So the question arises: how is this huge amount of data handled, and through what media or tools is it processed and stored? This is where Big Data comes into the picture.
What is Big Data:
Big data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
V’s of Big Data:
Volume
Volume is how much data we have – what used to be measured in gigabytes is now measured in zettabytes (ZB) or even yottabytes (YB). The IoT (Internet of Things) is creating exponential growth in data.
Velocity
Velocity is the speed at which data is accessible. I remember the days of nightly batches; now, if it’s not real-time, it’s usually not fast enough.
Variety
Variety describes one of the biggest challenges of big data. Data can be unstructured, and it can include many different types, from XML to video to SMS. Organizing the data in a meaningful way is no simple task, especially when the data itself changes rapidly.
Variability
Variability is different from variety. A coffee shop may offer 6 different blends of coffee, but if you get the same blend every day and it tastes different every day, that is variability. The same is true of data: if its meaning is constantly changing, that can have a huge impact on your data homogenization.
Veracity
Veracity is all about making sure the data is accurate, which requires processes to keep the bad data from accumulating in your systems. The simplest example is contacts that enter your marketing automation system with false names and inaccurate contact information. How many times have you seen Mickey Mouse in your database? It’s the classic “garbage in, garbage out” challenge.
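As a minimal sketch of such a process, assuming contacts arrive as Python dictionaries with `name` and `email` fields, the filter below rejects records with placeholder names or obviously malformed email addresses before they are stored. The placeholder list and the field names are hypothetical, chosen only for illustration; a real pipeline would apply far more thorough checks.

```python
# Hypothetical examples of names that usually signal a fake sign-up.
PLACEHOLDER_NAMES = {"mickey mouse", "donald duck", "test test", "asdf"}


def is_valid_contact(contact: dict) -> bool:
    """Return True if a contact record passes a few basic sanity checks."""
    name = contact.get("name", "").strip().lower()
    email = contact.get("email", "").strip()
    if not name or name in PLACEHOLDER_NAMES:
        return False
    # Very loose email check: something before the first "@" and a dot after it.
    local, sep, domain = email.partition("@")
    return bool(local) and sep == "@" and "." in domain


contacts = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Mickey Mouse", "email": "mickey@example.com"},
    {"name": "Bob", "email": "not-an-email"},
]
clean = [c for c in contacts if is_valid_contact(c)]
print([c["name"] for c in clean])  # ['Ada Lovelace']
```

Filtering at the point of entry is the cheapest place to do it: once bad records have been copied into downstream systems, they are much harder to find and remove.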
Visualization
Visualization is critical in today’s world. Using charts and graphs to visualize large amounts of complex data is much more effective in conveying meaning than spreadsheets and reports chock-full of numbers and formulas.
Value
Value is the end game. After addressing volume, velocity, variety, variability, veracity, and visualization – which takes a lot of time, effort and resources – you want to be sure your organization is getting value from the data.
How Big is Facebook's Data:
Arguably the world’s most popular social media network, with more than two billion monthly active users worldwide, Facebook stores enormous amounts of user data, making it a massive data wonderland. It was estimated that there would be more than 183 million Facebook users in the United States alone by October 2019. Facebook is still among the top 100 public companies in the world, with a market value of approximately $475 billion.
Every day, we feed Facebook’s data beast with mounds of information. Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 status updates are posted. That is a LOT of data.
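To put those per-minute figures in perspective, the arithmetic below scales them to a full day. It assumes, purely for illustration, that the rate stays constant across all 1,440 minutes of the day, which real traffic does not.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Per-minute figures quoted in the text above.
per_minute = {
    "photos uploaded": 136_000,
    "comments posted": 510_000,
    "status updates posted": 293_000,
}

# Assumes a constant rate over the whole day (a simplification).
per_day = {activity: count * MINUTES_PER_DAY for activity, count in per_minute.items()}

for activity, count in per_day.items():
    print(f"{activity}: {count:,} per day")
# photos uploaded: 195,840,000 per day
# comments posted: 734,400,000 per day
# status updates posted: 421,920,000 per day
```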
Instagram:
Instagram is a behemoth in the world of social media. It has more than 800 million monthly active users, and 51 percent of this user base accesses it daily. Each day, 95 million photos and videos are uploaded to the platform. Since its inception in 2010, over 40 billion photos and videos have been shared in total.
It becomes more and more staggering the closer you look at it all. So, it’s no wonder that businesses have set their sights on Instagram as a resource for mining big data. The information and insights gained from it have proven an invaluable resource for personalized marketing and research.
How Many Google Searches Are Conducted per Day?
So, we’ve all been curious about this search statistic. We know that a lot of searches are carried out on Google every day, but how many exactly? Google processes over 3.5 billion searches per day (Internetlivestats, 2019).
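Dividing that daily figure by the 86,400 seconds in a day gives a rough per-second rate. This assumes searches are spread evenly over the day, which is only an approximation since real traffic peaks and dips.

```python
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
searches_per_day = 3_500_000_000  # figure quoted above (Internetlivestats, 2019)

# Average rate, assuming an even spread over the day.
searches_per_second = searches_per_day / SECONDS_PER_DAY
print(f"about {searches_per_second:,.0f} searches per second")
# about 40,509 searches per second
```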
The growth rate of Google searches was high during the first decade of the 21st century, but it started to decline in 2009 and 2010, and it is currently estimated at around ten percent per year.
How do we solve this Big Data problem?
This is where distributed storage comes into play.
What is Distributed Storage?
A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
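The core idea of splitting data across servers can be sketched in a few lines of Python. This is a toy model, not how any particular product works: the node names, the 4-byte block size, and the replication factor of 2 are hypothetical values chosen to keep the example small. Each block is copied to two different nodes, so the file can still be read back when one node fails.

```python
from collections import defaultdict

BLOCK_SIZE = 4                          # bytes per block (tiny, for illustration)
REPLICATION = 2                         # copies of each block (hypothetical)
NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def split_into_blocks(data: bytes, size: int) -> list[bytes]:
    """Cut the payload into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes), otherwise replicas would repeat nodes.
    """
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def store_blocks(blocks: list[bytes], placement: dict[int, list[str]]) -> dict[str, dict[int, bytes]]:
    """Simulate each node's disk as a dict of block_id -> bytes."""
    disks = defaultdict(dict)
    for block_id, owners in placement.items():
        for node in owners:
            disks[node][block_id] = blocks[block_id]
    return disks


def read_file(num_blocks: int, disks, failed=frozenset()) -> bytes:
    """Reassemble the file from any surviving replica of each block."""
    parts = []
    for block_id in range(num_blocks):
        replica = next(
            (held[block_id] for node, held in disks.items()
             if node not in failed and block_id in held),
            None,
        )
        if replica is None:
            raise RuntimeError(f"block {block_id} has no surviving replica")
        parts.append(replica)
    return b"".join(parts)


data = b"hello distributed storage"
blocks = split_into_blocks(data, BLOCK_SIZE)
disks = store_blocks(blocks, place_blocks(len(blocks), NODES, REPLICATION))
# Even with node-b down, every block still has a copy on another node.
print(read_file(len(blocks), disks, failed={"node-b"}))  # b'hello distributed storage'
```

Production systems use much larger blocks and smarter placement policies (for example, taking racks and data centers into account), but the principle of splitting plus replication is the same.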
Distributed storage is the basis for massively scalable cloud storage systems like Amazon S3 and Microsoft Azure Blob Storage, as well as on-premise distributed storage systems like Cloudian Hyperstore.
Distributed storage systems can store several types of data:
- Files—a distributed file system allows devices to mount a virtual drive, with the actual files distributed across several machines.
- Block storage—a block storage system stores data in volumes known as blocks. This is an alternative to a file-based structure that provides higher performance. A common distributed block storage system is a Storage Area Network (SAN).
- Objects—a distributed object storage system wraps data into objects, identified by a unique ID or hash.
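To illustrate objects identified by a hash, here is a toy in-memory store that uses the SHA-256 digest of each object's bytes as its ID and picks the node that holds it by taking that ID modulo the number of nodes. The class name, node names, and modulo placement rule are assumptions for illustration only; many real systems use more elaborate schemes such as consistent hashing, so that adding a node does not reshuffle most objects.

```python
import hashlib


class ToyObjectStore:
    """In-memory object store spread over a few named 'nodes' (hypothetical)."""

    def __init__(self, node_names: list[str]):
        self._names = list(node_names)
        self.nodes: dict[str, dict[str, bytes]] = {name: {} for name in self._names}

    def node_for(self, object_id: str) -> str:
        # Simple hash partitioning: the hex ID, read as an integer, picks a node.
        return self._names[int(object_id, 16) % len(self._names)]

    def put(self, data: bytes) -> str:
        # The ID is derived from the content, so identical data always gets
        # the same ID and is stored only once.
        object_id = hashlib.sha256(data).hexdigest()
        self.nodes[self.node_for(object_id)][object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        return self.nodes[self.node_for(object_id)][object_id]


store = ToyObjectStore(["node-a", "node-b", "node-c"])
photo_id = store.put(b"example photo bytes")
print(photo_id[:12], "stored on", store.node_for(photo_id))
print(store.get(photo_id))  # b'example photo bytes'
```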
Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system, HDFS (Hadoop Distributed File System), enables concurrent processing and fault tolerance. Developed by Doug Cutting and Michael J. Cafarella, Hadoop uses the MapReduce programming model to process data in parallel across its nodes. The framework is managed by the Apache Software Foundation and is licensed under the Apache License 2.0.
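The MapReduce model can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation of the three stages (map, shuffle, reduce); it does not use Hadoop's actual Java API, and the function names are hypothetical. On a real cluster, the map and reduce steps would run in parallel on many nodes, ideally on the machines that already hold the data.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data is everywhere"]
print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call only looks at its own line and each reduce call only looks at one key, both stages can be split across machines without coordination; only the shuffle needs to move data between nodes.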
For years, while the processing power of application servers has been increasing manifold, databases have lagged behind due to their limited capacity and speed. However, today, as many applications are generating big data to be processed, Hadoop plays a significant role in providing a much-needed makeover to the database world.