INTRODUCTION TO BIG DATA
Big Data is not a new name for most of us, but for some it may be. Exactly what is Big Data? How does it differ from ‘normal data’? What makes it special or, well, big? These are some of the questions that were going through my head when I first saw the words Big Data and related terms such as Hadoop, Apache and MapReduce, just to mention a few. After my data science Nano-degree at Moringa School, I had to scratch that itch. The longing to understand the various big data frameworks led me to pursue an MSc in Big Data Technologies. This is the second module in the course, having completed Security Management successfully about 2 months ago.
As I go through this module, I will attempt to write about the different frameworks and processes involved in ingesting, processing, analyzing, modelling and visualizing big data.
Before giving the definition, let's understand the characteristics of Big Data, since the meaning is derived from them. The 3 main characteristics of Big Data are volume, velocity and variety.
Volume represents the amount of data generated, stored and operated on, which varies between organizations and sectors.
Where does the data come from?
This data comes not just from the tens of millions of messages sent every second via email, WhatsApp, Facebook, Twitter, etc., but also from the one trillion digital photos we take each year and the increasing amounts of video data we generate by uploading or sharing. Not to mention the smart devices in our homes, workplaces and streets, on land, in water or up in space. The figure below highlights the leading sources of high-volume data.
Velocity refers to the frequency at which data is generated, captured, analyzed and stored, which is massive and continuous. In some systems, users need data to be received or transmitted immediately. This compels companies to improve how quickly they react and anticipate. Big Data can therefore describe high-velocity data, with rapid data ingestion and near real-time analysis.
Variety refers to the many different types of data, each of which requires different and specific types of analysis. Big data comes in multiple forms, including structured and unstructured data such as financial data, text files, multimedia files, etc. However, it mostly comes in an unstructured or semi-structured form, which requires different techniques and tools to process and analyze.
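To make the difference between these forms concrete, here is a minimal Python sketch using only the standard library. The sample records, field names and values are made up purely for illustration; the point is only that each form calls for a different parsing approach.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, so a CSV reader yields uniform records.
# (Hypothetical financial data, invented for this example.)
structured = "account,amount\nA-1,250.00\nA-2,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing, but fields may be missing or nested.
semi_structured = '[{"user": "amina", "tags": ["iot", "video"]}, {"user": "brian"}]'
records = json.loads(semi_structured)
tags = [t for rec in records for t in rec.get("tags", [])]

# Unstructured: free text has no schema, so we fall back to tokenizing.
unstructured = "Big data is big. Data keeps growing."
words = Counter(re.findall(r"[a-z]+", unstructured.lower()))

print(total, tags, words.most_common(2))
```

Notice that the structured case can rely on column names, the semi-structured case has to tolerate a missing `tags` field, and the unstructured case has no fields at all.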
To process data this varied, we need to employ distributed computing and massively parallel processing (MPP) architectures that enable us to ingest and analyze complex data in parallel. The diversity of sources and formats of data represents a real technological challenge.
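The idea behind parallel processing can be sketched in a few lines of Python. This is only a toy illustration in the MapReduce style, not how Hadoop or any MPP database is actually implemented: threads on one machine stand in for worker nodes, and the function names (`map_chunk`, `merge_counts`, `word_count`) and the sample chunks are my own assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk: str) -> Counter:
    """Map step: count words in one chunk, independently of the others."""
    return Counter(chunk.lower().split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks: list[str], workers: int = 4) -> Counter:
    # Threads stand in for worker nodes here; a real cluster would
    # ship each chunk to a different machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


# Each string plays the role of one slice of a much larger dataset.
chunks = ["big data is big", "data is everywhere", "big ideas"]
totals = word_count(chunks)
print(totals.most_common(3))
```

Because each chunk is counted without looking at the others, adding more workers (or machines) lets the same code handle more data, which is the core appeal of these architectures.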
Therefore, on the basis of its characteristics, Gartner defines big data as:
“An information asset whose volume is large, velocity is high, and formats are various”.
On the other hand, the McKinsey research firm gives a more detailed interpretation of big data:
“Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data – i.e., we don’t define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes).
We assume that, as technology advances over time, the size of datasets that qualify as big data will also increase.
Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes)”.
In conclusion, Big Data is worth very little unless we are able to turn it into insights. In order to do that we need to capture and analyze the data. In the coming 12 weeks, I will dive into this topic in depth and hope we’ll sail through together.
Remember, the hype around Big Data and the name may disappear (which wouldn’t be a great loss), but the phenomenon will stay and only gather momentum.
Thank you for taking the time to go through this. For any and all suggestions, send me a message.