IT Simplified - Big Data

What is Big Data?

Data is now collected by many kinds of devices, including mobile phones, IoT devices (Internet of Things, meaning the smart devices we use), cameras, and more. As the size and number of these data sources have grown, the velocity at which data is generated and consumed has increased sharply. The volume and variety of the data have grown as well, which makes it hard to handle using traditional relational databases. Data with these properties is referred to as Big Data. The term generally denotes the large volume of information being generated or consumed, much of it driven by the growth of the internet. Businesses are leveraging this data to make informed decisions and improve client experience.


Figure: Different Vs of Big Data

Characteristics of Big Data:

The characteristics of big data are commonly explained by 3 Vs (Volume, Velocity, and Variety).

Volume: As the name suggests, the volume of the data is large, often at the petabyte scale. Examples of data at this volume include Twitter feeds, data from sensor-enabled IoT devices, weather satellite data, and more.

Velocity: Velocity is the rate at which data is generated or transferred. Internet-connected devices generate data at a rapid rate, and much of it needs to be processed in real time or near real time so that business decisions can be made quickly.

Variety: Traditionally, data was structured in the form of rows and columns. Data now also arrives in a wide array of unstructured and semi-structured forms, such as audio, video, and JSON. Processing unstructured and semi-structured data is more complex and takes more effort than processing structured data.
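To make the difference between semi-structured and structured data concrete, here is a minimal Python sketch that flattens nested JSON records into rows and columns. The device names, field names, and the `flatten` helper are hypothetical and chosen only for illustration; real pipelines usually need more careful handling of lists and missing fields.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Hypothetical semi-structured events: the two records do not share a schema.
raw_events = [
    '{"device": "cam-1", "reading": {"temp_c": 21.5, "humidity": 40}}',
    '{"device": "phone-7", "reading": {"temp_c": 19.0}, "tags": ["mobile"]}',
]

rows = [flatten(json.loads(line)) for line in raw_events]

# Build a common set of columns; fields a record lacks become None.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

Note how the second record has no humidity value and the first has no tags. Filling those gaps is part of the extra effort that semi-structured data requires.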

Big data is also characterized by 4 Vs (Volume, Velocity, Variety, and Veracity).

Veracity: Veracity refers to the truthfulness of the data. Big data, in addition to being large, fast, and varied, should also be truthful and reliable. This characteristic, also known as data quality, helps to achieve accurate and meaningful results.

More recently, big data has been characterized using 10 Vs (Volume, Velocity, Variety, Veracity, Variability, Vulnerability, Visualization, Validity, Volatility, and Value).

Variability: In the context of big data, variability refers to inconsistencies in the data. These can be caused by integrating data from disparate sources and converting it between different formats. The inconsistencies can be reduced by applying anomaly and outlier detection before the data is consumed.
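As one illustration of outlier detection, the sketch below uses the interquartile range (IQR) rule, a common statistical technique. The article does not name a specific method, so this choice, the multiplier `k=1.5`, and the sample readings are all assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical temperature readings merged from several sensors.
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 85.0, 20.2, 20.3]

print(iqr_outliers(readings))
```

Here the reading of 85.0 is flagged, while the values clustered around 20 are kept. Whether a flagged value is an error or a genuine event still needs a domain decision.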

Visualization: Visualization is a key characteristic of any data when extracting and presenting it to its consumers. Rapid and well-informed decision-making requires creative and effective data visualizations. However, plotting millions or billions of data points is challenging, and it often calls for specialized techniques such as network diagrams, tree maps, and more.

Validity: The accuracy of the data for the intended use case is commonly referred to as the validity of the data. Each business or data analysis should identify and use data that is appropriate for achieving the anticipated results. Multiple levels of data validation can be performed to ensure the validity of the data before it is used in the business use case.
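A simple form of validation is a set of rules checked against each record before it is used. The sketch below is only an example of that idea: the field names, the `validate` helper, and the accepted temperature range are hypothetical and would depend on the actual use case.

```python
def validate(record):
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []
    if not record.get("device"):
        errors.append("missing device")
    temp = record.get("temp_c")
    if not isinstance(temp, (int, float)):
        errors.append("temp_c is not numeric")
    elif not -90 <= temp <= 60:  # assumed plausible range for this use case
        errors.append("temp_c out of range")
    return errors

# Hypothetical records arriving from different sources.
records = [
    {"device": "cam-1", "temp_c": 21.5},
    {"device": "", "temp_c": 19.0},
    {"device": "phone-7", "temp_c": "hot"},
    {"device": "sat-2", "temp_c": 400},
]

for record in records:
    print(record, validate(record))
```

Records with a non-empty error list can be rejected, corrected, or routed for review, depending on the level of validation the business case requires.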

Volatility: The rate of change of the data and its lifetime are referred to as the volatility of big data. In short, volatility describes when the data expires for the intended use case. This is generally defined by the data retention policy.
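A retention policy can be sketched as a cutoff date: records older than the cutoff are treated as expired. The 30-day window, the `created_at` field, and the `apply_retention` helper below are assumptions made for illustration, not values from the article.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumed retention window

def apply_retention(records, now, retention=RETENTION):
    """Keep only the records created within the retention window."""
    cutoff = now - retention
    return [r for r in records if r["created_at"] >= cutoff]

# A fixed "now" keeps the example reproducible.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "created_at": datetime(2024, 2, 20, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]

kept = apply_retention(records, now)
print([r["id"] for r in kept])
```

Only the record from February 2024 falls inside the 30-day window, so the older record is dropped.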

Vulnerability: Vulnerability refers to the security risk that arises when very large volumes of data are handled and could be breached. Multiple data breaches have occurred in recent years where adequate security measures were not in place. Hence, vulnerability is considered a characteristic of big data, and serious security measures should be implemented to mitigate the risk.

Value: The last but most important characteristic is the value that is generated from the data. Value refers to the worth of the meaningful information derived from the data, as well as the profitability that can be extracted from the analysis performed on it.

Applications of Big Data: Big data enables organizations to discover hidden patterns, predict demand and risks, understand consumer preferences, and gain other valuable business insights. With such potential, big data can be leveraged across business domains. Some of the domains that currently make extensive use of big data are healthcare, manufacturing, media, and government organizations, and the list continues to grow.


More articles by Ganesh kumar Murugesan
