BIG DATA
Why Did Big Data Become a Problem?
1- WHAT IS BIG DATA-
Big data is any data that exceeds what your existing infrastructure can handle. Put another way, it is data that would typically be too expensive to store, manage, and analyze using traditional (relational and/or monolithic) database systems. Such systems are usually cost-inefficient because they are inflexible when it comes to storing unstructured data (such as images, text, and video), accommodating “high-velocity” (real-time) data, or scaling to support very large (petabyte-scale) data volumes.
For this reason, the past few years have seen the mainstream adoption of new approaches to managing and processing big data, including Apache Hadoop and NoSQL database systems. However, those options often prove complex to deploy, manage, and use in an on-premises environment.
In short, BIG DATA IS NOT A TECHNOLOGY, IT'S A PROBLEM.
2- SOURCES OF BIG DATA-
FACEBOOK- Instead of servers that each include compute, memory, flash storage, and HDD storage, Facebook’s disaggregated server model splits the various server components across separate racks. This allows Facebook to tune the components for specific services and to use what Qin calls “smarter hardware refreshes” to extend their useful life. Because the server resources are separated, mixes of compute, memory, and storage on different racks can be combined, for example, to deliver a set of servers that can run Hadoop. As loads and usage change, the balance of components that power a service can be changed, keeping inefficiencies to a minimum.
GOOGLE- It processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.
Many other social networks and online services are also sources of big data.
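The Google figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: it treats 40,000 queries per second as a constant rate and assumes a 365-day year, and the function names are made up for this example. Because the source says "over 40,000", the results are lower bounds that land close to the quoted 3.5 billion per day and 1.2 trillion per year.

```python
# Rough check of the search-volume figures, assuming a constant rate.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
DAYS_PER_YEAR = 365              # assumption: non-leap year


def searches_per_day(per_second: int) -> int:
    """Scale a per-second rate up to one day."""
    return per_second * SECONDS_PER_DAY


def searches_per_year(per_second: int) -> int:
    """Scale a per-second rate up to one year."""
    return searches_per_day(per_second) * DAYS_PER_YEAR


per_day = searches_per_day(40_000)
per_year = searches_per_year(40_000)

print(f"{per_day:,} searches per day")    # about 3.46 billion
print(f"{per_year:,} searches per year")  # about 1.26 trillion
```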
3- Characteristics of Big Data-
Volume- Volume refers to the enormous amounts of information generated every second from social media, cell phones, cars, credit cards, M2M sensors, images, video, and more. Today, such data is stored on distributed systems, where it is spread across several locations and brought together by a software framework such as Hadoop.
Facebook alone generates billions of messages, records about 4.5 billion clicks of the “like” button, and receives over 350 million new posts each day. Data at this scale can only be handled by big data technologies.
Variety- As discussed above, big data comes in many varieties. Compared with traditional data such as phone numbers and addresses, much of today's data takes the form of photos, videos, audio, and more, leaving about 80% of the data completely unstructured.
Veracity- Veracity is the degree of reliability that the data has to offer. Because a major part of the data is unstructured and irrelevant, big data systems need a way to filter it or translate it, since reliable data is crucial to business development.
Value- Value is the major issue that we need to concentrate on. What matters is not just the amount of data that we store or process, but the amount of valuable, reliable, and trustworthy data that is stored, processed, and analyzed to find insights.
Velocity- Last but not least, velocity plays a major role: there is no point in investing so much only to end up waiting for the data. A major aim of big data is therefore to provide data on demand and at a fast pace.
4- Types of Big Data-
Big data is generally categorized into three types:
- Structured Data
- Semi-Structured Data
- Unstructured Data
- Structured Data has a dedicated data model and a well-defined structure. It follows a consistent order and is designed so that it can be easily accessed and used by a person or a computer. Structured data is usually stored in well-defined columns, typically in databases.
Example: Database Management Systems (DBMS)
- Semi-Structured Data can be considered another form of structured data. It inherits a few properties of structured data, but most of it lacks a definite structure and does not obey the formal structure of data models such as those used by an RDBMS.
Example: Comma-Separated Values (CSV) file
- Unstructured Data is a completely different type: it neither has a structure nor follows the formal structural rules of data models. It does not even have a consistent format, and it varies all the time. Occasionally, though, it may carry information such as date and time.
Example: Audio files, images, etc.
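The difference between the three types is easier to see side by side. The following is a minimal sketch using only the Python standard library. The table name, field names, and sample values are invented for illustration. The JSON record is an additional example of semi-structured data that the text above does not mention. The byte string merely stands in for an audio or image file.

```python
import csv
import io
import json
import sqlite3

# Structured: a fixed schema enforced by a database table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, name, phone) VALUES (?, ?, ?)",
    [(1, "Asha", "555-0100"), (2, "Ben", "555-0101")],
)
structured_rows = conn.execute(
    "SELECT name, phone FROM users ORDER BY id"
).fetchall()
conn.close()

# Semi-structured: a CSV file has column names, but nothing enforces
# types or required values (the second row has an empty phone field).
csv_text = "id,name,phone\n1,Asha,555-0100\n2,Ben,\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured (assumed extra example): JSON is self-describing,
# and each record may carry different or nested fields.
json_text = '{"id": 3, "name": "Chen", "tags": ["vip"], "address": {"city": "Pune"}}'
json_record = json.loads(json_text)

# Unstructured: raw bytes with no fields to query at all.
# Placeholder bytes standing in for an audio or image file.
audio_bytes = bytes([0x52, 0x49, 0x46, 0x46, 0x00, 0x01])

print(structured_rows)
print(csv_rows[1])
print(json_record["address"]["city"])
print(len(audio_bytes), "bytes of opaque content")
```

With the structured table you can ask for a column by name and rely on its type. With the CSV and JSON records the program has to cope with missing or extra fields itself. With the raw bytes there is nothing to query until some other tool interprets the content.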