Big Data Is Not Large Data
I had the unfortunate pleasure of being the house sitter in my department over the December holidays. This is the period I love most because I get to build, without interruptions, all the cool models that can show the company there is money to be made even in avenues that were not considered profitable before.
During the past holidays, I built a model that analyses hourly cell tower utilization and revenue generation. The analysis covered a period of 3 months to establish behavioral patterns and to see how our various products perform per hour per GPS location. As some of you can imagine, this is a lot of data. My employer, a major telecommunication company, has in excess of 50 million subscribers attached to its network, with over one billion records added daily to the company’s database. This gives an indication of the volume of data that a telecommunication company deals with.
While sharing my insights with a colleague, it occurred to me that this may not necessarily be a big data project. When people think of big data, they automatically think of quantity of data, that is, they associate big data with terabytes or petabytes.
Big data is not only large data. Let me explain this point. Think about the number of people on earth -- over 7 billion. Their heart rates are generating data, and so are their cell phones, their cars and, yes, their tweets and Google searches. This data is undeniably large, but "large" measures only one factor: size. That’s it. Large data is about the mass of data that is available; on its own, that data is stagnant and pretty useless. If I say to you, ‘right now, there are 2 billion people that are busy on their electronic devices’ (and this may not necessarily be true), what does it mean? Absolutely nothing!
Here’s the big data side. Big data includes large data but is not necessarily limited to it. For example, a high school pupil scored 85% in physical science. This piece of data in isolation does not tell us much. However, if we look at this pupil’s scores in other subjects, some facts can be inferred. Let’s say the pupil’s report shows that he/she obtained 65% in biology, 88% in mathematics and 55% in English. We may now arrive at some conclusions -- for instance, that the pupil has excellent numerical abilities.
What if we looked at the entire class and found out that the class average for physical science was 80%, 70% for biology, 90% for mathematics and 70% for English? Now we have established a basis for comparison. This means we can assess this pupil’s performance relative to his/her peers and then make other, more detailed and accurate inferences. We can go a step further and compare against the entire grade, then the regions and so on -- you get the picture.
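To make the comparison concrete, here is a minimal Python sketch that uses only the scores quoted above. The names `pupil`, `class_average` and `relative_to_class` are illustrative choices, not part of any real system.

```python
# Scores quoted in the article, as percentages.
pupil = {"physical science": 85, "biology": 65, "mathematics": 88, "English": 55}
class_average = {"physical science": 80, "biology": 70, "mathematics": 90, "English": 70}


def relative_to_class(pupil_scores, averages):
    """Return, per subject, the pupil's score minus the class average."""
    return {
        subject: pupil_scores[subject] - averages[subject]
        for subject in pupil_scores
    }


gaps = relative_to_class(pupil, class_average)

# List subjects from the pupil's strongest relative result to the weakest.
for subject, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{subject}: {gap:+d} points versus the class average")
```

Seen this way, the pupil is above the class average only in physical science (+5) and is slightly below it in mathematics (-2), a nuance that the isolated 88% hides.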
Let’s add another dimension to it. We now look at the post-graduation careers and opportunities of all the graduates who had similar marks to our pupil. This provides a range of parameters to evaluate, which include the universities they attended, their preferred professions, the cars they drive, their average salary, etc. We now have a better picture of what our pupil can accomplish, all else being equal.
This, in a nutshell, is what big data is about: taking a multitude of information and observations and capturing it as data, distilling from that data what is important, establishing relationships between data and arriving at a decision or an accurate description of an otherwise unknown detail. Big data is more dynamic and insightful than large data. It is also about connecting different data to create a holistic view of the problem. A company may have 2 terabytes of data or only 1 gigabyte; either way, the value comes from understanding that all data is connected.
Companies can then build more comprehensive data maps of their industries or of their customers or clients. They can use this to leverage their core strengths and improve operations, customer relations and eventually revenue. If you have a small company, remember you are not too small for big data.
This switch in thinking about big data versus large data is essential in ensuring that the use of Big Data Analytics continues and that it benefits businesses and customers alike.
I spend most of my time looking at ways the business can improve how it deals with its clients, how its products are structured and where new opportunities lie for both the company and its customers. I use many tools to do this, and I still find that not being intimidated by the data allows me to truly explore the possibilities we have as a business.
I would highly appreciate any comments on the issue of Big Data and the challenges your organization faces.
👏 Great article. I think the problem specialists face is the lack of standard definitions for some of these emerging concepts, which leads most people to classify "Big Data" by the magnitude of the data rather than by the spectrum of characteristics a dataset captures about the object it describes.