WTF is Big Data?
A while back I was trying to explain the concept of Big Data to someone who was not that technical, a ‘Persona Non Data’ if you will, and I found myself searching for an elegant analogy that would help.
Wikipedia, typically my go-to for plain-English, no-nonsense explanations, described it thusly:
“a broad term for data sets so large or complex that traditional data processing applications are inadequate.”
There are many other attempts to define Big Data, which I think fall short, the most frustrating being the ones that simply say:
“big : adjective ( of considerable size or extent.)
and
data : noun (facts and statistics collected together for reference or analysis.)”
So poorly understood is the concept of Big Data, that I am considering "Describe Big Data to your parents" as a standard interview question.
Here is how I ended up explaining it…
Today, you can click on a link to your preferred online genealogy tool, enter a few details about yourself and your parents and, within minutes, search a whole variety of vast data sets collected for entirely unrelated purposes: birth, death and marriage records from many countries, census data, passenger manifests, military records, phone directories and so on.
Of course people did all this back in 1953, but it took months or years, and required lots of travel and correspondence.
Today, with each link you make, you open up the possibility of connecting your tree with the trees of other distant relatives, who may have uploaded photographs and audio recordings of your common ancestors.
This is where Big Data was about 10 years ago. We began to answer questions in minutes that had previously taken us months or years, at great expense. And gradually the idea of deleting information that was no longer required made less sense.
Now imagine your great-great-grandchildren researching their family tree and being able to pull up your entire online footprint: YouTube videos, Facebook updates, LinkedIn, Twitter, Instagram, eBay. Perhaps they will be able to pay a premium to access the entire contents of your Gmail account?
Imagine just how rich the digital exhaust of future generations will be, and you can begin to imagine where Big Data is today, and more importantly where it is headed.
This analogy has everything: cloud-based 'as a service' offerings, rich collaboration tools, crowdsourcing, unimaginable data volumes and a huge variety of unstructured data, all delivered at incredible velocity.
In addition, it's underpinned by some highly nuanced privacy implications that will likely receive little consideration until it's far too late.
Comment below if you agree or disagree.
Hi Jason, I agree to a point. For the most part Big Data is simply the concept of doing what we have always done, at scale (actuaries and bookmakers have been doing big data for centuries, but it was very costly, took a long time and was highly error-prone). That said, there are some nuances that are commonly misunderstood. Let me give you the simplest analogy I know to explain one of these. In the history of the world there have been thousands of lottery winners, and if you search that data set you will find people who have won the lottery more than once. Among them you will find people who have won twice and believe they can predict the future. This is simply the convergence of coincidence and scale; however, using traditional data processing you might end up concluding that they can indeed predict the future, because this looks a lot like proof. So now you need to ensure that your data scientist understands statistics to a much higher degree, purely because the scale of the data makes coincidence a certainty.
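To put rough numbers on the lottery point above, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than real lottery data: the win probability, the number of draws each person plays, the number of players, and the function names are all made up. It compares how unlikely a repeat win is for any one person with how likely it is that somebody, somewhere in a large data set, has won twice.

```python
def prob_at_least_two_wins(p_win: float, plays: int) -> float:
    """Chance that ONE player wins at least twice in `plays` independent draws."""
    p_zero = (1.0 - p_win) ** plays                       # never wins
    p_one = plays * p_win * (1.0 - p_win) ** (plays - 1)  # wins exactly once
    return 1.0 - p_zero - p_one


def prob_someone_repeats(p_win: float, plays: int, players: int) -> float:
    """Chance that AT LEAST ONE of `players` independent players wins twice or more."""
    q = prob_at_least_two_wins(p_win, plays)
    return 1.0 - (1.0 - q) ** players


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the effect of scale.
    p_win = 1e-6          # assumed chance of winning a single draw
    plays = 1_000         # assumed draws played per person over a lifetime
    players = 10_000_000  # assumed number of people in the data set

    print("one person wins twice:", prob_at_least_two_wins(p_win, plays))
    print("somebody wins twice:  ", prob_someone_repeats(p_win, plays, players))
```

Under these made-up figures, a repeat win is a less-than-one-in-a-million event for any individual, yet close to certain across the whole data set. The double winner is a product of scale, not of foresight.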
So my primary question is: why is traditional data processing inadequate? It seems that this assumption is just plain wrong. Big Data is just data, and all data has a structure, even if it is just patterns and fluctuations in the stream. No matter what I see on this topic, it still seems like Big Data is just the newest buzzword to make magical that which is mundane and manageable through proper processing methods.
Well, I understood that!
I think that's spot on, Eric, as well as the follow-on appreciation.