Big data vs little data - scale or value?
Let me predict the future of Big Data… we will have more of it!
Information technology is a self-fulfilling prophecy, and the big data push is simply the latest example of this. If information technology is a good thing, then more of it must be even better.
The psychology of the situation is fascinating. Cognitive dissonance theory tells us that people who believe in something, even when presented with contradictory evidence, will go to great lengths to justify their position. William James' book 'The Will to Believe' describes the power of faith despite evidence, and we are all familiar with the placebo effect, where expectation influences subjective outcomes.
As organisations get bigger, managing many small initiatives becomes increasingly difficult, and there is a tendency to look for big solutions. Enterprise Resource Planning (ERP) is a classic example of this. Hence the appeal of 'Big Data': a belief that if all of the available data is captured and analysed, it can yield the insights that will solve our problems. But based on what evidence?
The sheer size of such initiatives then puts them into the 'too big to fail' category, justifying ever-increasing expenditure to keep them going. Surely anything that is too big to fail is just too big? Wouldn't risk management suggest that risk should be reduced by breaking the big entity up?
Is there power in analysis? Of course there is! We learn by modelling problems and testing hypotheses, and increased processing power allows us to test bigger data sets with more experiments.
Is there power in capturing, organising and protecting data? Of course there is! There is no performance without measurement and information.
An important tenet of good analysis is staying within domain: staying within the boundaries of the data rather than extrapolating beyond them, and staying within the limits of your expertise. How can you frame meaningful hypotheses if you don't understand the subject matter?
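To make the extrapolation point concrete, here is a minimal sketch in Python. The data is simulated purely for illustration (the article itself contains no dataset); it shows how a model that fits well inside the data's domain can fail spectacularly outside it:

```python
import numpy as np

# Illustrative data: a noisy process observed only on the interval [0, 10].
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = np.sin(x) + rng.normal(0, 0.1, size=x.size)

# A high-degree polynomial fits the observed domain well...
model = np.poly1d(np.polyfit(x, y, deg=9))

print(f"At x=5 (inside the data):   model={model(5.0):.2f}, truth={np.sin(5.0):.2f}")
print(f"At x=20 (outside the data): model={model(20.0):.2e}, truth={np.sin(20.0):.2f}")
# ...but extrapolated to x=20 it returns a value that is orders of
# magnitude wrong. The model is only valid within its domain.
```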
Similarly, it is naïve to believe that capturing all data makes old statistical sampling techniques obsolete; that data analysis produces uncannily accurate results provided all the data has been considered; that it is passé to fret about what causes what because statistical correlation tells us all we need to know; or that scientific and statistical models aren't needed at all. Or, simply put, to quote 'The End of Theory', a provocative essay published in Wired in 2008: 'with enough data, the numbers speak for themselves'.
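The point about sampling is easy to demonstrate. A hedged sketch, using simulated records rather than any real dataset: a classical random sample of a thousand records estimates a population statistic almost as well as scanning all ten million, because the standard error depends on the sample size, not the population size:

```python
import numpy as np

# Simulated 'big data': ten million records standing in for a full dataset.
rng = np.random.default_rng(0)
population = rng.exponential(scale=3.0, size=10_000_000)

# A classical random sample of just 1,000 records.
sample = rng.choice(population, size=1_000, replace=False)

# The standard error shrinks with the square root of the sample size,
# regardless of how many records the 'big data' store holds.
std_error = sample.std(ddof=1) / np.sqrt(sample.size)

print(f"Full-data mean: {population.mean():.3f}")
print(f"Sample mean:    {sample.mean():.3f} (standard error {std_error:.3f})")
```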
The value in asset management lies not in huge scale, but in breaking up the big data into small data, where good models based on first principles, applied within the limits of both the principles and the data, yield valid insight that is repeatable and sustainable. We then adapt the models on an ongoing basis as we learn. In time this is the promise of artificial intelligence, but until then we are dependent on human interaction to help the models 'learn'.
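As a sketch of what breaking big data into small data might look like in practice (the asset classes, column names, and failure-rate numbers below are all hypothetical, invented for illustration): fit one simple, transparent model per asset class, each valid only within the ages that class has actually been observed at, rather than one opaque model over everything:

```python
import numpy as np
import pandas as pd

# Hypothetical asset records: class, age, and observed failure rate.
rng = np.random.default_rng(1)
n = 300
classes = rng.choice(["pump", "valve", "motor"], size=n)
ages = rng.uniform(0, 25, size=n)
# First-principles intuition, simulated: different equipment wears at different rates.
wear_rates = {"pump": 0.04, "valve": 0.01, "motor": 0.02}
failures = np.array([wear_rates[c] for c in classes]) * ages + rng.normal(0, 0.05, size=n)

records = pd.DataFrame({"asset_class": classes, "age_years": ages, "failure_rate": failures})

# 'Small data': one interpretable linear model per asset class.
for asset_class, group in records.groupby("asset_class"):
    slope, intercept = np.polyfit(group["age_years"], group["failure_rate"], deg=1)
    lo, hi = group["age_years"].min(), group["age_years"].max()
    print(f"{asset_class}: failure_rate ~ {slope:.3f}*age + {intercept:.3f} "
          f"(valid for ages {lo:.1f}-{hi:.1f} only)")
```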
Garry Kasparov wasn't beaten by Deep Blue; he was beaten by the humans who adjusted Deep Blue's algorithms. Deep Blue didn't learn by studying every board game ever played. It didn't even learn by studying every chess game ever played. Its initial 'opening' library (domain knowledge) was provided by grandmasters before learning experiments were run using massive parallel processing, similar in concept to Hadoop today, and the system was then fine-tuned by another grandmaster between games against Kasparov.
So if the value in big data is in breaking it down into small data and experimenting, why don't we simply aggregate the learning from the numerous small data initiatives we have in progress every single day?
Great read. A small amount of the correct data will always be better than reams of rubbish; too often we want more and more analysis, but nothing is done with it. Small data which can be digested easily and quickly is a wonderful tool.
Very good article...
Thanks for this Howard. Big is not always better! We still see people ignoring a risk-based approach when building maintenance programs, ending up with a mountain of work instructions which stop adding value - they end up costing the business unnecessarily.
Great article Howard, couldn't agree more! We first need to start with understanding WHY we need 'data' and what insights we are trying to capture, which then drives a process of HOW. This creates the opportunity to continuously improve the process, thus creating value!