Big data vs little data - scale or value?
Let me predict the future of Big Data… we will have more of it!
Information technology is a self-fulfilling prophecy, and the big data push is simply the latest example of this. If information technology is a good thing, then more of it must be even better.
The psychology of the situation is fascinating. Cognitive dissonance theory tells us that people who believe in something, even when presented with contradictory evidence, will go to great lengths to justify their position. William James' book 'The Will to Believe' describes the power of faith despite evidence, and we are all familiar with the placebo effect, where expectation influences subjective outcomes.
As organisations get bigger, managing many small initiatives becomes increasingly difficult, and there is a tendency to look for big solutions. Enterprise Resource Planning (ERP) is a classic example of this. Hence the appeal of 'Big Data': a belief that if all of the available data is captured and analysed, it can yield the insights that will solve our problems. But based on what evidence?
The sheer size of such initiatives then puts them into the 'too big to fail' category, justifying ever-increasing expenditure to keep them going. Surely anything that is too big to fail is just too big? Wouldn't risk management suggest that risk should be reduced by breaking the big entity up?
Is there power in analysis? Of course there is! We learn by modelling problems and testing hypotheses, and increased processing power allows us to test bigger data sets with more experiments.
Is there power in capturing, organising and protecting data? Of course there is! There is no performance without measurement and information.
An important tenet of good analysis is staying within domain: staying within the boundaries of the data rather than extrapolating beyond them, and staying within the limits of your expertise. How can you frame meaningful hypotheses if you don't understand the subject matter?
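To make the extrapolation point concrete, here is a minimal sketch in Python. The data is simulated purely for illustration (the article itself contains no dataset); it shows how a model that fits well inside the data's domain can fail spectacularly outside it:

```python
import numpy as np

# Illustrative data: a noisy process observed only on the interval [0, 10].
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = np.sin(x) + rng.normal(0, 0.1, size=x.size)

# A high-degree polynomial fits the observed domain well...
model = np.poly1d(np.polyfit(x, y, deg=9))

print(f"At x=5 (inside the data):   model={model(5.0):.2f}, truth={np.sin(5.0):.2f}")
print(f"At x=20 (outside the data): model={model(20.0):.2e}, truth={np.sin(20.0):.2f}")
# ...but extrapolated to x=20 it returns a value that is orders of
# magnitude wrong. The model is only valid within its domain.
```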
Similarly, it is naïve to believe that capturing all data makes old statistical sampling techniques obsolete; that data analysis produces uncannily accurate results provided all the data has been considered; that it is passé to fret about what causes what because statistical correlation tells us all we need to know; or that scientific and statistical models aren't needed at all. Or, simply put, to quote 'The End of Theory', a provocative essay published in Wired in 2008: 'with enough data, the numbers speak for themselves'.
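The point about sampling is easy to demonstrate. A hedged sketch, using simulated records rather than any real dataset: a classical random sample of a thousand records estimates a population statistic almost as well as scanning all ten million, because the standard error depends on the sample size, not the population size:

```python
import numpy as np

# Simulated 'big data': ten million records standing in for a full dataset.
rng = np.random.default_rng(0)
population = rng.exponential(scale=3.0, size=10_000_000)

# A classical random sample of just 1,000 records.
sample = rng.choice(population, size=1_000, replace=False)

# The standard error shrinks with the square root of the sample size,
# regardless of how many records the 'big data' store holds.
std_error = sample.std(ddof=1) / np.sqrt(sample.size)

print(f"Full-data mean: {population.mean():.3f}")
print(f"Sample mean:    {sample.mean():.3f} (standard error {std_error:.3f})")
```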
The value in asset management lies not in huge scale, but in breaking up the big data into small data, where good models based on first principles, applied within the limits of both the principles and the data, yield valid insight that is repeatable and sustainable. We then adapt the models on an ongoing basis as we learn. In time this is the promise of artificial intelligence, but until then we are dependent on human interaction to help the models 'learn'.
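As a sketch of what breaking big data into small data might look like in practice (the asset classes, column names, and failure-rate numbers below are all hypothetical, invented for illustration): fit one simple, transparent model per asset class, each valid only within the ages that class has actually been observed at, rather than one opaque model over everything:

```python
import numpy as np
import pandas as pd

# Hypothetical asset records: class, age, and observed failure rate.
rng = np.random.default_rng(1)
n = 300
classes = rng.choice(["pump", "valve", "motor"], size=n)
ages = rng.uniform(0, 25, size=n)
# First-principles intuition, simulated: different equipment wears at different rates.
wear_rates = {"pump": 0.04, "valve": 0.01, "motor": 0.02}
failures = np.array([wear_rates[c] for c in classes]) * ages + rng.normal(0, 0.05, size=n)

records = pd.DataFrame({"asset_class": classes, "age_years": ages, "failure_rate": failures})

# 'Small data': one interpretable linear model per asset class.
for asset_class, group in records.groupby("asset_class"):
    slope, intercept = np.polyfit(group["age_years"], group["failure_rate"], deg=1)
    lo, hi = group["age_years"].min(), group["age_years"].max()
    print(f"{asset_class}: failure_rate ~ {slope:.3f}*age + {intercept:.3f} "
          f"(valid for ages {lo:.1f}-{hi:.1f} only)")
```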
Garry Kasparov wasn't beaten by Deep Blue; he was beaten by the humans who adjusted Deep Blue's algorithms. Deep Blue didn't learn by studying every board game ever played. It didn't even learn by studying every chess game ever played. Its initial 'opening' library (domain knowledge) was provided by grandmasters before learning experiments were run using massive parallel processing, similar in concept to Hadoop today, and the system was then fine-tuned by another grandmaster between games against Kasparov.
So if the value in big data is in breaking it down into small data and experimenting, why don't we simply aggregate the learning from the numerous small data initiatives we have in progress every single day?
Great read. A small amount of the correct data will always be better than reams of rubbish; too often we want more and more analysis, but nothing is done with it. Small data which can be digested easily and quickly is a wonderful tool.
Very good article...
Thanks for this Howard. Big is not always better! We still see people ignoring a risk-based approach when building maintenance programs, ending up with a mountain of work instructions which stop adding value - they end up costing the business unnecessarily.
Great article Howard, couldn't agree more! We first need to start with understanding WHY we need 'data' and what insights we are trying to capture, which then drives a process of HOW. This creates the opportunity to continuously improve the process, thus creating value!