The power of AI is in small data!
We've all been schooled into believing that AI and ML solutions naturally lend themselves to solving "Big Data" problems.
The allure of capturing as much data as possible is strong. And, now that more businesses are experimenting with machine learning and AI, it’s growing stronger. When you aren’t sure what you may eventually need, might as well capture everything, right?
The exponential growth of data is undisputed, driven by the internet of things and connected devices. The reality is that data is already big and getting bigger, but do we need to worry about all this big data upfront, however exciting it is? Is there anything wrong with thinking small first?
Can humans get more time in the day by letting machines solve problems based on small but reliable data?
A case for small data
Big data can be unwieldy and expensive to maintain, clean and understand. Most small and medium enterprises would struggle to pull together the technology and people required to process big data to the point where it becomes valuable.
It's also hard to see through the human biases, intended or unintended, embedded in big data, and those biases can be dangerous when we build the machines of the future.
Another factor is that data changes at a rapid pace, so it can quickly become irrelevant.
Just like you need to learn to walk before you can run, you can’t really do big data right until you master the art of harnessing small data first.
So then, what is small data?
Small data is data collected experimentally or intentionally, at a human scale, where the focus is on causation and understanding rather than prediction.
Small data is much more manageable, and devoid of the high costs (not to mention compliance and regulatory risks) of big data, which can require a massive amount of work to manage, maintain and keep clean. Small data, even if it comes in unstructured form, can also be labeled somewhat easily.
Why might AI and ML models be more powerful with small data?
With big data we are not sure which model to use, and we are uncertain about the biases and errors in the data. If we could better incorporate these into our modelling we would achieve more realistic results, but this is difficult to do. Small data models, on the other hand, are necessarily simple and reflect at least some of the uncertainty. We know the dangers of model misspecification; even if the results do not calibrate the uncertainty perfectly, at least the user of the conclusions will understand that they should be cautious and allow for the possibility of being wrong.
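As a minimal sketch of what this looks like in practice (the dataset, variable names and numbers below are hypothetical, invented for illustration), a simple model fitted to a dozen observations can report its effect estimate together with a confidence interval, so the reader sees the uncertainty rather than a bare point prediction:

```python
import numpy as np
from scipy import stats

# Hypothetical "human scale" dataset: 12 observations of
# ad spend (in $k) versus weekly sign-ups for a small business.
ad_spend = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])
signups  = np.array([14, 19, 21, 26, 28, 34, 33, 40, 41, 47, 48, 55])

# A deliberately simple model: an ordinary least-squares line.
fit = stats.linregress(ad_spend, signups)

# Report the effect *with* its uncertainty, not just a point estimate.
n = len(ad_spend)
t_crit = stats.t.ppf(0.975, df=n - 2)      # 95% two-sided critical value
ci_low = fit.slope - t_crit * fit.stderr
ci_high = fit.slope + t_crit * fit.stderr

print(f"Estimated sign-ups per extra $1k of ad spend: {fit.slope:.1f}")
print(f"95% confidence interval: ({ci_low:.1f}, {ci_high:.1f})")
```

The point is not the particular numbers but that the output carries its own caveat: a narrow or wide interval tells the decision maker how much to trust the estimate.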
In contrast, models for big data may be fine for point prediction and classification, but we struggle to provide realistic assessments of uncertainty. There is also the problem that big data invites a massive number of hypotheses, with less protection against false positive results.
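To see why, here is a small, hedged illustration using purely simulated data (no real dataset is assumed): if we test a thousand candidate features against an outcome that is pure noise, a conventional 5% significance threshold will still flag dozens of them as "discoveries".

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate a "big data" fishing expedition: 1,000 candidate features,
# none of which has any real relationship to the outcome.
n_samples, n_features = 500, 1000
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)            # outcome is pure noise

# Test every feature against the outcome and count "significant" hits.
p_values = []
for j in range(n_features):
    r, p = stats.pearsonr(X[:, j], y)
    p_values.append(p)

false_positives = np.sum(np.array(p_values) < 0.05)
print(f"'Significant' features at p < 0.05: {false_positives} of {n_features}")
# Roughly 50 spurious discoveries are expected purely by chance.
```

The more hypotheses the data lets us test, the more safeguards (corrections, held-out data, pre-registered questions) we need just to avoid fooling ourselves.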
Conclusion
The artificial intelligence we build for the future is only as good as the data we build it with. Humans make more rational and less biased decisions when they have small but relevant data, data with the potential to reveal relationships and insights, those tiny clues that uncover big trends. So why shouldn't we build artificial intelligence with the same relevance, to help us make better decisions?
I'm not suggesting that we should avoid big data or that it has no place; however, more is not always better.
Think small!