Big Data, then Big Information, then Big Decisions
About four years ago I first heard about Big Data, and six months later about the Internet of Things. I've heard about Data Science on and off for longer, but it has only become popular recently (all this in my own biased field of experience, of course). This seems a natural progression. First came the new widespread availability of heaps of data (BD), brought about by developments in technology. Then came the ability to measure much more than we could before (IoT), generating heaps more data, and now there is a huge market for people to extract insights from the data (DS). This rapid succession gives a strong indication of where these developments are heading.
At present, there is a huge market for people with the skills to make sense of data. If a lot is unknown about the system that the data relates to, then a lot of valuable insight can be gleaned from simply summarising the data in different ways and applying conventional data analysis techniques (regressions, Fourier analyses, PCA, etc.). The two main things people are after are useful predictive accuracy and insight into how the system works.
As value continues to be extracted from understanding systems, black box methods will tend to be replaced by methods that account for some of the understanding of the system in question. A car production plant does not just use black box methods to understand how variation in the supply of tyres affects its productivity: its operators know with very high accuracy (but not perfection) how such variation in supply will propagate through the production line. This knowledge about how the system functions is information. In the past such information resided in the heads of experts, in books and so on, but increasingly it is being made available as implementable abstractions (models). We're already beginning to see the start of what I believe will be a huge market for the algorithms that turn "data" into "key information", for example algorithms that turn raw satellite data into different "products", such as the most likely land cover type from surface reflectance data.
The transition from "big data" to "big information" will be mature when we have a healthy ecosystem of information sources pertaining to multiple aspects of many systems, such that the problem for analysts is not just extracting information from data but figuring out the key information that brings the most value to the problem at hand. We'll be facing the problem of "information overload" rather than "data overload". For example, if we're aiming to manage motorway response crews and we have information on the likelihood of possible flooding events, the risk of structural problems to roads, the chances of dangerous driving incidents, the chances of high winds and the chances of congestion, then we face the problem that all these things are relevant and we really should take them all into account. How do we do that? Key to solving this issue will be obtaining a systemic understanding of the problem at hand, so that all the different information sources can be contextualised in relation to each other. By that I mean an algorithmic representation of how a substantial portion of the system of interest functions, so that we can see the relative importance of different aspects of the system to what we care about, be it the chances of an accident, profits, expenditure, or congestion. Of course, this capability is already present in many domains, but I think we'll see such techniques used much more widely in the era of big information.
I think we'll then get to a point where we have an ecosystem of available representations of how different systems function and respond to different sources of variation. Then the problem will move from having too much information to having a difficult-to-manage number of options for how to intelligently manipulate the system. I think at that point we will move from the era of Big Information to Big Decisions. That era will be characterised by many organisations having many options at their disposal and many desirable outcomes, but difficulty navigating that option space to achieve the best outcomes for them. Computational techniques to explore that option space, exposing multi-dimensional trade-offs and assessing possibilities in relation to a wide range of key performance indicators, will become hugely important in the era of Big Decisions (e.g. Constraint Reasoning techniques). There will be a huge growth of interest in generic decision support software too.
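To make "exposing multi-dimensional trade-offs" a little more concrete, here is a minimal sketch in Python. It is not Constraint Reasoning itself but a simpler, related idea: filtering a set of options down to those that are not beaten on every key performance indicator by some other option (a Pareto front). The option names, the two KPIs and all the numbers are invented purely for illustration, and both KPIs are assumed to be "lower is better".

```python
from typing import Dict, List

# Hypothetical options for deploying motorway response crews.
# Names, KPIs and values are made up; lower is better for both KPIs.
OPTIONS: Dict[str, Dict[str, float]] = {
    "two_crews_north": {"cost": 40.0, "response_minutes": 25.0},
    "three_crews_spread": {"cost": 60.0, "response_minutes": 15.0},
    "three_crews_north": {"cost": 60.0, "response_minutes": 22.0},
    "four_crews_spread": {"cost": 80.0, "response_minutes": 12.0},
    "four_crews_south": {"cost": 85.0, "response_minutes": 18.0},
}


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is no worse than b on every KPI and strictly better on one."""
    no_worse = all(a[k] <= b[k] for k in a)
    strictly_better = any(a[k] < b[k] for k in a)
    return no_worse and strictly_better


def pareto_front(options: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the names of options that no other option dominates."""
    front = []
    for name, kpis in options.items():
        beaten = any(
            dominates(other, kpis)
            for other_name, other in options.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return sorted(front)


if __name__ == "__main__":
    for name in pareto_front(OPTIONS):
        print(name, OPTIONS[name])
```

The options that survive are the genuine trade-offs: choosing between them means giving up some of one KPI to gain on another, which is a decision for people rather than for the algorithm. With many more options and KPIs, a brute-force comparison like this one would need replacing with something smarter, but the question being asked stays the same.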
Of course, all that I've described is already being done in many domains and is done in the heads of experts every day. What is different is that these things, which are the preserve and specialism of a precious few, are going to be accessible to, and implemented by, a much larger number of people and organisations. So what does that mean for the future of predictive analytics? I think any company should be expecting these changes to come and not simply think that it ends at the analysis of Big Data. I think any business should be laying down an evolution strategy and timeline that takes it from seeing information about its system of interest in an unprecedented amount of (currently poorly understood) detail, the era of Big Data, to identifying the best decision to make in light of myriad influences on its business over a wide range of timescales. Being first to solve these problems will not only improve the business but will also reveal methodologies that could prove valuable more generally.
For the next generation of Data Scientists: I think you should be aware of how the field is likely to change and how you best fit into it. I anticipate a huge surge of computationally skilled domain experts in years to come, as companies start to prefer those with data science skills AND domain understanding to those with data science skills alone. Then will come interest in those who understand how to implement computational support for decision making.
This must be the most exciting time in history to be a predictive analyst.