Combining prediction accuracy and explainability
Open any data science forum today and what jumps out is the unprecedented discussion around deep learning and artificial neural networks. Deep learning is touted as absolutely essential for data scientists to know, critical for firms to leverage in solving their common business problems, and as the harbinger of the next wave of competitive advantage. Thanks to the complexity of hidden network layers, non-linear relationships, and reinforcement learning, prediction algorithms are reaching ever higher levels of accuracy. In this glitzy world-view dominated by complex models, running the gamut from model-agnostic to model-free methods, this article attempts to reiterate why simple insight generation and parametric models still have their place in today's world, and why the AI wand is incomplete in its magic until it really works the way the brain does: by combining prediction accuracy with explainability.
To understand the power of insight generation in explanatory models, we need to step back a bit into history. Data analysis, or 'quantitative research', evolved as a mechanism to grapple with large amounts of data. Most historical analysis in the quantitative field was built around measures of central tendency (mean, median and mode) along with higher-order moments such as variance and skewness. The search was on for a single number (or a set of numbers) that would describe the data and look at it in terms of how frequently and with what intensity something occurred. If the distribution of the data could be understood and replicated, then it was possible to compare distributions, understand the effect of variables on each other (correlations and covariances), and measure the impact and importance of these variables through linear regression, beta coefficients and so on. Anyone who has taken a basic statistics course in the last 20 years will relate to the importance accorded to the Central Limit Theorem.
The entire statistical/quantitative analysis machinery geared towards aggregated data measures led to the creation of linear models that were explanatory in nature ('y' is correlated with x1, x2, x3) and built on the principle of Best Linear Unbiased Estimators. Transformations or partitions of data sets were used primarily to make the data more acceptable to the limitations of our modelling algorithms: clustering to reduce heteroscedasticity, principal component analysis to reduce multicollinearity, and so on.
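To make the point concrete, here is a minimal sketch of that classic workflow; it is my own illustration rather than anything from the original analyses, and the data and variable names are made up. Two nearly identical predictors produce unstable regression coefficients, and projecting them onto principal components first removes the collinearity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Two highly correlated predictors (multicollinearity) plus noise.
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.05, size=500)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=500)

# Fitting OLS directly on X gives unstable, hard-to-read coefficients.
ols = LinearRegression().fit(X, y)
print("raw coefficients:", ols.coef_)

# Projecting onto a principal component first removes the collinearity.
pca_ols = make_pipeline(PCA(n_components=1), LinearRegression()).fit(X, y)
print("coefficient on first principal component:",
      pca_ols.named_steps["linearregression"].coef_)
```

The trade-off, of course, is that a coefficient on a principal component sits one step removed from the original variable it summarises, which is exactly the kind of compromise the old paradigm kept making to stay within its algorithmic limits.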
The primary analysis methods of this era were driven by the costs of data collection, storage and processing. Data had to be collected through surveys and other costly methods; research design became important, both to collect the most reliable and valid data and to assess the suitability of the same data set for generating multiple insights. Entire industries were built around codifying and analyzing the data; since data processing required hardware and software investment, linear methods (with some acceptable alternatives for non-linearity thrown in) became the paradigm.
Over the last decade, these limitations have ceased to hold true. Social media generates unprecedented quantities of data; the easy availability of smartphones and tablets has made survey data instantaneously machine readable; data storage costs cross barrier after barrier in their race towards zero; Hadoop and similar cluster mechanisms have reduced 'data processing' limitations to a largely theoretical discussion; and cloud computing has rendered 'on-prem', once a fancy term, an outdated fad. Real-time data analytics runs 24x7.
Following are four distinct ways in which this shift has changed the way analytics works: a natural progression towards larger data sets; population-wide analysis; models built on partitioned sets; and unit-level-analysis-based predictive modelling.
1. Progression towards larger data sets — Round-the-clock connectivity and scores of touchpoints for gathering information have made ever-increasing data sets available. Since larger data sets lend themselves to higher modelling accuracy, this natural progression is expected to continue.
2. Population-wide analysis — The accessibility of population-wide data has fueled the need to look at data analysis algorithms differently: when the entire population CAN be analyzed, there is no merit in carefully choosing a sample. This has changed the nature of the algorithms we use. The parametric or distribution-based paradigm used the impact and importance of variables (through their coefficient estimates, t-tests and user-defined confidence intervals) to figure out, say, how to identify the winning horse in a race, based, for example, on strength, endurance, speed or other factors. The current paradigm around machine learning and deep learning uses large amounts of data to figure out exactly which horse will win, rather than assessing the components that make up the prediction.
Understanding 'why' a certain event worked the way it did is only a stepping stone to figuring out 'which' event will succeed (or fail). Our earlier sample selection mechanisms were dictated by our limitations in collecting and processing large data sets. As the cost of collection and analysis comes down, decision makers naturally gravitate towards using all available data, in some cases the entire population itself.
3. Models built on partitioned sets — Processing power, now very accessible and affordable, allows us to juxtapose multiple algorithms that work on very specialized sub-sets of the data. The more we are able to ensemble layers of non-linear models that work accurately on smaller subsets of the data, the more we move from a single model that fits the data 'within a certain confidence interval' to finely tuned models for each partition. Where traditional regression models optimized errors at the overall model level, deep learning models reduce errors for each sub-space, sometimes even for each individual output node, and combine these paths to reduce error at the overall level. (A toy sketch of this contrast follows this list.)
4. Unit-level-analysis based predictive modelling — We now go from the generic one-size-fits-all model to specific sub-space models that are ensembled into a whole. This requires large data sets that are complete and tagged for supervised learning (because the models learn from the data itself and from the errors they make). The expertise of the model depends on the quantity of data: if every conceivable scenario that could occur has occurred, and pattern identification converges with greater experience, then there are few surprises and the models perform better than even their human counterparts.
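To illustrate points 3 and 4, here is a toy sketch, entirely my own and built on synthetic data, contrasting a single global linear model with specialized models fitted on partitions of the data and recombined into one predictor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(2000, 1))
# The true relationship differs by sub-space: one regime below zero, another above.
y = np.where(X[:, 0] < 0, 2 * X[:, 0], -3 * X[:, 0]) + rng.normal(scale=0.3, size=2000)

# One-size-fits-all model, errors optimized at the overall level.
global_model = LinearRegression().fit(X, y)
global_mse = mean_squared_error(y, global_model.predict(X))

# Partition the data and fit a specialized model for each sub-space.
masks = [X[:, 0] < 0, X[:, 0] >= 0]
local_models = [LinearRegression().fit(X[m], y[m]) for m in masks]

# Ensemble the local models back into a single predictor.
pred = np.empty_like(y)
for m, model in zip(masks, local_models):
    pred[m] = model.predict(X[m])
local_mse = mean_squared_error(y, pred)

print(f"global model MSE: {global_mse:.3f}, partitioned models MSE: {local_mse:.3f}")
```

On data whose behaviour genuinely differs by sub-space, the partitioned ensemble reduces the overall error in exactly the way described above; deep networks push the same idea much further by learning the partitions themselves rather than having them handed over.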
So, in a nutshell, predictive algorithms that use machine learning and deep neural nets have taken over the world.
However, humans do not think only in terms of prediction. The ability to rationalize a decision and explain it 'in simple terms a child can understand' is the hallmark of a great human being. Yet artificial intelligence and deep learning, in spite of their ability to crunch data and predict an outcome, are still not easy to understand in terms of 'what caused what'. If one were to argue that statistics evolved as a response to the human mind's need to know 'what caused what', then AI and deep learning engines have only moved us further from that goal.
While businesses may happily hand predictive algorithms tasks such as (programmatically led) voice-command analysis or next-best-action generation, some decisions demand a solid rationale: which candidate to hire, which market to penetrate, how much to reduce a price. These still need a rationale for action that AI has yet to provide. An AI engine that cannot 'talk you through its thought process' is limited to compiling data and explaining features rather than taking a seat at the boardroom table. In these cases, obtaining the right output is only the first step: being able to explain the decision is the key to trust.
I'd like to leave you with one story from our experience of working with a leading service provider that engages independent contractors in a service-aggregation model. We were tasked with identifying which of their contractors would do exceptionally well and which would fall short of their objectives. The machine learning models we built did what they were supposed to do: identify top performers, middle performers and laggards. But the client found something disturbing: there was no feedback they could take to their contractors or to the training department in terms of actionable points for intervention. We eventually matched the output with a software-generated decision tree that produced human-readable rules, and with traditional parametric models that identified the impact and importance of the various demographic and psychographic attributes of the contractors. In this situation, our client was ready to trade off accuracy for interpretability, and chose to go with models that could explain their intuition.
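For readers who want a feel for what that pairing looks like in practice, the sketch below is a hedged reconstruction on synthetic data, not the client's actual models or attributes: an accurate but opaque classifier for scoring, a shallow decision tree trained to mimic it and exported as human-readable rules, and a logistic regression whose coefficients indicate the direction and relative importance of each (hypothetical) attribute.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical contractor attributes, purely for illustration.
feature_names = ["tenure_months", "training_hours", "region_score", "engagement_index"]
X, y = make_classification(n_samples=3000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)

# The "accurate" model: good at separating top performers from laggards, hard to explain.
black_box = GradientBoostingClassifier(random_state=42).fit(X, y)

# Interpretable companion 1: a shallow tree trained to mimic the black box,
# exported as plain-language rules a training department could act on.
surrogate_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate_tree.fit(X, black_box.predict(X))
print(export_text(surrogate_tree, feature_names=feature_names))

# Interpretable companion 2: a parametric model whose coefficients indicate
# the impact and importance of each attribute.
logit = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(feature_names, logit.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```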
This anecdote shows the value of interpretability, which is where the big guns of AI/ML are blazing away now. Explainable AI (XAI) is the new frontier, and soon we should have AI programs that produce actionable models that are highly accurate in their predictions along with the ability to 'talk us through' the solution. Until then, trust will remain a hard-won commodity in AI.
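As a small taste of what that tooling already looks like, here is a sketch, again on made-up data, of permutation importance: a model-agnostic check that asks an already-trained model which inputs its predictions actually depend on, by shuffling each feature and measuring how much held-out accuracy drops.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out accuracy:
# a first, rough way to let the model 'talk us through' what it relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: mean importance {score:.3f}")
```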
This post first appeared on Medium at https://medium.com/@publishing_13320/combining-prediction-accuracy-and-explainability-c09d6018fc09