The 'explainability' problem in data science and how it affects communication and product innovation

Being in the business of making machines behave more cleverly toward human needs, I have had the opportunity over almost 20 years to observe how business and software engineering communicate, and I can see that this communication is about to change radically.

Yes, there has always been this “techie” language disconnect between the two groups that required someone to translate. But the recent changes seem almost impossible to bridge. With almost no warning, there is now a total disconnect in how the two sides think about a business problem and how it can be solved with data, and this disconnect can no longer be resolved by someone translating between the two groups.

You may now think that any technical solution can be translated back to the business, and of course it needs to be, because in the end everyone in the business needs a reason to believe that some solution will work, based on a very high-level understanding. But this is about to change radically; let me explain why such translation is no longer possible this time. And maybe, if we deal with it the right way, it may even result in something fascinating and innovative.

What is the emerging communication problem?

The problem with this disconnect in language and technological foundations is very concrete: it makes it very difficult to explain business requirements, and especially how they are satisfied, and it also limits our ability to innovate and create use cases that previously did not exist. In essence, software engineers, and especially data scientists, can no longer explain to the business how they solved a particular problem, and the business, of course, no longer understands why and how a solution works.

If left unsolved, this might result in us building the same intelligent applications we built 20 years ago and failing to capture the benefits of the AI innovation wave, because many product managers no longer have an abstract tool-set for “how use cases are solved.”

They just don’t know anymore what’s technically possible.

What is the underlying problem for this clash?

In recent years, with latent representations and especially deep learning algorithms, we have undeniably experienced a strong move away from deterministic rules, which inherently require explicit representations. Radically simplified, with latent representations we have moved away from the view that “a thing is a real-world entity, probably related to one or more other things,” where a connection between two “things” means something explicit that we can act on with comparably strict and explicit rules.

On the latent-semantics and distributed-representations side, however, a “thing” is a probability distribution over meanings, which may or may not be biased toward the semantics the user associates with it in their current, subjective, local context.
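To make the contrast concrete, here is a minimal sketch of the two views. Everything in it (the entities, the relation table, the 300-dimensional random vectors standing in for learned embeddings) is an illustrative assumption, not a description of any particular system:

```python
import numpy as np

# Explicit world: a "thing" is a named entity, and a connection between two
# things carries one explicit meaning we can act on with a strict rule.
relations = {("customer", "placed"): "order"}

def explicit_meaning(subject, verb):
    # Either the relation exists and means exactly one thing, or it does not.
    return relations.get((subject, verb))

# Latent world: a "thing" is a point in a high-dimensional space, and meaning
# is graded similarity between points. (Random vectors stand in for learned
# embeddings here; real ones would come out of a trained model.)
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=300) for w in ("customer", "client", "order")}

def similarity(a, b):
    va, vb = embedding[a], embedding[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(explicit_meaning("customer", "placed"))  # 'order' -- exact and explainable
print(similarity("customer", "client"))        # a graded score; no single dimension is interpretable
```

The first function can be read out loud to a stakeholder; the second cannot, and that asymmetry is the heart of the problem.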

Even worse, since these semantics try to mimic real, human semantic representations, we are effectively taking a look under the covers at how meaning is represented in our brains, and that is something we don’t really understand even today, let alone something we can model in the “electronic micro-brains” (neural nets) we are building.

We cannot easily make the meaning explicit and biased to a certain context again without losing the benefits of its distributed nature, such as robustness against the ambiguity of human language. In our brains, the machinery that interprets a representation of meaning is wired to our own brain-internal classifiers and then to the functions that form words to convey that meaning, and even this process often requires a lengthy conversation between two human dialogue partners. That is not what we expect from a computer. We want a clear answer that matches our own subjective classification of meaning, formed over our entire life and based on all our experiences. Explicit and simple “meaning” is what we expect.

So, there you have the problem: we want to be able to explicitly understand the meaning of data, discuss it, and explain the schema to the business and the product managers, but we don’t want to lose the robust, highly distributed meaning representations that make the data so rich and useful.

This leads to the well-known explainability problem in machine learning: in data science, we cannot tell in detail how and why the machine arrives at a certain outcome, and the business can only say whether they like a result or not.

What does that mean for communication between data scientists and stakeholders?

Now, if you are, like me, in a position where you must explain to stakeholders how the technology works (and hence how data is represented), you will struggle, because it is not possible to have an easy conversation about a 400-dimensional vector representation of a business problem.

Even worse, the dimensions of this vector may not all come from a single kind of source, such as words and sentences, but also from categorical metadata and, for instance, from a vectorized graph-database structure. All of these elements can easily be tied together in one vector, and it can be a fantastic model for your prediction problem; you just cannot explain to your stakeholders WHY it works so well. They cannot infer from your explanation that tuning the raw data to include one or another piece of additional metadata might improve the prediction and the subsequent reasoning. So they basically have to live with the fact that the model, as it is, works well (or doesn’t), and when they want to give the prediction a slightly different twist, it may mean completely rebuilding the entire method. Put differently, a vector dimension is not the analogue of a rule in the deterministic, rule-based world.
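As a sketch of how such a combined vector comes about (all names, dimensionalities, and data sources below are hypothetical; the 300 + 4 + 96 split is chosen only so the result matches the 400 dimensions mentioned above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative stand-ins for learned or encoded components:
text_embedding  = rng.normal(size=300)            # e.g. an averaged word-vector for a ticket description
category_onehot = np.array([0.0, 1.0, 0.0, 0.0])  # e.g. one-hot encoded product-line metadata
graph_embedding = rng.normal(size=96)             # e.g. a node2vec-style vector for a customer's place in a graph

# Tying everything together is a single concatenation ...
features = np.concatenate([text_embedding, category_onehot, graph_embedding])
print(features.shape)  # (400,)

# ... but no individual dimension of `features` maps back to a business rule,
# which is exactly why the resulting model is hard to explain to stakeholders.
```

The concatenation itself is trivial; the point is that once a model is trained on `features`, a question like “what happens if we add one more metadata field?” can only be answered by retraining and re-measuring, not by reading the model.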

Is there a solution?

I would argue that it makes sense to throw overboard the old habit of wanting to understand explicitly why and how one data model works better than another. The ultimate reason for having a model is for it to make great predictions, not to hold discussions about the exact model setup and search for explanations of why it works or doesn’t work. If we want to enhance it or make it work, we most likely have to start (almost) from scratch anyway in order to run reliable quality tests.

We don’t want to go back to the super complex, subject-matter-expert-tuned systems we had “back in the day” for things like translation, even though those were perhaps easier to explain. If you added one rule, the rest of the system and its predictive power was largely untouched. On the downside, that benefit was outweighed by extremely long development times (thousands of rules and methods had to be implemented), which far exceeded the cost benefit of any single rule being cheap to implement, not to mention that distributed representations now deliver far better quality.

We should let it happen, work with these new representations, be fascinated by the outcomes when they work, and think about new ways of doing things, instead of ending up with strange hybrid models that try to be understandable and distributed (and thus not understandable) at the same time.

This is just holding us back from producing tangible value.

The limits of methodological reductionism are being shattered by the emergent properties of intelligence. This is our new paradigm. Some call it a black box; we call it intuitive fidelity!
