The human limit of machine learning
The other day a discussion at work resurfaced the old topic of the merits and problems of advanced recommendation algorithms. The problem has many subtle angles, but in its simplest form it boils down to this: our best relevance algorithms are completely opaque to us about the "why".
The issue is not new, and has been a topic of conversation among us for a while. But this time I saw a new facet to it: there's a wall to how far we can take our relevance algorithms before we start seeing negative returns.
It has nothing to do with limitations of the methods, or with how smart we can be in analyzing the data. It is not about the limited amount of training data we can gather, nor is it about having the right information to put forth.
It is about the human being consuming the data.
Machine-learned recommendation algorithms are fantastic: they can take into account tens of millions of documents, and consider our identity, our history, our tastes. They can layer on top the "wisdom of the crowd": what our friends, acquaintances, and professionals in the same industry are reading. They can check the weather to see if we're more likely to be interested in alternative commute routes, guess our portfolio to gauge interest in business news, and so on.
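To make that opacity concrete, here is a minimal sketch of how such a blend of signals might look. This is not any real system; every signal name, value, and weight below is invented for illustration:

```python
# A minimal sketch, not a real recommender: blending heterogeneous
# signals (content, social, context) into one relevance score.
# All signal names, values, and weights are hypothetical.

def relevance_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Linear blend of per-document signals into a single opaque number."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

# One candidate article, scored for one user in one context.
article_signals = {
    "content_match": 0.82,   # similarity to the user's interest profile
    "social_proof":  0.40,   # fraction of friends/colleagues who read it
    "recency":       0.95,   # decays with hours since publication
    "context_boost": 1.00,   # e.g. weather suggests commute-news interest
}
weights = {"content_match": 0.5, "social_proof": 0.2,
           "recency": 0.2, "context_boost": 0.1}

print(relevance_score(article_signals, weights))  # one number, no "why"
```

The output is a single float; every contributing reason has been flattened away, which is exactly the opacity the rest of this post is about.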
They are just not good at doing it in a human way. Their motives and logic are generally too alien for us to grasp intuitively, and their inner workings are difficult to translate succinctly into a phrase or two. And exactly there lies the problem. The ultimate goal is not to give you the information.
The goal is for you to be able to act on this information.
And for this you have to understand its significance, in the appropriate context. You need to be able to understand, catalog, and archive that piece of data, and to thread it with other facts/concepts/opinions before it becomes actionable and useful for you.
A little context about the work we were doing is necessary: our goal was to deliver, through smart algorithms, the news, events or other simple data nuggets the user needed to know to be more productive. And in this context, the "Why" is at least as important as the "What".
Just to work out an example: let's assume an all-knowing genie has hand-picked an article for me. The article is (superficially) about a small startup in SF boosting their employee perks. I have never heard of this company, nor do I have any acquaintances who work there, and on top of that, I'm not looking for a job. More likely than not, I would briefly skim the first paragraph, or skip the piece altogether.
What I don't know is that deeper into the article, there's an insight into the circumstances leading to the benefit changes: other startups have been losing employees to the growing cost of living in SF, with people switching to better-paying options or moving elsewhere. This company was just trying to get ahead of the problem.
And there was my piece of useful info: growing living expenses in SF, and startups slowly catching up. A multitude of actionable items: check my own expenses, surf my social network for hiring opportunities, etc.
But I didn't get it: the genie was smarter than I was...
The problem with trusting AI, in my opinion, is that AI is too married to optimization in the sense of finding the best parameter fit for one quantitative representation of the problem, whereas humans rely on mixed feelings to look simultaneously at different aspects of the same problem. That is, as a human I want to look at a comprehensive yet minimal set of efficient trade-offs before I make any important decision. An illustration of the concept of Pareto ranking for medical insurance plans sold on CoveredCA can be found here: http://www.theinsuranator.com. The ranking is done in real time and is based on common scenarios. It is a simple demonstration tool that costs nothing to use and collects no data, so I think it is fair to share on LinkedIn.
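For readers unfamiliar with the term, here is a minimal sketch of the Pareto idea this comment refers to. The plans and numbers are invented (not taken from theinsuranator.com): instead of collapsing everything into one blended score, you keep every option that no other option beats on all axes at once:

```python
# A minimal sketch of Pareto ranking over two objectives, both
# "lower is better": monthly cost and out-of-pocket risk.
# All plan names and numbers are made up for illustration.

def pareto_front(options):
    """Return the options no other option dominates on both objectives."""
    front = []
    for name, cost, risk in options:
        dominated = any(c <= cost and r <= risk and (c, r) != (cost, risk)
                        for _, c, r in options)
        if not dominated:
            front.append((name, cost, risk))
    return front

plans = [("Bronze", 250, 0.9), ("Silver", 350, 0.6),
         ("Gold", 500, 0.3), ("OverpricedGold", 600, 0.3)]
print(pareto_front(plans))  # Bronze, Silver, Gold survive; the last is dominated
```

Everything that survives is an efficient trade-off; choosing among them is deliberately left to the human, which is the commenter's point.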
I believe this blog is about the Principle of Sufficient Reason formulated by the mathematician Leibniz: http://hans.wyrdweb.eu/the-limits-reason-about-chaitin-and-leibniz/
As I see it, one of the really fundamental problems with machine algos is that they all make calculations based on final results. The problem with every final result, for example "I chose to watch this movie", is that the reason for my choice is not part of the result (the movie I chose). So a vital part of my decision-making process is left out, making the algorithm and the machine's results non-human. I believe that until the above problem is solved, we are stuck with machine-made choices made from basically incomplete information.
You have probably heard about the diapers-and-beer example, right? It has been described ad nauseam. Well, let's have a closer look. The analytics identified a correlation between sales of diapers and sales of beer, which seems odd at first glance. But what the story doesn't say is that, I presume, it was human analysis that determined the "why": mothers sent their husbands to buy diapers in the evenings, and the husbands would pick up beer at the same time while they were at the store. This supports the notion that computers don't have an effective ability to figure out the "why" by themselves; they need humans to investigate that aspect. In other words, computers can tell human operators, "hey! There's something to investigate here." Figuring out the "why" requires deep domain knowledge and common sense; didn't the first wave of AI research make attempts at this with little success?
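It's easy to see what the analytics in that story can and cannot deliver. Here is a minimal sketch (with invented baskets) of the kind of co-occurrence statistic, lift, that would surface the diapers-and-beer pattern. Note that nothing in the output says anything about husbands, evenings, or errands:

```python
# A minimal sketch of co-occurrence analysis on market baskets.
# The baskets are invented. The output is a correlation, not a reason.

baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def lift(a: str, b: str) -> float:
    """lift > 1 means a and b co-occur more than independence predicts."""
    n = len(baskets)
    p_a = sum(a in t for t in baskets) / n
    p_b = sum(b in t for t in baskets) / n
    p_ab = sum(a in t and b in t for t in baskets) / n
    return p_ab / (p_a * p_b)

print(lift("diapers", "beer"))  # ~1.11 here: "something to investigate"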
Very interesting article... It feels like you really turned the problem on its head here. I feel that the takeaway goes back to the old adage: don't judge a book (or algorithm) by its cover.