Pushing Forward Artificial Intelligence to Learn How to Make Better Decisions in the Presence of Uncertainty
Prof. Dr. Michael Feindt, Founder and Chief Scientific Advisor, Blue Yonder GmbH
Blue Yonder’s Philosophy and Algorithms
Blue Yonder is known for its transformative, data- and machine-learning-driven services that deliver the best decisions for regularly recurring tasks in retail and elsewhere, such as product replenishment or pricing.
Our philosophy consists of the following pillars:
- Scientific rigor and data-driven methods are the cornerstone of all of Blue Yonder’s existing and new products.
- Predictive analytics: Sophisticated machine learning and artificial intelligence methods are used to predict complete probability distributions for future events. The prediction is not just a single number; it also includes, for example, measures of the anticipated volatility. This can be used to evaluate the risks associated with any decision under different scenarios.
- Prescriptive analytics: Optimized decisions are derived from the predicted probability distributions, optionally taking further operational constraints into account.
- Automation
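To make the split between predictive and prescriptive analytics concrete, here is a minimal sketch of the classic single-period newsvendor rule. It assumes that the predicted demand distribution is Poisson with a forecast mean (a stand-in for the full distributions our models produce), that an unsold unit is worthless, and that the prices and costs are hypothetical. The decision is read directly off the distribution: order the smallest quantity whose cumulative probability reaches the critical ratio of underage cost to total cost. The same distribution also yields the stockout risk of that decision.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(demand <= k) for a Poisson(mean) forecast (assumed distribution)."""
    term = math.exp(-mean)  # P(demand = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(demand = i) from P(demand = i - 1)
        total += term
    return total


def newsvendor_order(mean: float, price: float, unit_cost: float) -> int:
    """Smallest q with P(demand <= q) >= (price - unit_cost) / price.

    Underage cost = lost margin (price - unit_cost); overage cost = unit_cost,
    i.e. leftover units are assumed to have no salvage value.
    """
    critical_ratio = (price - unit_cost) / price
    q = 0
    while poisson_cdf(q, mean) < critical_ratio:
        q += 1
    return q


def stockout_risk(q: int, mean: float) -> float:
    """Probability that demand exceeds the order quantity q."""
    return 1.0 - poisson_cdf(q, mean)


# Hypothetical article with a forecast mean of 20 units per day.
q_low_margin = newsvendor_order(20.0, price=2.0, unit_cost=1.2)
q_high_margin = newsvendor_order(20.0, price=2.0, unit_cost=0.4)
print(q_low_margin, stockout_risk(q_low_margin, 20.0))
print(q_high_margin, stockout_risk(q_high_margin, 20.0))
```

With the same forecast, the high-margin case orders more and accepts a lower stockout risk, a trade-off that a single point forecast cannot express.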
We have developed a large number of algorithms (including NeuroBayes and Cyclic Boosting) and software to deliver value for our customers quickly, robustly, reliably and with a high degree of automation. Billions of predictions and optimal decisions are delivered every day. While we roll out our current supply chain replenishment and pricing products with great success and benefit for our customers, our data scientists are already working on the next generation of Artificial Intelligence algorithms.
The Future Direction of Our Artificial Intelligence
Machine learning and artificial intelligence are currently hot topics, and significant research effort is directed toward them. Our future work focuses on the following:
- How to find and optimize decision policies where computing exact solutions with classical methods is prohibitively costly or outright impossible.
- Speeding up complex and nested decisions so that they can be performed in real-time.
- Finding and isolating causal relations instead of mere statistical correlations, which is essential if one wants to change action policies for the better.
- Optimizing self-learning (reinforcement learning) algorithms for extremely stochastic environments.
In pushing forward on these topics we intend to keep and extend the competitive advantage of Blue Yonder decisions. We want to be able to deliver even better decisions, especially for the very complicated area of Algorithmic Merchandising, which, according to Gartner, represents a new era of retail.
The retail sector brings unique challenges that must be addressed to develop the next generation of retail applications: actions by the purchasing, supply chain/replenishment, pricing and marketing departments are often disjointed in today’s organizations. Our preliminary investigations show that unifying areas such as pricing and replenishment decisions has significant potential both to increase a company’s profit and to reduce the risk (or uncertainty) in the season’s result for individual articles. However, such approaches require major organizational changes as well as the development of novel machine learning techniques. Combining our strengths, namely predictions of complete probability distributions, with reinforcement learning and deep learning has shown very promising results in our early investigations.
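As a toy illustration of why unifying pricing and replenishment can pay off, the sketch below searches prices and order quantities jointly instead of fixing the price first. It rests on loud assumptions: Poisson demand, a hypothetical exponential price-response curve (`demand_mean`, with made-up parameters), a single selling period and no salvage value. For each candidate price, the best order follows from the newsvendor critical ratio, and the expected profit uses the identity E[min(D, q)] = Σ_{k<q} P(D > k).

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(demand <= k) for Poisson-distributed demand (assumed distribution)."""
    term = math.exp(-mean)  # P(demand = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(demand = i) from P(demand = i - 1)
        total += term
    return total


def demand_mean(price: float) -> float:
    """Hypothetical price-response curve: mean demand falls as price rises."""
    base_mean, ref_price, elasticity = 50.0, 2.0, 2.0  # made-up parameters
    return base_mean * math.exp(-elasticity * (price / ref_price - 1.0))


def best_order(price: float, unit_cost: float, mean: float) -> int:
    """Newsvendor order for a fixed price: smallest q with P(D <= q) >= critical ratio."""
    ratio = (price - unit_cost) / price
    q = 0
    while poisson_cdf(q, mean) < ratio:
        q += 1
    return q


def expected_profit(price: float, q: int, unit_cost: float, mean: float) -> float:
    """price * E[min(D, q)] - unit_cost * q, using E[min(D, q)] = sum_{k<q} P(D > k)."""
    expected_sales = sum(1.0 - poisson_cdf(k, mean) for k in range(q))
    return price * expected_sales - unit_cost * q


def joint_decision(prices, unit_cost):
    """Evaluate every candidate price with its best order; keep the most profitable pair."""
    best = None
    for price in prices:
        mean = demand_mean(price)
        q = best_order(price, unit_cost, mean)
        profit = expected_profit(price, q, unit_cost, mean)
        if best is None or profit > best[2]:
            best = (price, q, profit)
    return best


prices = [1.5 + 0.1 * i for i in range(16)]  # candidate prices from 1.50 to 3.00
price, qty, profit = joint_decision(prices, unit_cost=1.0)
print(f"price={price:.2f} order={qty} expected profit={profit:.2f}")
```

Because price shifts the whole demand distribution, the best order changes with it; optimizing the two separately can miss the most profitable combination.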
Artificial Intelligence’s Super-Human Performance in Go and Video Games
Google’s AlphaGo recently made headlines by beating the world champion in the ancient game of Go. Unlike chess, which computers can master largely through brute-force search, Go is significantly more complex and poses a much greater challenge for players and programmers alike. AlphaGo combines deep neural networks with more conventional techniques such as tree search. It was initially bootstrapped on human games and then trained by playing against itself before taking on human champions: first the European Go champion, who after his defeat joined Google’s AlphaGo team to help bring the system to the next level, and finally the leading world champion. Because of Go’s high complexity, developing some sort of “gut feeling” was the only way for the machine to win, and it worked.
Video games are another area where computer algorithms have bested human players. Anyone who plays computer games is familiar with non-player characters, which interact with players and their environment and have become increasingly realistic in recent years. However, there is a different area where computers beat us: taking the human’s seat at the controls and playing on our behalf, as in high-speed racing arcade games. Reinforcement learning techniques teach “agents” (i.e., computer algorithms) how to control the game optimally just by playing it and learning from experience, not unlike what we do when we press all the buttons to see what happens. They just happen to be significantly better at it than we are.
The Way of AI into Real Business: Operations Research by Artificial Intelligence (OR-by-AI)
Reading about these successes raises the question of whether these approaches can be used to optimize business decisions. This is exactly what we are aiming to do: solving general problems in Operations Research by artificial intelligence, or OR-by-AI, based on NeuroBayes, deep neural networks and reinforcement learning. This combination has the potential to solve challenges that remain too complex to be solved analytically and are currently handled by heuristics.
First results are very promising. Let’s look at some examples:
Example 1: Learning “Gut Feeling” for the Results of Complicated Mathematical Calculations
Replenishment of perishable goods is one of the main challenges for modern retailers. The case of articles with a shelf life of two periods, also known as the 2-period newsvendor problem, is particularly interesting: it has many of the characteristics found in most practical applications, while its complexity is still low enough to solve it with conventional methods such as dynamic programming.
One of the key challenges is that, because the shelf life of the product is longer than one day, each day is influenced by the previous days through the overlap in inventory. This means that the best decision today depends on what you will do later as well as on the stochastic customer behavior, which is taken from predictions in the form of probability distributions.
As part of our research, we trained a deep neural network on a large number of solutions computed with dynamic programming, covering the complete relevant phase space of possible actions and outcomes. The resulting deep network was able to learn the best solutions from this input data. The trained network can then be used to predict the optimal decision for a single case: compared to solving the same task with dynamic programming, the same results are obtained 36,000 times faster.
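The idea of distilling an expensive solver into a fast model can be sketched in a few dozen lines. The toy below solves a perishable two-period-shelf-life problem by value iteration and then fits a fast surrogate to the DP-optimal orders. Everything specific here is an assumption: Poisson demand, oldest-first (FIFO) selling, and hypothetical price, cost, discount factor and truncation limits. An ordinary least-squares model on hand-picked features stands in for the deep neural network described above; `solve_dp` and `fast_order` are illustrative names, and the toy makes no attempt to reproduce the 36,000x speed-up.

```python
import math

import numpy as np

Q_MAX = 20   # largest order / stock level considered (assumption)
D_MAX = 40   # demand support truncated here; the tail mass is lumped into D_MAX
PRICE, COST, GAMMA = 1.0, 0.4, 0.9  # hypothetical price, unit cost, discount factor


def poisson_probs(mean: float) -> np.ndarray:
    """Poisson pmf on 0..D_MAX, with P(D >= D_MAX) lumped into the last entry."""
    probs = [math.exp(-mean) * mean**d / math.factorial(d) for d in range(D_MAX)]
    return np.array(probs + [max(0.0, 1.0 - sum(probs))])


def solve_dp(mean: float, iterations: int = 100) -> np.ndarray:
    """Value iteration for an item with a two-period shelf life (the slow route).

    State x: one-day-old units on hand, which expire tonight if unsold.
    Action q: fresh units ordered and received this morning.
    Demand is served oldest-first (FIFO); unsold fresh units become tomorrow's x.
    Returns the optimal order quantity for every state x = 0..Q_MAX.
    """
    probs = poisson_probs(mean)
    n = Q_MAX + 1
    reward = np.zeros((n, n))    # expected one-day profit r[x, q]
    trans = np.zeros((n, n, n))  # P(next state y | x, q)
    for x in range(n):
        for q in range(n):
            for d, p in enumerate(probs):
                sold_old = min(x, d)
                sold_new = min(q, d - sold_old)
                reward[x, q] += p * PRICE * (sold_old + sold_new)
                trans[x, q, q - sold_new] += p
            reward[x, q] -= COST * q
    value = np.zeros(n)
    for _ in range(iterations):
        q_values = reward + GAMMA * (trans @ value)  # shape (x, q)
        value = q_values.max(axis=1)
    return q_values.argmax(axis=1)


# Training set: (old stock, forecast mean) -> DP-optimal order.
features, targets = [], []
for m in [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]:
    for x, q in enumerate(solve_dp(m)):
        features.append([1.0, x, m, math.sqrt(m)])
        targets.append(float(q))
coef, *_ = np.linalg.lstsq(np.array(features), np.array(targets), rcond=None)


def fast_order(old_stock: int, mean: float) -> int:
    """Surrogate 'gut feeling': one dot product instead of a full DP solve."""
    raw = float(coef @ np.array([1.0, old_stock, mean, math.sqrt(mean)]))
    return int(min(Q_MAX, max(0, round(raw))))


print(solve_dp(5.0)[:5], fast_order(2, 5.0))
```

A linear model is far too simple to match the DP everywhere; the point is only the workflow of paying the DP cost once, offline, and answering each new case with a cheap model afterwards.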
This novel way of teaching artificial intelligence a sense of “gut feeling” has an interesting analogy in the human brain. As humans, our decisions are made by two competing systems (see Nobel laureate Daniel Kahneman’s book “Thinking, Fast and Slow”):
- The fast, emotional (“gut feeling”) system, which works autonomously, mechanically and unconsciously and is used for almost all of our decisions
- The slow, rational system, which requires deep thinking, costs more energy and is rarely used in everyday life, often only as an a posteriori justification system
In this application, Artificial Intelligence mimics the human brain in order to be much faster and to save resources (CPU, time, energy) by shifting decisions from the slow, rational system to the fast system. However, unlike in the human brain, where the fast system was shaped by evolution to ensure survival in a hazardous environment and has no intuitive grasp of statistics and uncertainty, our Artificial Intelligence system can learn a “gut feeling” even for complicated statistical problems.
Example 2: Learning to Find Solutions for Still Unsolved Mathematical Calculations
Further extensions of the newsvendor problem are much more challenging. The n-period newsvendor problem with n = 3, 4, 5, ... is a good example of a problem that cannot be solved with classical methods in reasonable time, as it becomes exponentially more complex with increasing n. However, there is a solution for the limiting case n → ∞, the so-called (s, S) or min/max strategy.
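For reference, the (s, S) or min/max rule itself is simple to state: whenever the stock on hand falls to the reorder point s or below, order enough to bring it back up to S; otherwise order nothing. The sketch below simulates it on Poisson demand, assuming a non-perishable item (the n → ∞ limit), zero lead time, lost sales and hypothetical parameter values.

```python
import math
import random


def poisson_sample(rng: random.Random, mean: float) -> int:
    """Draw Poisson demand by multiplying uniforms until the product drops below exp(-mean)."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def min_max_order(stock: int, s: int, S: int) -> int:
    """(s, S) rule: if stock is at or below s, order up to S; otherwise order nothing."""
    return S - stock if stock <= s else 0


def simulate(s: int, S: int, days: int = 365, mean_demand: float = 10.0, seed: int = 0):
    """Run the min/max rule; returns (final stock, lost sales, orders placed)."""
    rng = random.Random(seed)
    stock, lost, orders = S, 0, 0
    for _ in range(days):
        q = min_max_order(stock, s, S)
        if q > 0:
            orders += 1
        stock += q  # assumption: zero lead time, delivery arrives before demand
        demand = poisson_sample(rng, mean_demand)
        sold = min(stock, demand)
        lost += demand - sold  # assumption: unmet demand is lost, not backordered
        stock -= sold
    return stock, lost, orders


print(simulate(s=5, S=25))
```

Only two numbers describe the whole policy, which is why it is attractive in practice; the hard part for finite shelf lives is that no such compact rule is known to be optimal.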
Our current research focuses on designing artificial intelligence agents that solve this class of problems through reinforcement learning, similar in spirit to Google’s AlphaGo. The problem is far too complicated to try all possibilities, because of the huge number of combinations arising from the highly uncertain customer behavior tomorrow, the day after and so on, and from the decisions made today, tomorrow, the day after and so on.
Our Artificial Intelligence agent therefore has to learn an ever better action policy that depends on a high-dimensional input space: a “gut feeling” parameterized by a neural network that is optimized for such stochastic systems. It uses reinforcement learning to improve continually and, we expect, to eventually outperform today’s best-known heuristics.
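The learning loop can be illustrated with tabular Q-learning on the two-period perishable problem from Example 1. This is a deliberately small stand-in: the agent described here parameterizes its policy with a neural network, whereas this sketch uses a lookup table over (old stock, order) pairs. The Poisson demand, price, cost, learning rate, discount and exploration settings are all assumptions. The agent only ever sees simulated days, yet it improves its estimate of the long-run value of each order from that experience.

```python
import math
import random

Q_MAX = 15  # largest order quantity the agent may choose (assumption)


def poisson_sample(rng: random.Random, mean: float) -> int:
    """Draw Poisson demand by multiplying uniforms until the product drops below exp(-mean)."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def step(old_stock: int, order: int, demand: int, price: float = 1.0, cost: float = 0.4):
    """One simulated day with a two-period shelf life, selling oldest first.

    Returns (profit, next_old_stock); unsold old units expire at the end of the day.
    """
    sold_old = min(old_stock, demand)
    sold_new = min(order, demand - sold_old)
    profit = price * (sold_old + sold_new) - cost * order
    return profit, order - sold_new


def train_agent(mean=5.0, steps=20000, alpha=0.05, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on one long simulated run."""
    rng = random.Random(seed)
    table = [[0.0] * (Q_MAX + 1) for _ in range(Q_MAX + 1)]  # table[state][action]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(Q_MAX + 1)  # explore: try a random order
        else:
            action = max(range(Q_MAX + 1), key=lambda a: table[state][a])  # exploit
        reward, next_state = step(state, action, poisson_sample(rng, mean))
        target = reward + gamma * max(table[next_state])
        table[state][action] += alpha * (target - table[state][action])
        state = next_state
    return table


def greedy_policy(table):
    """Best learned order for each amount of old stock."""
    return [max(range(Q_MAX + 1), key=lambda a: row[a]) for row in table]


print(greedy_policy(train_agent()))
```

A table works only because this toy has a handful of states; with the high-dimensional inputs of real replenishment, a function approximator such as a neural network has to take its place.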
Why Do We Explore These Areas of Research?
At Blue Yonder, we are convinced that this is just the beginning of an extremely interesting and exciting era, and we are at its spearhead: not just on the academic, theoretical side, but also in pushing such algorithms into practice, enabling real customers to gain an advantage on real-world problems.
The combination of probabilistic forecasting, deep neural networks and advanced reinforcement learning methods will play an important role in optimizing complicated decision chains in highly stochastic environments, in Operations Research and in retail, be it for fresh replenishment, pricing, promotions, marketing, “Algorithmic Merchandising” or elsewhere.
Given the extremely rapid progress in Artificial Intelligence, it will probably not be long before such decisions are super-human in quality and speed and completely dominate the scene.
Just to remind ourselves: Today’s best Artificial Intelligence algorithms in Arcade video games are about 6.5 times better than humans.