5 Secrets of Machine Learning in Application to Chatbots
This article focuses on deep learning (neural network) based machine learning (ML) algorithms as applied to chatbots. There are 5 important secrets that some developers may not be aware of.
1. Evolutionary Learning versus Machine Learning
Developers are always attracted to simplicity. They want an ML algorithm that can learn "anything" by loading data and pushing a button. This is a fantasy: it assumes that an ocean of unorganized starting neurons can learn anything. In reality, human dialogue skill has an evolutionary component, a pre-defined starting structure that formed through an "infinite" number of trials over "infinite" time (for all practical purposes). When we start learning to speak, we build on top of that initial biological neuron structure. This structure is mostly unknown to date, but its existence is evident from the segmentation of the brain into activity regions. The question, then, is: "What initial structure should we assume to build upon using ML?" If you are not asking this question, your chatbot application will be very shallow. For example, the early version of IBM Watson that played the Jeopardy game had a simple "fill in the blank" structure, so ML could handle the job and reach a high level. For reasonable success in chatbot development, your ML approach must be based on some initial cognitive structure. If you use APIs for ML, you should investigate what initial structure they assume.
2. Data will Always be Limited
The combination (or permutation) space of natural languages is so vast that no data set can cover all possibilities. This fact is often overlooked by users of ML, especially those with an engineering background. In engineering applications, you can actually obtain data sets that capture most relationships (even nonlinear ones). Some aeroplane auto-landing systems, for example, are built using ML because of the availability of data (through test flights and simulations). But this is not the case for natural language processing (NLP). You can train your system on the complete works of Shakespeare, yet it may still not understand "Do apples grow on trees?" A purely data-oriented application of ML to NLP will only work for very narrow vertical subjects.
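To put a rough number on "vast", here is a back-of-the-envelope sketch in Python. The vocabulary size, sentence length, and corpus size are all assumptions chosen for illustration, not measurements. It counts raw word sequences, most of which are not valid sentences, so it overstates the true space, but the gap to any realistic corpus remains enormous.

```python
# All three constants are illustrative assumptions, not measured values.
VOCAB_SIZE = 10_000         # assumed number of distinct words
SENTENCE_LENGTH = 10        # assumed number of words per sentence
CORPUS_SENTENCES = 10**9    # assumed (generous) number of training sentences


def sequence_space(vocab_size, length):
    """Number of distinct word sequences of the given length."""
    return vocab_size ** length


def coverage(corpus_sentences, vocab_size, length):
    """Upper bound on the fraction of the sequence space a corpus can contain."""
    return corpus_sentences / sequence_space(vocab_size, length)


if __name__ == "__main__":
    space = sequence_space(VOCAB_SIZE, SENTENCE_LENGTH)
    share = coverage(CORPUS_SENTENCES, VOCAB_SIZE, SENTENCE_LENGTH)
    print(f"possible sequences: {space:.1e}")
    print(f"upper bound on coverage: {share:.1e}")
```

Even if only a tiny fraction of those sequences are meaningful sentences, a corpus of this assumed size samples a vanishingly small part of them.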
3. Knowledge versus Language
Another confusion in the ML world is the difference between knowledge and language. NLP mainly deals with detecting the meaning embedded in a sentence via parsing. Even if you have the most sophisticated NLP, what do you do once the meaning of the sentence has been detected? Here comes the crucial part of a chatbot system: Knowledge Representation (KR). You need an effective KR system that can answer the detected sentence with an appropriate response. Once you know your KR structure, ML can be used to map detected sentences to relevant knowledge. Of course, for this type of ML usage, data will be much harder to come by.
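The split between language and knowledge can be sketched in a few lines of Python. Everything here is hypothetical: the `Fact` triple is just one possible KR structure, the facts are hand-written, and the word-overlap scorer is a deliberately crude stand-in for the trained ML mapping the paragraph describes.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Fact:
    """One unit of knowledge, stored as a subject-relation-object triple."""
    subject: str
    relation: str
    obj: str


# Hypothetical knowledge base: a handful of hand-written facts.
KNOWLEDGE = [
    Fact("apple", "grows_on", "tree"),
    Fact("apple", "is_a", "fruit"),
    Fact("carrot", "grows_in", "ground"),
]


def tokenize(sentence: str) -> Set[str]:
    """Lowercase, strip punctuation, and crudely drop a trailing plural 's'."""
    words = (w.strip("?.,!").lower() for w in sentence.split())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words if w}


def map_to_knowledge(sentence: str) -> Optional[Fact]:
    """Return the fact sharing the most words with the sentence, if any.

    This overlap score stands in for a learned sentence-to-knowledge mapping.
    """
    tokens = tokenize(sentence)

    def score(fact: Fact) -> int:
        return len(tokens & {fact.subject, fact.obj})

    best = max(KNOWLEDGE, key=score)
    return best if score(best) > 0 else None


if __name__ == "__main__":
    print(map_to_knowledge("Do apples grow on trees?"))
    print(map_to_knowledge("What time is it?"))
```

The point of the design is that the knowledge base and the mapping function are separate pieces: the mapping can be replaced by a trained model without touching how facts are stored.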
4. Dialogue is more than Language
Human dialogue is governed by a set of rules of behavior that drive conversations. If your chatbot ignores this dynamic component, it will become a single-step response machine, unable to guide the user by asking questions or offering options. Facebook's recent Messenger chatbot development platform offers good tools for deploying guidance logic. However, this guidance logic must come from a model of human behavior. An ML system can be used to bridge the NLP and KR aspects of a chatbot for each defined guidance pattern. For example, if you are selling products via your chatbot, your ML strategy must focus on "sales dialogue."
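One simple way to picture guidance logic is as a small state machine that always knows which question comes next. The sketch below is an assumption-laden illustration: the states, prompts, and the `GuidedDialogue` class are invented for this example and do not correspond to any particular platform's API.

```python
# Hypothetical sales-dialogue flow: state -> (prompt to show, next state).
SALES_FLOW = {
    "greet": ("Hi! Are you shopping for yourself or for a gift?", "ask_budget"),
    "ask_budget": ("What budget do you have in mind?", "offer_options"),
    "offer_options": ("I can show you our top picks. Want to see them?", "close"),
    "close": ("Great, I will add that to your cart.", None),
}


class GuidedDialogue:
    """Walks the user through a fixed sequence of prompts."""

    def __init__(self, flow, start="greet"):
        self.flow = flow
        self.state = start
        self.answers = {}

    def prompt(self):
        """Return the bot's line for the current state, or None when finished."""
        if self.state is None:
            return None
        return self.flow[self.state][0]

    def reply(self, user_text):
        """Record the user's answer for the current state and move on."""
        if self.state is None:
            return None
        self.answers[self.state] = user_text
        self.state = self.flow[self.state][1]
        return self.prompt()


if __name__ == "__main__":
    dialogue = GuidedDialogue(SALES_FLOW)
    print(dialogue.prompt())
    for text in ["a gift", "about 50 dollars", "yes"]:
        print(dialogue.reply(text))
```

In a fuller system, the NLP and ML components would interpret each free-text answer before it is stored, and the next state could depend on that interpretation instead of being fixed.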
5. Use of Fuzzy Logic in ML
Fuzzy logic can be applied in various ways to fill gaps in data, gaps in KR mapping, and NLP detection. For example:
Apples grow on trees -> Apple=Fruit -> Fruits (may) grow on trees.
With this ontological operation, you can create a new association that may not exist in your training data set. As a result, you can multiply the size of your limited data set several times over and expand its coverage. This is just one simple example; there are dozens of other ways to use fuzzy logic.
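The apple example can be sketched as a small expansion routine in Python. The facts, the ontology, and the two discount factors are assumptions made up for illustration; degrees of truth are combined by multiplication, which is one common choice among several in fuzzy logic.

```python
# Hypothetical training facts: (subject, relation, object) -> degree of truth.
FACTS = {("apple", "grows_on", "tree"): 1.0}

# Hypothetical ontology: member -> category.
IS_A = {"apple": "fruit", "pear": "fruit", "cherry": "fruit"}

GENERALIZE = 0.8  # assumed truth retained when lifting a fact to its category
SPECIALIZE = 0.8  # assumed truth retained when pushing it down to a sibling


def expand(facts, is_a, generalize=GENERALIZE, specialize=SPECIALIZE):
    """Derive hedged facts for categories and sibling members."""
    expanded = dict(facts)
    for (subject, relation, obj), truth in facts.items():
        category = is_a.get(subject)
        if category is None:
            continue
        # Apples grow on trees -> fruits (may) grow on trees.
        category_truth = truth * generalize
        key = (category, relation, obj)
        expanded[key] = max(expanded.get(key, 0.0), category_truth)
        # Fruits (may) grow on trees -> pears (may) grow on trees.
        for member, member_category in is_a.items():
            if member_category == category and member != subject:
                key = (member, relation, obj)
                sibling_truth = category_truth * specialize
                expanded[key] = max(expanded.get(key, 0.0), sibling_truth)
    return expanded


if __name__ == "__main__":
    for fact, truth in expand(FACTS, IS_A).items():
        print(fact, round(truth, 2))
```

One observed fact becomes four, each carrying a lower degree of truth the further it is from the original observation, so downstream components can treat derived associations as "may" rather than "does".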