Intelligent Automation using a Machine Learning Technique
What is this article about?
Let's start with a simple question. Are Artificial Intelligence and Machine Learning techniques for high-end tasks only? If I am not doing any high-end tasks, is there no value for me in deploying techniques from these fields? I agree that the ultimate goal of these fields is to create agents with intelligence on par with humans, in the context of performing a specialized task. However, simple techniques from these fields can help automate tasks that need little human intelligence, without complex or costly development.
In this article, I first describe a fictional task that requires little human intelligence (it may sound familiar to you). Then I explain the classification technique of machine learning and how it can be used to automate this task.
Problem Statement and Goal:
There is a team that answers the queries it receives at its email address. People on the team read the emails and reply accordingly.
Both the queries and the answers are written by humans. We need to design a system that answers queries appropriately. In some cases, a document from a fixed set of documents is attached as part of the answer. In fact, this attachment answers the query perfectly. Selecting which document from the set to attach is a human decision. If we merged all the documents into one and attached that every time, it would technically solve the problem, but that is not acceptable. So we have to keep this set of documents. There are also some replies that a human writes without any attachment.
We need to design a system that selects a document according to the email received and replies with it. In cases where a human reply is preferred, it refers the query to a human.
Classification – A Machine Learning Technique:
Classification is one of the fundamental problems in machine learning. Given an object, we need to find which class it belongs to from a set of pre-defined classes. There are many machine learning techniques that solve such problems.
This is exactly what we need: we need to decide which document to attach. We need to carefully define the classes for our documents, plus one more class for referring the query to a human.
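To make the idea of classes concrete, here is a minimal sketch in Python. The class names and document file names are hypothetical, made up only for illustration. It shows one class per document, one extra class for referring to a human, and how a predicted class could be turned into an action.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical classes: one per document in the set (names are invented).
DOCUMENT_FOR_CLASS = {
    "password_reset": "password_reset_guide.pdf",
    "billing": "billing_faq.pdf",
    "installation": "installation_manual.pdf",
}

# One more class for queries that a person should answer.
REFER_TO_HUMAN = "refer_to_human"

ALL_CLASSES = list(DOCUMENT_FOR_CLASS) + [REFER_TO_HUMAN]


@dataclass
class Action:
    kind: str                         # "reply_with_attachment" or "forward_to_human"
    attachment: Optional[str] = None  # document to attach, if any


def action_for(predicted_class: str) -> Action:
    """Turn the class predicted for an email into what the system does."""
    document = DOCUMENT_FOR_CLASS.get(predicted_class)
    if document is None:
        # Covers REFER_TO_HUMAN and any label we do not recognise.
        return Action(kind="forward_to_human")
    return Action(kind="reply_with_attachment", attachment=document)


print(action_for("billing"))
print(action_for(REFER_TO_HUMAN))
```

Treating an unknown label the same as a referral is a design choice of this sketch: when in doubt, the query goes to a person.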
Training classifier (Learning happens here):
We must have historical email queries and their replies. In fact, if we think of the team as a system, we have many data points with a known input and output. There are also many machine learning algorithms for classification problems. We pick an algorithm and train it on our historical data. The output of this process is a model (a trained classifier).
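As an illustration of training, here is a small sketch that uses Naive Bayes, chosen only because it is short to write; the article does not prescribe any algorithm. The historical emails, the labels and the function names are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Learn per-class word statistics from (query, label) pairs."""
    class_counts = Counter()            # number of emails per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()
    for query, label in examples:
        class_counts[label] += 1
        for word in tokenize(query):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
    }


def predict(model, query):
    """Return the class with the highest Naive Bayes score for the query."""
    total_emails = sum(model["class_counts"].values())
    vocabulary_size = len(model["vocabulary"])
    # Words never seen during training carry no information, so skip them.
    words = [w for w in tokenize(query) if w in model["vocabulary"]]
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / total_emails)  # how common the class is
        words_in_class = sum(model["word_counts"][label].values())
        for word in words:
            # Add-one smoothing avoids log(0) for words unseen in this class.
            seen = model["word_counts"][label][word]
            score += math.log((seen + 1) / (words_in_class + vocabulary_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented historical data: (email query, class the team chose).
history = [
    ("I forgot my password and cannot log in", "password_reset"),
    ("How do I reset my password", "password_reset"),
    ("I was charged twice on my invoice", "billing"),
    ("Please send me a copy of my invoice", "billing"),
    ("I want to discuss a partnership with your company", "refer_to_human"),
    ("I have a complaint about your staff", "refer_to_human"),
]

model = train(history)  # the trained classifier
print(predict(model, "I cannot reset my password"))
```

Real historical data would be far larger, and a real project would more likely use an existing machine learning library than hand-written code like this.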
Testing Classifier (Measuring accuracy):
How good is our model? We have to keep some historical data separate from the training data. This part of the data is completely unseen by the classifier while it is being trained. For this test data, we know the input query and what its reply should be. For each data point in the test data, we can compare the expected reply with what our trained classifier gives. The fraction of matches gives us the accuracy of the trained classifier.
We can try some more algorithms, repeating training and testing, and choose the best-suited classifier. It should be noted that the best accuracy does not necessarily indicate the best classifier. We need to look at it in the context of the domain to which the problem at hand belongs. We need to be very realistic in our approach. There are many traps here, and we may fall into them. Data science is an experimental science. No one can say that one algorithm is the best for every problem. There is no such thing. We have to experiment, and we have to consult domain experts about the accuracies.
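The two steps above, holding back test data and measuring accuracy, can be sketched as follows. To keep the example short, training is left out and a hand-written keyword rule stands in for the trained classifier. The rule, the emails and the 25% test fraction are assumptions of this sketch.

```python
import random


def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle the historical data and hold back a part of it for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]  # (training data, test data)


def accuracy(classify, test_examples):
    """Fraction of test queries for which the classifier gives the expected class."""
    correct = sum(1 for query, expected in test_examples if classify(query) == expected)
    return correct / len(test_examples)


def keyword_classifier(query):
    """Stand-in for a trained classifier (a hand-written rule, not a learned model)."""
    text = query.lower()
    if "password" in text:
        return "password_reset"
    if "invoice" in text:
        return "billing"
    return "refer_to_human"


# Invented historical data: (email query, class the team chose).
history = [
    ("I forgot my password", "password_reset"),
    ("How do I reset my password", "password_reset"),
    ("Where is my invoice", "billing"),
    ("I was charged twice", "billing"),
    ("Please send a copy of my invoice", "billing"),
    ("I want to talk to a manager", "refer_to_human"),
    ("I have a complaint", "refer_to_human"),
    ("Can we discuss a partnership", "refer_to_human"),
]

training_data, test_data = split_train_test(history)
# A real classifier would be trained on training_data only, then scored here.
print(len(training_data), len(test_data))
print(accuracy(keyword_classifier, test_data))
```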
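Here is one way to see why the best accuracy may not mean the best classifier. Suppose, purely as an assumption for this sketch, that sending a wrong document is five times as costly as referring a query to a human unnecessarily. The predictions of the two classifiers below are invented, and a domain expert would have to supply the real costs.

```python
REFER_TO_HUMAN = "refer_to_human"


def accuracy(expected, predicted):
    """Fraction of predictions that match the expected class."""
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)


def average_cost(expected, predicted, wrong_document_cost=5.0, extra_referral_cost=1.0):
    """Average cost per email, under assumed (not measured) error costs."""
    total = 0.0
    for e, p in zip(expected, predicted):
        if e == p:
            continue  # a correct prediction costs nothing
        if p == REFER_TO_HUMAN:
            total += extra_referral_cost   # a person answers; only time is lost
        else:
            total += wrong_document_cost   # the sender receives a wrong attachment
    return total / len(expected)


b, p, h = "billing", "password_reset", REFER_TO_HUMAN

# Invented test results for two hypothetical classifiers A and B.
expected     = [b, b, p, p, h, h, b, p, b, h]
classifier_a = [b, b, p, p, b, h, b, b, b, h]  # 2 mistakes, both wrong documents
classifier_b = [b, h, p, h, h, h, b, p, h, h]  # 3 mistakes, all extra referrals

for name, predicted in [("A", classifier_a), ("B", classifier_b)]:
    print(name, accuracy(expected, predicted), average_cost(expected, predicted))
```

With these assumed costs, classifier A has the higher accuracy but classifier B has the lower average cost, so B could be the better choice for this team.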
Deploying classifier (in the real world):
Once we are convinced that we have found the best-suited classifier, we can deploy it. There is another practical approach, called ensemble learning, in which we deploy more than one classifier and combine the predictions from each into one final decision. For example, we can put a voting scheme in place and select the class predicted by the majority.
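The voting idea can be sketched as below. The three classifiers are hand-written stand-ins for trained models, and sending a tie to a human is an assumption of this sketch rather than a rule of ensemble learning.

```python
from collections import Counter

REFER_TO_HUMAN = "refer_to_human"


def majority_vote(predictions, fallback=REFER_TO_HUMAN):
    """Return the most voted class; on a tie for first place, use the fallback."""
    ranked = Counter(predictions).most_common()
    top_label, top_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return fallback  # no clear majority, so let a person decide
    return top_label


def ensemble_predict(classifiers, query):
    """Ask every classifier for its prediction and combine them by voting."""
    return majority_vote([classify(query) for classify in classifiers])


# Three hand-written stand-ins for trained classifiers (hypothetical rules).
def by_password(query):
    return "password_reset" if "password" in query.lower() else REFER_TO_HUMAN


def by_login(query):
    return "password_reset" if "log in" in query.lower() else REFER_TO_HUMAN


def by_invoice(query):
    return "billing" if "invoice" in query.lower() else REFER_TO_HUMAN


classifiers = [by_password, by_login, by_invoice]
print(ensemble_predict(classifiers, "I forgot my password and cannot log in"))
```

Using an odd number of classifiers makes ties less likely, though with more than two classes they can still happen.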
Disclaimer:
This article avoids jargon and heavy technical detail. It is written to show how we can apply classification to automate a task that needs only trivial human intelligence. The example is a toy one, but it captures the essence of applying machine learning techniques in real-world applications.
However, things in machine learning are not so trivial. It needs good knowledge of computer science, mathematics, and the domain to which the problem at hand belongs. A team should have these capabilities before embarking on the journey of automating tasks that need human intelligence.
Thanks for reading. Your corrections, improvements and comments are wholeheartedly welcome.
Hi sir
It is a very nice read and presents the solution in a very simple way. However, I would like to extend this post by mentioning some classifiers and evaluation methods. Classifiers: Support Vector Machine, Naive Bayes, Decision Trees, kNN, etc. The selection of one or more (in the case of ensemble learning) depends on the data. Testing methods: N-fold cross-validation, train/test split, etc.
Nice one