Overview of Seq2Seq Learning
There is a lot of buzz around chatbots, and seemingly every other commercial company is launching, or trying to launch, a chatbot on its website and apps. The bigger companies are releasing bot APIs and providing assistance over social media too.
But among this flock, which ones do we consider intelligent, that is, intelligent enough to actually answer your questions? These models are broadly classified into two types:
1. Retrieval-based model
2. Generative model
The retrieval-based model is the simpler one, though lengthy to develop: as the name suggests, it picks a response from a provided collection according to the query. It does not generate new sentences and does not need any memory, since the logic for picking responses is already built into the model.
The generative models are the intelligent ones: they generate the response word by word, based on the labeled data on which the model was trained.
Sequence to Sequence Learning:
Sequence prediction is one of the most difficult problems faced by deep neural networks. Deep neural networks (DNNs) are extremely powerful machine learning models; they can be trained with supervised backpropagation whenever the training set contains enough information. But despite their flexibility and power, DNNs can only be applied to problems whose input and target can be sensibly encoded with vectors of fixed dimensionality, and that does not hold in sequence prediction, where the input and output lengths are not known in advance.
The dimensionality issue was later solved by the Long Short-Term Memory (LSTM) architecture. The idea involves using two LSTMs: the first LSTM reads the input sequence one timestep at a time to obtain a fixed-dimensional vector representation, and the second LSTM uses that vector to produce the output sequence. The second LSTM is essentially a recurrent neural network language model; its ability to learn from data with long-range temporal dependencies makes it a natural choice for generating the output sequence.
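This two-LSTM encoder-decoder can be sketched in a few lines. Below is a minimal Keras sketch, assuming one-hot token vectors; the sizes num_tokens and latent_dim are illustration values of my own, not something from the text:

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_tokens = 1000   # vocabulary size (assumed for illustration)
latent_dim = 256    # size of the fixed-dimensional vector

# Encoder LSTM: reads the input sequence one timestep at a time;
# only its final hidden and cell states are kept as the fixed vector.
encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder LSTM: a language model initialized with the encoder's states;
# it predicts the output sequence one word at a time.
decoder_inputs = Input(shape=(None, num_tokens))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=encoder_states)
decoder_outputs = Dense(num_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```

The only thing the decoder ever sees of the input is the pair of encoder states, which is exactly the fixed-dimensional bottleneck described above.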
A recurrent neural network (RNN) is a generalization of a feedforward neural network to sequences: given a sequence of inputs, it computes a sequence of outputs. The RNN can easily map sequences to sequences when the alignment between input and output is known ahead of time. However, it gets complicated when the input and output lengths differ and are not known in advance.
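To make the feedforward analogy concrete, here is a minimal NumPy sketch of a vanilla RNN mapping an input sequence to an equal-length output sequence; the weight matrices and biases are assumed to be given, and names like W_xh are mine, not from the text:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Map a sequence of input vectors xs to a sequence of outputs."""
    h = np.zeros(W_hh.shape[0])                 # initial hidden state
    ys = []
    for x in xs:                                # one step per input vector
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # update the hidden state
        ys.append(W_hy @ h + b_y)               # emit one output per input
    return ys
```

Note that this loop produces exactly one output per input, which is why the plain RNN breaks down when output length must differ from input length.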
The LSTM estimates the conditional probability of an output sequence (y1, y2, …, yn′) given an input sequence (x1, x2, …, xn), where the input length n and output length n′ may differ.
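Concretely, in the standard sequence-to-sequence formulation this conditional probability is factorized one output word at a time, conditioned on the fixed-dimensional vector v that the encoder computes from the input sequence:

```latex
p(y_1, \ldots, y_{n'} \mid x_1, \ldots, x_n) = \prod_{t=1}^{n'} p(y_t \mid v,\, y_1, \ldots, y_{t-1})
```

Each factor on the right is the softmax distribution the decoder produces at step t, which is what makes the second LSTM a conditional language model.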
Before training the model on the data, a considerable amount of work needs to be done on the dataset to convert variable-length sequences into fixed-length sequences by padding. The symbols used are as follows:
1. EOS: End of sentence
2. PAD: Filler
3. GO: Start decoding
4. UNK: Unknown
As we have discussed the variable-length input and output problem at length: it is solved by filling each vector with the symbols above until it reaches the required length, and the filler symbols are handled later so the model still generates a proper output. A minimal sketch of this preprocessing follows.
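Here is one way that padding step might look in Python; the token ids (PAD=0, GO=1, EOS=2, UNK=3), the tiny vocab, and the helper names are assumptions for illustration only:

```python
PAD, GO, EOS, UNK = 0, 1, 2, 3  # assumed ids for the special symbols

def encode(tokens, vocab, max_len):
    """Turn a variable-length token list into a fixed-length id sequence."""
    ids = [vocab.get(t, UNK) for t in tokens]   # unknown words become UNK
    ids = ids[: max_len - 1] + [EOS]            # terminate with EOS
    return ids + [PAD] * (max_len - len(ids))   # fill the rest with PAD

def decoder_input(ids):
    """Decoder inputs are shifted right: start with GO, drop the last id."""
    return [GO] + ids[:-1]

vocab = {"hello": 4, "how": 5, "are": 6, "you": 7}
enc = encode(["hello", "how", "are", "you"], vocab, max_len=8)
print(enc)                 # [4, 5, 6, 7, 2, 0, 0, 0]
print(decoder_input(enc))  # [1, 4, 5, 6, 7, 2, 0, 0]
```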
Beam Search and the Attention mechanism are two topics I will try to discuss in the next part.