Interactive Tutorial on LSTM (Long Short-Term Memory Model)
LSTM networks can model the sequential and temporal structure of data, and thanks to these capabilities they have been used widely for text, video, and time-series data. Common applications of LSTM networks include: (1) language modeling [4], (2) machine translation (also known as sequence-to-sequence learning) [5], (3) image captioning (with and without attention) [6], (4) handwriting recognition, (5) image generation (using attention models), (6) automatic question answering [7], and (7) video-to-text extraction.
In fact, the LSTM model [1] was designed to combat the following two major issues of recurrent neural networks (RNNs):
- RNNs suffer from the vanishing gradient problem, which limits their effectiveness when they need to reach deep into past context (illustrated numerically in the sketch after this list).
- Similarly, a plain RNN offers no fine-grained control over which part of the context is carried forward and how much of the past is "forgotten".
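To make the vanishing gradient problem concrete, here is a minimal toy sketch (my own illustration, not part of the original tutorial): it backpropagates through a one-dimensional RNN-style recurrence h_t = tanh(w · h_{t-1}) and shows how the gradient contribution shrinks with every step back in time.

```python
import numpy as np

# Toy, hypothetical example: a scalar RNN-style recurrence
# h_t = tanh(w * h_{t-1}). By the chain rule, the gradient flowing
# back through T steps is the product of w * (1 - h_t^2) over t.
w = 0.9          # recurrent weight with |w| < 1
h = 0.5          # initial hidden state
grad = 1.0       # gradient w.r.t. the most recent hidden state

for t in range(1, 51):
    h = np.tanh(w * h)
    grad *= w * (1.0 - h ** 2)   # one chain-rule step back in time
    if t % 10 == 0:
        print(f"after {t:2d} steps: |gradient| = {abs(grad):.2e}")
```

Because |w| < 1 and the tanh derivative is always below 1, each step multiplies the gradient by a factor smaller than 1, so the gradient decays exponentially and the network effectively stops learning from distant context.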
The gated units used in LSTM (in place of the plain hidden-layer nodes of an RNN) are very helpful in combating the issues discussed above. At the same time, LSTM inherits several features from RNNs, such as training via backpropagation through time and the overall recurrent network architecture.
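As a preview of what such a gated unit looks like, below is a minimal NumPy sketch of a single LSTM cell step, following the standard formulation of Hochreiter and Schmidhuber [1]; the variable names and the tiny usage example are my own, for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*H, D+H), b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*H:1*H])        # forget gate: how much past to keep
    i = sigmoid(z[1*H:2*H])        # input gate: how much new info to write
    g = np.tanh(z[2*H:3*H])        # candidate cell content
    o = sigmoid(z[3*H:4*H])        # output gate: how much cell to expose
    c = f * c_prev + i * g         # new cell state (the long-term "memory")
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Hypothetical usage with a 3-dim input and a 5-dim hidden state.
D, H = 3, 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)
print(h.shape, c.shape)  # (5,) (5,)
```

The forget gate f decides how much of the old cell state to keep and the input gate i decides how much new content to write, which is exactly the fine-grained control over "remembering" and "forgetting" that a plain RNN lacks; the forward pass is discussed step by step later in the tutorial.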
Beyond this, there are many topics, architectures, and internal operations that make LSTM highly useful. In this tutorial, I have tried to cover most of these topics in the simplest possible way. I have also tried to answer the following questions:
Question Set 1
- Why do we need LSTM?
- Why do we call LSTM a gated-unit-based network? What are these gated units, and how do they work?
- What are the similarities and differences between the traditional RNN and the LSTM model?
Question Set 2
- How does the forward pass (forward propagation) work in LSTM? A step-by-step discussion.
Question Set 3
- How does backward propagation work in the LSTM model? How does LSTM train its gated units? (A step-by-step discussion.)
- What kind of input can we feed to an LSTM-based model?
NOTE: I am also available on YouTube.
References:
1. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
2. Chung, J., et al. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
3. Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems.
4. Sundermeyer, M., Ney, H., & Schlüter, R. (2015). From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3), 517-529.
5. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
6. Jia, X., et al. (2015). Guiding the long-short term memory model for image caption generation. In Proceedings of the IEEE International Conference on Computer Vision.
7. Zhao, Z., Lu, H., Zheng, V. W., Cai, D., He, X., & Zhuang, Y. (2017). Community-based question answering via asymmetric multi-faceted ranking network learning. In AAAI (pp. 3532-3539).