Interactive Tutorial on LSTM (Long Short-Term Memory Model)
LSTM networks can model the sequential and temporal structure of data, and thanks to these capabilities they have been used widely for text, video, and time-series data. Common applications of LSTM networks include: (1) language modeling [4], (2) machine translation (also known as sequence-to-sequence learning) [5], (3) image captioning (with and without attention) [6], (4) handwriting recognition, (5) image generation (using attention models), (6) automatic question answering [7], and (7) video-to-text extraction.
In fact, the LSTM model [1] was designed to combat the following two major issues of recurrent neural networks (RNNs):
- RNNs suffer from the vanishing gradient problem, which limits their effectiveness when they need to reach deep into past context (illustrated numerically in the sketch after this list).
- Similarly, a plain RNN offers no fine-grained control over which part of the context is carried forward and how much of the past is "forgotten".
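To make the vanishing gradient problem concrete, here is a minimal toy sketch (my own illustration, not part of the original tutorial): it backpropagates through a one-dimensional RNN-style recurrence h_t = tanh(w · h_{t-1}) and shows how the gradient contribution shrinks with every step back in time.

```python
import numpy as np

# Toy, hypothetical example: a scalar RNN-style recurrence
# h_t = tanh(w * h_{t-1}). By the chain rule, the gradient flowing
# back through T steps is the product of w * (1 - h_t^2) over t.
w = 0.9          # recurrent weight with |w| < 1
h = 0.5          # initial hidden state
grad = 1.0       # gradient w.r.t. the most recent hidden state

for t in range(1, 51):
    h = np.tanh(w * h)
    grad *= w * (1.0 - h ** 2)   # one chain-rule step back in time
    if t % 10 == 0:
        print(f"after {t:2d} steps: |gradient| = {abs(grad):.2e}")
```

Because |w| < 1 and the tanh derivative is always below 1, each step multiplies the gradient by a factor smaller than 1, so the gradient decays exponentially and the network effectively stops learning from distant context.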
The gated units used in LSTM (in place of the plain hidden-layer nodes of an RNN) are very helpful in combating the issues discussed above. At the same time, LSTM inherits several features from RNNs, such as training via backpropagation through time and the overall recurrent network architecture.
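As a preview of what such a gated unit looks like, below is a minimal NumPy sketch of a single LSTM cell step, following the standard formulation of Hochreiter and Schmidhuber [1]; the variable names and the tiny usage example are my own, for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*H, D+H), b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*H:1*H])        # forget gate: how much past to keep
    i = sigmoid(z[1*H:2*H])        # input gate: how much new info to write
    g = np.tanh(z[2*H:3*H])        # candidate cell content
    o = sigmoid(z[3*H:4*H])        # output gate: how much cell to expose
    c = f * c_prev + i * g         # new cell state (the long-term "memory")
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Hypothetical usage with a 3-dim input and a 5-dim hidden state.
D, H = 3, 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)
print(h.shape, c.shape)  # (5,) (5,)
```

The forget gate f decides how much of the old cell state to keep and the input gate i decides how much new content to write, which is exactly the fine-grained control over "remembering" and "forgetting" that a plain RNN lacks; the forward pass is discussed step by step later in the tutorial.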
Beyond this, there are many topics, architectures, and internal operations that make LSTM highly useful. In this tutorial, I have tried to cover most of these topics in the simplest possible way. I have also tried to answer the following questions:
Question Set 1
- Why do we need LSTM?
- Why do we call LSTM a gated-unit-based network? What are these gated units, and how do they work?
- What are the similarities and differences between the traditional RNN and the LSTM model?
Question Set 2
- How does the forward pass (forward propagation) work in LSTM? A step-by-step discussion.
Question Set 3
- How does backward propagation work in the LSTM model? How does LSTM train its gated units? (A step-by-step discussion.)
- What kind of input can we feed to an LSTM-based model?
NOTE: I am also available on YouTube.
References:
1. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
2. Chung, J., et al. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
3. Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems.
4. Sundermeyer, M., Ney, H., & Schlüter, R. (2015). From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3), 517-529.
5. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
6. Jia, X., et al. (2015). Guiding the long-short term memory model for image caption generation. In Proceedings of the IEEE International Conference on Computer Vision.
7. Zhao, Z., Lu, H., Zheng, V. W., Cai, D., He, X., & Zhuang, Y. (2017). Community-based question answering via asymmetric multi-faceted ranking network learning. In AAAI (pp. 3532-3539).