Recurrent Neural Networks (RNNs)

Explore top LinkedIn content from expert professionals.

Summary

Recurrent Neural Networks (RNNs) are a class of neural network designed to process sequences, such as text or speech, by remembering and updating information as new data arrives. Unlike feedforward networks, RNNs use a feedback loop to carry context from one step to the next, which lets them handle tasks that depend on time or order, like language translation or stock-price forecasting.

  • Understand sequence memory: RNNs maintain a hidden state that is updated at every step, letting them carry forward what has happened earlier in a sequence (see the short sketch below).
  • Explore advanced variants: LSTM and GRU architectures add gates that regulate what is kept, updated, and forgotten, helping them retain long-term information and handle more complex tasks.
  • Consider parallel processing: Recent work parallelizes the training of recurrent models, letting them scale to billions of parameters and making them a promising option where fast, memory-efficient inference matters.
Summarized by AI based on LinkedIn member posts
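
As a concrete anchor for the first point above: a minimal sketch of the hidden-state update a plain RNN performs at each step (toy sizes and random weights, not taken from any specific post below).

    import numpy as np

    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 8, 16                                 # toy sizes
    W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))    # input -> hidden weights
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))   # hidden -> hidden weights (the feedback loop)
    b = np.zeros(hidden_dim)

    h = np.zeros(hidden_dim)                        # hidden state: the model's running context
    for x_t in rng.normal(size=(20, input_dim)):    # stand-in for 20 token embeddings
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)      # h_t is computed from the new input and h_{t-1}

    # After the loop, h is a fixed-length summary of the whole sequence.
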
  • View profile for Shreya Saravanan

    Data & AI Engineer | Software Engineer @Amdocs | MS in Engineering Management | Northeastern University

    3,463 followers

    🌟 Day 35 of My 90-Day AI Learning Journey 🌟

    Recurrent Neural Networks (RNNs): Teaching Machines to Understand Time

    While CNNs excel at spatial patterns (like images), RNNs are built for sequential reasoning: learning how information evolves over time. An RNN introduces recurrence: the state from one step becomes part of the input to the next. This feedback loop gives the model a form of memory, allowing it to capture temporal dependencies, which is crucial for language modeling, speech recognition, stock forecasting, and more.

    → The Problem: Vanishing Gradients and Forgetting Context
    In a vanilla RNN, the hidden state from each time step is fed into the next one. During training, backpropagation through time (BPTT) computes the gradient of each weight through all previous time steps. As we go back through many steps, those gradients are repeatedly multiplied by factors smaller than 1, so they shrink exponentially: the vanishing gradient problem. As a result:
      • Earlier inputs (e.g., words from 20–50 steps ago) have almost no influence on the current prediction.
      • The network "forgets" long-term dependencies and learns only short-term patterns.

    → The Solution: LSTM and GRU
    To overcome this, LSTMs (Long Short-Term Memory networks) and GRUs (Gated Recurrent Units) add gates, mechanisms that regulate information flow. They decide:
      • What to keep (long-term memory)
      • What to update (new input)
      • What to forget (irrelevant or outdated information)

    → Here's how that translates to real-world applications:
      • Chatbots & conversational AI: RNNs process sentences word by word while maintaining conversational context. If you say "I'm looking for Italian restaurants," the model remembers that "Italian" refers to cuisine and shapes its next response accordingly.
      • Real-time translation: In machine translation (e.g., English → French), RNNs read the source sequentially and generate the translation step by step, with each output word depending on the meaning built up from all previous ones.

    Even though Transformers (like BERT and GPT) now dominate NLP, RNNs remain foundational: they introduced the core idea of sequential memory and paved the way for modern architectures.

    #DeepLearning #RNN #ArtificialIntelligence #MachineLearning #AIInnovation #OpenToWork
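
    To see the vanishing-gradient effect the post describes, one can unroll a vanilla RNN cell in PyTorch and compare the gradient of the loss with respect to early and late inputs. This is a quick sketch with toy sizes and untrained weights, not code from the post:

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        seq_len, d = 50, 16
        cell = nn.RNNCell(d, d)                     # vanilla tanh RNN cell

        x = torch.randn(seq_len, 1, d, requires_grad=True)
        h = torch.zeros(1, d)
        for t in range(seq_len):
            h = cell(x[t], h)                       # unrolling the recurrence; autograd handles BPTT

        h.sum().backward()
        grads = x.grad.reshape(seq_len, -1).norm(dim=1)   # gradient magnitude per time step
        print(f"step  0: {grads[0].item():.2e}")    # typically orders of magnitude smaller...
        print(f"step 49: {grads[-1].item():.2e}")   # ...than the gradient at the most recent step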

  • View profile for Sovesh Mohapatra

    PhD Researcher @ UPenn | AI for Brain Imaging & Neuroscience | Deep Learning · NLP · Neuroimaging

    7,692 followers

    How do neural networks process sequences? They learned to remember.

    Before Transformers, Recurrent Neural Networks ruled sequence modeling. RNNs process data step by step, maintaining a hidden state that accumulates context over time. From language translation to speech recognition, RNNs powered a decade of AI breakthroughs. But how do they actually work? What does recurrence really mean?

    I'm launching a new 3-part "Build in Public" mini-series tracing RNNs from first principles. No black boxes. Just pure PyTorch and raw math.

    1️⃣ Part 1 (Today): The Math of Recurrence — hidden states, backpropagation through time (BPTT), and the vanishing gradient problem.
    2️⃣ Part 2: Building the Architecture — implementing RNN cells, multi-layer RNNs, sequence classifiers, and a CharRNN for text generation.
    3️⃣ Part 3: Training and Analyzing Dynamics — visualizing how hidden states evolve over time to encode temporal information.

    If you followed my Transformer and CNN teardowns, you know we're going deep under the hood. Drop a "🔄" in the comments if you want to be tagged when the PyTorch code drops in Part 2!

    Full technical write-up for Part 1 on my blog: [Link to Blog in the comments]

    #DeepLearning #MachineLearning #ArtificialIntelligence #PyTorch #RNN #SequenceModeling #NLP #BuildInPublic
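
    For readers who cannot wait for Part 2, a hand-rolled RNN cell in the spirit the series describes might look like the sketch below. This is not the author's code; the module and variable names are illustrative:

        import torch
        import torch.nn as nn

        class VanillaRNNCell(nn.Module):
            """One step of recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
            def __init__(self, input_size: int, hidden_size: int):
                super().__init__()
                self.W_xh = nn.Linear(input_size, hidden_size)
                self.W_hh = nn.Linear(hidden_size, hidden_size, bias=False)

            def forward(self, x_t, h_prev):
                return torch.tanh(self.W_xh(x_t) + self.W_hh(h_prev))

        # Unroll over a toy sequence; backpropagation through time is just autograd through this loop.
        cell = VanillaRNNCell(input_size=10, hidden_size=32)
        x = torch.randn(5, 4, 10)          # (seq_len, batch, features)
        h = torch.zeros(4, 32)
        for t in range(x.size(0)):
            h = cell(x[t], h)              # the hidden state accumulates context step by step
        h.pow(2).mean().backward()         # gradients flow back through all five steps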

  • View profile for Federico Danieli

    AI/ML Research Scientist

    2,480 followers

    We taught LSTMs to run in parallel. Now they've grown to 7B parameters, and are ready to challenge Transformers.

    For years, we've assumed RNNs were doomed—inherently sequential, too slow to train, impossible to scale—and looked at Transformers as the go-to choice for large language modeling. Turns out we just needed better math.

    Introducing ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for LLMs

    👉 [TL;DR] We can now train nonlinear RNNs at unprecedented scales by parallelising what was previously considered inherently sequential: the unrolling of recurrent computations. If you care about fast inference for LLMs, or are into time-series analysis, we've got good news for you: RNNs are back on the menu.

    🐍 But wait, doesn't Mamba parallelise this too? Sure, but here's the catch: Mamba requires the state-space updates to be linear, which fundamentally limits expressivity. We want the freedom to apply nonlinearities sequence-wise.

    💡 Our approach: recast the sequence of nonlinear recurrences as a system of equations, then solve them in parallel using Newton's method. As a bonus, make everything blazingly fast with custom CUDA kernels.

    ⚡ The result? Up to 665x speedup over naive sequential processing, and training times comparable to Mamba, even with the extra overhead of Newton's iterations.

    📈 So we took LSTM and GRU architectures—remember those from the pre-Transformer era?—scaled them to 7B parameters, and achieved perplexity comparable to similarly sized Transformers. No architectural tricks. Just pure scale, finally unlocked.

    🔥 Why this matters: Mamba challenged the Transformer's monopoly. ParaRNN expands the search space of available architectures. It's time to get back to the drawing board and use these tools to start designing the next generation of inference-efficient models.

    💻 To aid with this, we're releasing open-source code to parallelise RNN applications out of the box. No need to implement your own parallel scan, nor to remember how Newton's method works: just specify the recurrence relation, flag any structure in your hidden-state update, and watch the GPUs go brrrrrrr.

    Paper: https://lnkd.in/dTEGh5Jp
    Code: https://lnkd.in/d_Ven9Y2
    Collaborators: Pau Rodriguez Lopez, Miguel Sarabia, Xavier Suau, Luca Zappella

    💼 And if you're a PhD student interested in working on these topics, we have a fresh internship position just for you: https://lnkd.in/dDVSsfJj

    Time to explore what truly nonlinear RNNs can do at scale
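
    The core reformulation is easy to play with at toy scale. The sketch below shows the parallel-in-time view (every hidden state treated as an unknown of h_t = f(h_{t-1}, x_t) and all of them updated simultaneously), but with a plain fixed-point sweep standing in for the Newton solver and custom CUDA kernels the paper actually uses:

        import torch

        torch.manual_seed(0)
        T, d = 64, 8
        W = torch.randn(d, d) * 0.3
        U = torch.randn(d, d) * 0.3
        x = torch.randn(T, d)

        def f(h_prev, x_t):
            return torch.tanh(h_prev @ W.T + x_t @ U.T)   # one nonlinear recurrence step

        # Sequential reference: the classic unrolling
        h_seq, h = torch.zeros(T, d), torch.zeros(d)
        for t in range(T):
            h = f(h, x[t])
            h_seq[t] = h

        # Parallel-in-time view: treat every h_t as an unknown of h_t = f(h_{t-1}, x_t)
        # and update all time steps at once (a fixed-point sweep; ParaRNN uses Newton's method).
        h_par = torch.zeros(T, d)
        for _ in range(T):                                 # at most T sweeps for exact agreement
            h_prev = torch.cat([torch.zeros(1, d), h_par[:-1]], dim=0)
            h_par = f(h_prev, x)                           # all steps updated simultaneously
        print(torch.allclose(h_seq, h_par, atol=1e-5))     # True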

  • View profile for Uday Kamath, Ph.D.

    Author (8 books on AI) | Keynote Speaker | AI Leader | Chief Analytics Officer (Smarsh) | Board Advisor | Published Researcher

    8,051 followers

    Before Transformers took over, Recurrent Neural Networks (RNNs) were how we did sequence modeling. They were fast, efficient, and elegant. But they compressed everything into a single hidden state that kept overwriting itself.

    Transformers won because their attention mechanism could directly look back at any token in the sequence. That ability to recall is why every LLM today is built on Transformers. But attention scales quadratically with context length, and that cost is becoming the bottleneck as we push toward longer contexts.

    A new paper from Google Research asks a simple question: what if RNNs just stopped throwing away their old states? Memory Caching splits the sequence into segments, saves a checkpoint of the memory at each boundary, and lets the model query all of them at retrieval time. Four variants, a clean complexity-recall tradeoff, and real gains on recall benchmarks without the quadratic bill.

    I wrote a deep dive on the paper and built a full companion notebook reproducing all four aggregation variants from scratch in PyTorch. The notebook also includes what I think is the most interesting experiment: training deep (nonlinear) memory modules, in which two of the paper's variants, GRM and Soup, produce measurably different architectures, confirming a key theoretical prediction that collapses away in the linear case.

    https://lnkd.in/erEEHpEV

    #LLM #AI #NLP #AIResearch #SequenceModeling #AttentionMechanism #RecurrentNeuralNetworks
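
    As a rough, toy-scale illustration of the checkpointing idea (not the paper's actual method or any of its four variants): roll a recurrent memory through fixed-length segments, cache the state at every boundary, and let a query aggregate over all cached states instead of only the final one.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        d, seg_len, n_segs = 32, 16, 4
        rnn = nn.GRU(input_size=d, hidden_size=d, batch_first=True)

        x = torch.randn(1, seg_len * n_segs, d)           # one long input sequence
        h = torch.zeros(1, 1, d)
        checkpoints = []
        for s in range(n_segs):
            seg = x[:, s * seg_len : (s + 1) * seg_len]
            _, h = rnn(seg, h)                             # roll the memory through this segment
            checkpoints.append(h.squeeze(0))               # cache the state at the segment boundary

        # Retrieval: instead of relying only on the final (overwritten) state, a query
        # softly selects among all cached states - a stand-in for the paper's aggregation step.
        mem = torch.cat(checkpoints, dim=0)                # (n_segs, d)
        query = torch.randn(1, d)
        weights = torch.softmax(query @ mem.T / d ** 0.5, dim=-1)
        readout = weights @ mem                            # summary built from every checkpoint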

  • View profile for HARIKARAN M

    Artificial intelligence (AI) - Machine Learning (ML) Researcher (Aspiring) For Healthcare & Computer Vision || Lead – Human Resource Recruitment || Farmer || Decoding Anatomy of Artificial intelligence (AI) Mechanism

    18,595 followers

    🚀 RNN sequential vector transformations

    Recurrent Neural Networks (RNNs) are often wrapped in metaphors, but at their essence they are simply sequential vector transformations. They take a variable-length input and compress it into a single, fixed-length embedding that represents the entire sequence.

    1. THE RECURRENCE ENGINE
    The core of an RNN is a process that updates a "hidden state" at every step.
      • The hidden state: think of this as a local variable that holds a partial result. It is not a learned parameter; it is a vector recalculated at each step.
      • The inputs: an RNN processes one symbol at a time (like one character in a word).
      • The transformation: the model uses two matrices—one to transition the previous "memory" and another to integrate the new input.
      • The summary: the final hidden state becomes the summary of everything seen from the start to the end of the sequence.

    2. BACKPROPAGATION THROUGH TIME
    Learning in an RNN means finding the optimal values for the internal matrices.
      • Reusable matrices: the same weights are used for every symbol in the sequence. This is what allows the model to handle sequences of any length.
      • The gradient flow: because the matrices are reused, the errors are summed across all steps to update the shared weights.
      • Truncated updates: for very long sequences like long documents, we stop the error calculation after a certain number of steps to save memory, though the hidden state still carries the full history forward.

    3. MINIBATCHING & PADDING
    To speed up training, we process multiple records at once in a batch (see the sketch after this post).
      • Batch processing: instead of a single hidden vector, we use a matrix where each row holds the state for a different record in the batch.
      • Left padding: when records have different lengths (like words), we add zero vectors on the left. Why? If you pad at the end, the final "summary" vector is diluted by useless information; left padding ensures the final state is computed from the most recent, real input.

    4. KEY DISTINCTIONS
      • Hidden state: a transient local variable; the "running total" of the sequence.
      • Learned matrices: the core parameters; initialized once and updated during training.
      • Iterator: the variable used to step through the input symbols.
      • Embedding: the final vector produced after the last symbol is processed.

    🔥 THE BOTTOM LINE: an RNN is a function that maps a variable-length sequence into a fixed-dimensional space. It is a "memory machine" whose architecture is defined by linear transformations plus a nonlinearity, with the intelligence stored in weights that learn how to update that memory over time.

    #RNN #DeepLearning #MachineLearning #AI
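
    A small sketch of points 1-3 above (batched recurrence with left padding); the sizes and random weights are arbitrary and not taken from the post:

        import torch

        torch.manual_seed(0)
        d_in, d_h = 4, 8
        W_xh = torch.randn(d_h, d_in) * 0.3
        W_hh = torch.randn(d_h, d_h) * 0.3

        # Two records of different lengths, left-padded with zero vectors to a common length.
        a, b = torch.randn(5, d_in), torch.randn(3, d_in)
        T = max(a.size(0), b.size(0))
        batch = torch.stack([
            a,
            torch.cat([torch.zeros(T - b.size(0), d_in), b], dim=0),  # zeros go on the LEFT
        ])                                                            # (batch, T, d_in)

        # Batched recurrence: the hidden "state" is a matrix with one row per record.
        H = torch.zeros(2, d_h)
        for t in range(T):
            H = torch.tanh(batch[:, t] @ W_xh.T + H @ W_hh.T)

        # Each row of H is the fixed-length embedding of one record; because the padding
        # sits on the left, the last steps each row sees are real inputs, not zeros.
        print(H.shape)   # torch.Size([2, 8])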

  • View profile for Kavishka Abeywardana

    Machine Learning & Signal Processing Researcher | Semantic Communication • Deep Learning • Optimization | AI Research Writer

    25,565 followers

    LSTM: The Return of the King 👑

    For a long time, Long Short-Term Memory networks (LSTMs) felt like a closed chapter in deep learning history after Transformers took over. But with the recent introduction of xLSTM as a serious alternative to Transformers for large language models, it's the perfect moment to revisit this classic architecture.

    LSTM was introduced as an upgrade over vanilla recurrent neural networks (RNNs), addressing two major limitations:
    ⚠️ Vanishing gradients
    ⚠️ Weak ability to capture long-term dependencies

    You can think of an LSTM as a highway that carries information from the past to the future, with smart control gates at every step to decide:
    ✅ What to keep
    ✅ What to update
    ✅ What to forget

    A simple analogy: imagine a long-distance bus journey. At every stop, some passengers get off (forgetting old information), new ones get on (adding new information), and the bus refuels (updating the memory). Yet the journey continues smoothly across many miles, just like how LSTMs maintain context over long sequences.

    With xLSTM now offering a modern rethinking of recurrent architectures, it's exciting to see sequence models evolve once again. Maybe recurrence isn't dead after all; it was just waiting for a smarter design.
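
    For the curious, here is what those three decisions look like in a single hand-written LSTM step (the standard formulation, sketched with random weights rather than learned ones):

        import torch

        torch.manual_seed(0)
        d_in, d_h = 4, 6
        x_t = torch.randn(d_in)                               # current input
        h_prev, c_prev = torch.zeros(d_h), torch.zeros(d_h)   # previous hidden and cell states
        xh = torch.cat([x_t, h_prev])                         # the gates look at both

        # Gate weights (learned in practice; random here for illustration)
        W_f, W_i, W_g, W_o = (torch.randn(d_h, d_in + d_h) * 0.3 for _ in range(4))

        f_t = torch.sigmoid(W_f @ xh)    # forget gate: which passengers get off
        i_t = torch.sigmoid(W_i @ xh)    # input gate: how many new passengers get on
        g_t = torch.tanh(W_g @ xh)       # candidate content to add to memory
        o_t = torch.sigmoid(W_o @ xh)    # output gate: what part of memory is exposed

        c_t = f_t * c_prev + i_t * g_t   # the "highway": cell state carried from stop to stop
        h_t = o_t * torch.tanh(c_t)      # hidden state for this step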
