Why Tiny Recursive Models Beat Massive LLMs

Instead of building massive Large Language Models (LLMs) that try to be right in one pass, researchers have developed a model that learns to repeatedly improve its guesses, much like a human rethinking their answer multiple times. This simple, recursive process achieves remarkable results, proving that "less is more" when reasoning is done iteratively rather than in one huge burst.

Size Isn't Everything

A new paper introduces the Tiny Recursive Model (TRM), a very small neural network that can solve complex reasoning problems, like Sudoku or mazes, better than many large LLMs. It only has about 7 million parameters, not the billions or trillions we hear about with models like GPT, Gemini or Claude.

The core idea is simple: repeating the reasoning process recursively is more effective than just scaling up the model's parameter count. The TRM repeatedly revises its answer until it's confident, a stark contrast to the massive, one-shot prediction process of typical LLMs.

Where LLMs Fall Short

LLMs are incredible at language, but they have a few key weaknesses when it comes to pure logic:

  • Auto-Regressive Fragility: They generate one word (token) at a time. If the model makes a mistake early in a complex reasoning chain, the entire answer goes wrong.
  • Costly Workarounds: Methods like "Chain-of-Thought" (CoT) prompting help, but they are expensive to run and can be brittle, since the quality of the final answer depends on how well the reasoning text is generated.
  • Unsolved Puzzles: Even scaling model size or sampling more answers hasn't cracked certain benchmarks. For example, the ARC-AGI benchmark, a set of challenging, abstract visual reasoning tasks, remains largely unsolved even by the biggest models, such as GPT, Gemini, and Claude.

This led the authors to a profound question: "Can a small model reason better if we let it repeatedly refine its answers, instead of trying to be big and right the first time?"

The Hierarchical Reasoning Model (HRM)

The TRM didn't come out of nowhere; it's a simplification of an earlier idea, the Hierarchical Reasoning Model (HRM).

HRM was complex. It tried to imitate the brain by using two small transformer networks: one running frequently for "low-level" thinking and one running less often for "high-level" reasoning. Its key concepts involved:

  • Dual Recursion: The two networks repeatedly exchanged information.
  • Deep Supervision: It was trained to gradually refine its output over several iterations, not just in one go.

While HRM worked reasonably well, hitting about 55% accuracy on a hard Sudoku task, it was overly complex, mathematically messy, and computationally clunky.

Tiny Recursive Model (TRM)

The Tiny Recursive Model (TRM) strips away the complexity of its predecessor.

[Figure: TRM model architecture]

Architecture and Process

The TRM uses just one small network with two layers and about 7 million parameters. It ditches the dual hierarchies, biological metaphors, and complex fixed-point math.

Instead, it recursively refines two simple things in a loop:

  1. z: The latent reasoning state (like the internal thought process or memory).
  2. y: The current predicted answer.

At each step, the small network updates both z and y. Think of it like this: you run the same small mental model multiple times, and each pass corrects and improves the previous answer.
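The loop above can be sketched in a few lines. This is a minimal, illustrative toy, not the paper's actual architecture or update equations: the shared "network" here is just a small tanh map, and the number of latent updates per step is an assumed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 3))  # stand-in for the single shared tiny network

def f(inputs: np.ndarray) -> np.ndarray:
    """One pass of the shared network (a toy linear map + tanh here)."""
    return np.tanh(W @ inputs)

def trm_step(x, y, z, n_latent=6):
    """Refine the latent state z several times, then revise the answer y."""
    for _ in range(n_latent):
        z = f(x + y + z)   # z absorbs the question, current answer, and prior thought
    y = f(y + z)           # y is updated from the refined reasoning state
    return y, z

x = rng.normal(size=3)     # the "question"
y = np.zeros(3)            # initial guess
z = np.zeros(3)            # initial latent reasoning state
for _ in range(3):         # a few outer refinement passes
    y, z = trm_step(x, y, z)
```

The key structural point is that the same function f is reused everywhere: there is no second network, only one small model alternating between "think" (update z) and "answer" (update y).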

Training for Stability

Like HRM, TRM uses deep supervision, but much more cleanly. The model starts with the question and an initial guess. It then runs several internal recursive updates, and it repeats this process for up to 16 overall training steps. Crucially, unlike HRM, the gradient flows through all recursion steps, not just the last one. This allows the model to learn how to fix its mistakes at every stage of the reasoning process, leading to a much more stable and robust learner.
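To make "gradient flows through all recursion steps" concrete, here is a toy version of deep supervision with a single scalar parameter w and a finite-difference gradient (assumed setup for illustration only; the real model is trained with backpropagation through its recursive updates). The loss is accumulated over every intermediate answer, not just the final one.

```python
def rollout_loss(w, x, target, steps=16):
    """Recursively refine an answer y and supervise every intermediate step."""
    y = 0.0
    total = 0.0
    for _ in range(steps):
        y = y + w * (x - y)          # one recursive refinement of the answer
        total += (y - target) ** 2   # deep supervision: penalize every partial answer
    return total

x, target = 1.0, 1.0
w, lr, eps = 0.1, 0.01, 1e-6
for _ in range(200):
    # central finite difference of the *total* loss, so the gradient
    # reflects errors made at every recursion step, not only the last
    g = (rollout_loss(w + eps, x, target) - rollout_loss(w - eps, x, target)) / (2 * eps)
    w -= lr * g
```

Because every step is penalized, the model is pushed to fix mistakes early rather than only getting the final answer right; in this toy, w converges near 1.0, the value that corrects the guess in a single step.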

The authors found that you need both the internal reasoning trace (z) and the current best guess (y), but adding any more latent variables actually hurts its ability to generalize. Simplicity wins again.

Key Findings

This is where the results truly shine. A 7-million-parameter model competes with, and often beats, billion-parameter LLMs on structured reasoning tasks.


While the ARC-AGI benchmark remains tough, the fact that a tiny, recursive network can achieve scores up to 45% when huge LLMs like Gemini and Claude often score below 37% is the "wow" moment of the research.

Why Does This Work?

The success of TRM comes down to a few factors:

  • Simulated Depth: Each recursive pass is like adding another layer of reasoning. A 2-layer network run 20 times effectively emulates a 40-layer transformer. The recursion builds effective depth without increasing the model's footprint.
  • Better Generalization: Overfitting is a major problem when you only have a few thousand training examples, which is common for complex reasoning problems. Smaller models combined with recursive learning are much better at generalizing the core logic.
  • Improved Stability: The deep supervision teaches the model how to improve partial answers over time, drastically reducing the chances of a catastrophic early mistake.
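The "simulated depth" point can be demonstrated directly: a weight-tied 2-layer block run 20 times traverses 40 layer transforms while storing only one block's parameters. This toy (dimensions and counter are assumptions for illustration) just counts layer applications.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.3, size=(4, 4))
W2 = rng.normal(scale=0.3, size=(4, 4))

layer_applications = 0

def two_layer_block(h):
    """One pass of the shared 2-layer block (weights reused every pass)."""
    global layer_applications
    layer_applications += 2
    return np.tanh(W2 @ np.tanh(W1 @ h))

h = rng.normal(size=4)
for _ in range(20):            # 20 recursive passes of the same block
    h = two_layer_block(h)

# Parameters stored: two weight matrices. Depth traversed: 40 layers.
assert layer_applications == 40
```

A 40-layer stack would need 20x the parameters to reach the same depth; recursion trades compute for memory, which is exactly why the model stays tiny.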

Think about how you debug a tricky piece of code. You don't just write a massive, complex block once and hope it works.

  1. You write an initial version (y0).
  2. You think through what went wrong (z0).
  3. You fix and improve your code (y1).
  4. You re-analyze the new code for errors (z1).
  5. You repeat this loop until the code runs perfectly.

That is exactly what TRM is doing. It's using the same small "mental model"—that single, 2-layer network—to recursively refine the answer until it solves the puzzle.

Conclusion

The Tiny Recursive Model strongly suggests that recursion can outperform sheer scale. You can simulate the depth and complexity of reasoning through iteration, not just size. I predict that this concept will rapidly be adopted for Edge AI and other small, deployable systems. This architecture proves it's possible to create highly efficient AI models that are specialists at complex reasoning, handling tasks that current general-purpose LLMs struggle with. It’s a powerful step toward a theoretical understanding of how neural nets can truly "think."

