Advances in Reasoning-Focused Large Language Models

Explore top LinkedIn content from expert professionals.

Summary

Advances in reasoning-focused large language models are transforming artificial intelligence by teaching AI not just to generate text, but to follow logic, make decisions, and solve complex problems more like humans. This evolution includes new architectures and training methods that allow models to explore multiple solutions, self-correct, and maintain coherence across longer interactions.

  • Encourage structured reasoning: Use models that can branch out, score their own logic, and revisit previous steps to improve accuracy and reliability in tasks like planning or technical documentation.
  • Pursue efficient exploration: Choose tools that explore multiple reasoning paths simultaneously, helping to reduce errors and reach solutions faster, especially in dynamic or challenging environments.
  • Integrate multimodal abilities: Adopt AI systems able to process not just text, but also other data types like speech, for more versatile and intuitive interactions across languages and domains.
Summarized by AI based on LinkedIn member posts
  • Brij kishore Pandey

    AI Architect & Engineer | AI Strategist

    720,739 followers

    For the last couple of years, Large Language Models (LLMs) have dominated AI, driving advancements in text generation, search, and automation. But 2025 marks a shift—one that moves beyond token-based predictions to a deeper, more structured understanding of language. Meta’s Large Concept Models (LCMs), introduced in December 2024, redefine AI’s ability to reason, generate, and interact by focusing on concepts rather than individual words.

    Unlike LLMs, which rely on token-by-token generation, LCMs operate at a higher abstraction level, processing entire sentences and ideas as unified concepts. This shift enables AI to grasp deeper meaning, maintain coherence over longer contexts, and produce more structured outputs. Attached is a fantastic graphic created by Manthan Patel.

    How LCMs Work (a toy sketch of concept-level generation follows this post):

    🔹 Conceptual Processing – Instead of breaking sentences into discrete words, LCMs encode entire ideas, allowing for higher-level reasoning and contextual depth.
    🔹 SONAR Embeddings – A breakthrough in representation learning, SONAR embeddings capture the essence of a sentence rather than just its words, making AI more context-aware and language-agnostic.
    🔹 Diffusion Techniques – Borrowing from the success of generative diffusion models, LCMs stabilize text generation, reducing hallucinations and improving reliability.
    🔹 Quantization Methods – By refining how AI processes variations in input, LCMs improve robustness and minimize errors from small perturbations in phrasing.
    🔹 Multimodal Integration – Unlike traditional LLMs that primarily process text, LCMs seamlessly integrate text, speech, and other data types, enabling more intuitive, cross-lingual AI interactions.

    Why LCMs Are a Paradigm Shift:

    ✔️ Deeper Understanding: LCMs go beyond word prediction to grasp the underlying intent and meaning behind a sentence.
    ✔️ More Structured Outputs: Instead of just generating fluent text, LCMs organize thoughts logically, making them more useful for technical documentation, legal analysis, and complex reports.
    ✔️ Improved Reasoning & Coherence: LLMs often lose track of long-range dependencies in text. LCMs, by processing entire ideas, maintain context better across long conversations and documents.
    ✔️ Cross-Domain Applications: From research and enterprise AI to multilingual customer interactions, LCMs unlock new possibilities where traditional LLMs struggle.

    LCMs vs. LLMs: The Key Differences

    🔹 LLMs predict text at the token level, often leading to word-by-word optimizations rather than holistic comprehension.
    🔹 LCMs process entire concepts, allowing for abstract reasoning and structured thought representation.
    🔹 LLMs may struggle with context loss in long texts, while LCMs excel at maintaining coherence across extended interactions.
    🔹 LCMs are more resistant to adversarial input variations, making them more reliable in critical applications like legal tech, enterprise AI, and scientific research.
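    To make the token-level vs. concept-level contrast concrete, here is a minimal, purely structural sketch of sentence-level autoregression in Python: encode whole sentences to fixed-size vectors, predict the next vector from the history, then decode it back to the nearest candidate sentence. Everything here (the random-projection "encoder", the mean-pooling "predictor", the nearest-neighbor "decoder") is an illustrative stand-in, not Meta's SONAR or LCM code.

```python
import zlib
import numpy as np

DIM = 16  # toy embedding size; real SONAR vectors are much larger

def encode_concept(sentence: str) -> np.ndarray:
    """Toy stand-in for a sentence encoder like SONAR: maps a whole
    sentence to one fixed-size vector (a deterministic random projection,
    so the 'semantics' here are fake - only the mechanism matters)."""
    seed = zlib.crc32(sentence.encode())
    return np.random.default_rng(seed).standard_normal(DIM)

def predict_next_concept(history: list) -> np.ndarray:
    """Toy stand-in for the LCM itself: given the concept vectors so far,
    produce the next concept vector. A real LCM runs a transformer
    (with diffusion-based generation) at this step."""
    return np.tanh(np.mean(history, axis=0))

def decode_concept(vec: np.ndarray, candidates: list) -> str:
    """Toy decoder: choose the candidate sentence whose embedding is
    closest to the predicted concept vector."""
    return max(candidates, key=lambda c: float(vec @ encode_concept(c)))

# One autoregressive step at the SENTENCE level, not the token level:
history = [encode_concept(s) for s in
           ["The patient reported chest pain.", "An ECG was ordered."]]
print(decode_concept(predict_next_concept(history),
                     ["The results showed an arrhythmia.",
                      "The weather was sunny."]))
```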

  • Aishwarya Srinivasan
    627,986 followers

    If you’re an AI engineer trying to understand how reasoning actually works inside LLMs, this will help you connect the dots. Most large language models can generate. But reasoning models can decide.

    Traditional LLMs followed a straight line: Input → Predict → Output. No self-checking, no branching, no exploration. Reasoning models introduced structure: a way for models to explore multiple paths, score their own reasoning, and refine their answers. We started with Chain-of-Thought (CoT) reasoning, then extended to Tree-of-Thought (ToT) for branching, and now to graph-based reasoning, where models connect, merge, or revisit partial thoughts before concluding.

    This evolution changes how LLMs solve problems. Instead of guessing the next token, they learn to search the reasoning space: exploring alternatives, evaluating confidence, and adapting dynamically. Different reasoning topologies serve different goals (a minimal sketch of the tree-shaped variant follows this post):

    • Chains for simple sequential reasoning
    • Trees for exploring multiple hypotheses
    • Graphs for revising and merging partial solutions

    Modern architectures (like OpenAI’s o-series reasoning models, Anthropic’s Claude reasoning stack, DeepSeek’s R series, and DeepMind’s AlphaReasoning experiments) use this idea under the hood. They don’t just generate answers; they navigate reasoning trajectories, using adaptive depth-first or breadth-first exploration depending on task uncertainty.

    Why this matters:
    • It reduces hallucinations by verifying intermediate steps
    • It improves interpretability, since we can visualize reasoning paths
    • It boosts reliability for complex tasks like planning, coding, or tool orchestration

    The next phase of LLM development won’t be about more parameters; it’ll be about better reasoning architectures: topologies that can branch, score, and self-correct. I’ll be doing a deep dive on reasoning models soon on my Substack, exploring architectures, training approaches, and practical applications for engineers. If you haven’t subscribed yet, make sure you do: https://lnkd.in/dpBNr6Jg

    ♻️ Share this with your network
    🔔 Follow along for more data science & AI insights
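    As a concrete illustration of the tree topology mentioned above, here is a minimal beam-style Tree-of-Thought search in Python. The `expand` and `score` callables stand in for an LLM proposing next thoughts and self-evaluating partial paths; the toy demo at the bottom is synthetic and exists only to show the control flow, not any production reasoning stack.

```python
import heapq
from typing import Callable

def tree_of_thought_search(
    root: str,
    expand: Callable,      # proposes candidate next thoughts for a path
    score: Callable,       # model "self-evaluates" a partial path
    beam_width: int = 3,
    max_depth: int = 4,
) -> str:
    """Minimal beam-style Tree-of-Thought search: keep the top-k scored
    partial reasoning paths, expand each, and return the best leaf seen."""
    frontier = [(-score(root), root)]
    best = (float("-inf"), root)
    for _ in range(max_depth):
        next_frontier = []
        for _neg, path in frontier:
            for thought in expand(path):
                s = score(thought)
                best = max(best, (s, thought))
                heapq.heappush(next_frontier, (-s, thought))
        frontier = heapq.nsmallest(beam_width, next_frontier)
        if not frontier:
            break
    return best[1]

# Toy demo: "reason" toward the string "aaaa" one character at a time.
result = tree_of_thought_search(
    root="",
    expand=lambda p: [p + c for c in "ab"],
    score=lambda p: p.count("a") - p.count("b"),
)
print(result)  # -> "aaaa"
```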

  • Ross Dawson

    Futurist | Board advisor | Global keynote speaker | Founder: AHT Group - Informivity - Bondi Innovation | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice

    35,725 followers

    Chain-of-Thought has been a fundamental architecture driving LLM performance. Now 'Chain of Continuous Thought' (Coconut) significantly improves reasoning performance by working in latent space rather than language space. This paper from Meta's AI research group lays out the logic and results:

    💡 Continuous Reasoning Unlocks Efficiency: Large Language Models (LLMs) traditionally reason in "language space," where reasoning steps are expressed as explicit tokens, leading to inefficiencies. The Coconut (Chain of Continuous Thought) paradigm instead reasons in a continuous latent space by feeding the model’s hidden state back as input. This reduces reliance on explicit tokens and improves reasoning efficiency, especially for complex tasks requiring backtracking.

    📊 Higher Accuracy in Complex Reasoning Tasks: Coconut achieves significant accuracy improvements on complex tasks requiring planning and logic. In ProsQA, a reasoning-intensive task, Coconut attains 97.0% accuracy, far exceeding Chain-of-Thought (CoT) at 77.5%. Similarly, in logical reasoning tasks like ProntoQA, it achieves near-perfect performance at 99.8% accuracy, outperforming or matching other baselines while demonstrating superior planning capabilities.

    ⚡ Greater Efficiency with Fewer Tokens: Coconut sharply reduces the number of generated tokens while remaining competitive on accuracy. For example, in GSM8k (math reasoning), Coconut achieves 34.1% accuracy using just 8.2 tokens, compared to CoT's 42.9% accuracy, which requires 25 tokens. This token efficiency indicates that reasoning in latent space allows the model to take fewer explicit steps.

    🌟 Parallel Reasoning Explores Multiple Alternative Steps: Coconut enables LLMs to simultaneously explore multiple reasoning paths by encoding alternative next steps in the continuous latent space. This parallel reasoning behavior mimics breadth-first search (BFS), allowing the model to avoid premature decisions and progressively narrow down the correct solution.

    🔄 Multi-Stage Training Accelerates Learning: Coconut leverages a curriculum-based training strategy in which the explicit reasoning chain is gradually replaced with latent thoughts. This phased approach facilitates model learning, improving performance on math problems (GSM8k) and logical tasks, outperforming baselines like No-CoT and iCoT.

    🔍 Latent Reasoning Improves Planning and Focus: By reasoning in latent space, the model avoids premature decisions and progressively narrows down possibilities. Coconut shows reduced hallucinations and improved accuracy compared to CoT, demonstrating its ability to prioritize promising reasoning paths while pruning irrelevant ones.

    New model architectures are consistently improving LLM performance and efficiency. Even without more training data and underlying model progress, we are seeing consistent advances. A toy sketch of the latent feedback loop follows this post. Link to paper in comments.
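    A toy sketch of the core Coconut mechanic as the post describes it: during latent "thoughts", the model's last hidden state is fed straight back in as the next input embedding instead of being decoded to a token. A GRU cell stands in for the transformer here, and all sizes and names are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

DIM, VOCAB = 32, 100
embed = nn.Embedding(VOCAB, DIM)
core = nn.GRUCell(DIM, DIM)      # stand-in for one transformer step
lm_head = nn.Linear(DIM, VOCAB)

def coconut_step(prompt_ids, n_latent_thoughts=4):
    h = torch.zeros(DIM)
    for tok in prompt_ids:                # normal language-space input
        h = core(embed(tok), h)
    x = h
    for _ in range(n_latent_thoughts):    # latent-space reasoning:
        h = core(x, h)                    # the hidden state is reused as
        x = h                             # the next input; no token decoded
    return lm_head(h).argmax()            # decode only the final answer

print(coconut_step(torch.tensor([1, 5, 7])))
```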

  • Smriti Mishra

    Data & AI | LinkedIn Top Voice Tech & Innovation | Mentor @ Google for Startups | 30 Under 30 STEM

    88,531 followers

    How much do language models actually think?

    A recent paper from Apple, 'The Illusion of Thinking', explores this question by probing the limits of Large Reasoning Models (LRMs) such as Claude 3.7 Sonnet Thinking and DeepSeek-R1. These models aim to improve reasoning by generating long Chain-of-Thought (CoT) traces before producing an answer. Instead of relying on traditional math benchmarks, the authors designed controlled puzzle environments (like Tower of Hanoi and River Crossing) that allow them to systematically vary problem complexity and analyze model behavior step by step. (A sketch of why such puzzles make good complexity dials follows this post.)

    Key takeaways from the paper:

    🔹 Three performance regimes:
    → At low complexity: non-thinking models often outperform LRMs in both accuracy and token efficiency.
    → At medium complexity: LRMs show benefits thanks to more elaborate reasoning traces.
    → At high complexity: both model types collapse (accuracy drops to zero).
    🔹 As problems grow more complex, models actually use fewer thinking tokens despite having a sufficient budget, which highlights a possible inference-time scaling limitation.
    🔹 On simple tasks, models often reach the correct solution early but then continue generating incorrect or redundant reasoning.
    🔹 Even when the correct algorithm is provided in the prompt, models still fail at execution as complexity increases.

    The authors raise an important question: are today's LRMs truly engaging in reasoning, or just producing more elaborate pattern completions?

    You can read the paper here: https://lnkd.in/dn3GTT66 The image used in the post is taken from the same paper. Curious to hear your take, especially if you work on reasoning, interpretability, or evaluation design.

    #technology #generativeai #artificialintelligence #llms #innovation
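    Tower of Hanoi works as a complexity dial precisely because the optimal solution length is known in closed form: 2**n - 1 moves for n disks, so difficulty rises predictably one disk at a time. A short sketch of the standard recursive construction (not the paper's code):

```python
def hanoi_moves(n, src=0, aux=1, dst=2):
    """Optimal Tower of Hanoi solution: move n disks from src to dst.
    The move list has length exactly 2**n - 1, giving a clean,
    controllable complexity scale for puzzle-based evaluations."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)   # park n-1 disks on aux
            + [(src, dst)]                       # move the largest disk
            + hanoi_moves(n - 1, aux, src, dst)) # bring the n-1 back on top

for n in range(1, 8):
    print(n, len(hanoi_moves(n)))   # 1, 3, 7, 15, 31, 63, 127
```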

  • Himanshu J.

    Building Aligned, Safe and Secure AI

    29,457 followers

    🚀 Exploring the transition from LLMs to LRMs: unveiling the evolution of "thinking" in AI 🤖🧠

    The shift from Large Language Models (LLMs) to Large Reasoning Models (LRMs) marks a significant transformation in how AI tackles intricate problem-solving tasks.

    📚 A recent collaborative study by researchers from Massachusetts Institute of Technology, Cornell University, University of Washington, and Microsoft Research delves into a fundamental question: 🔍 how can AI be trained to engage in "thinking" rather than merely generating text?

    💡 The approach, Reinforcement Learning via Self-Play (RLSP), teaches AI to reason by integrating (a toy sketch of the reward decomposition follows this post):
    ✅ Supervised Fine-Tuning (SFT) – learning from human or synthetic demonstrations of reasoning.
    ✅ Exploration Reward Signals – promoting diverse reasoning avenues such as backtracking, verification, and the consideration of multiple hypotheses.
    ✅ Reinforcement Learning (RL) with Outcome Verification – ensuring accurate reasoning without reward hacking.

    🔥 Key Findings & Advancements:
    📌 Emergent Behaviors: models trained with RLSP showcased traits like self-correction, exploration, and verification, mirroring human problem-solving approaches.
    📌 Performance Enhancement: RLSP led to a 23% increase in math problem-solving accuracy on Llama-3.1-8B and a 10% boost on AIME 2024 for Qwen2.5-32B.
    📌 AI as a Search Mechanism: thinking essentially involves a guided exploration of potential solutions, a concept that also underpins methodologies like AlphaZero and process reward modeling.

    🌎 Why this progress matters: as AI systems move beyond memorization to active reasoning, the implications extend across scientific exploration, enterprise AI applications, and self-directed decision-making. Could this signify the dawn of AI cultivating its own intuition? 🤔

    📖 Explore the complete paper here: https://lnkd.in/dhr_C4-e

    Would love to hear your thoughts: where do you see AI reasoning making the biggest impact? 🚀👇

    #AI #MachineLearning #LLMs #AIReasoning #ReinforcementLearning #LLMsToLRMs
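    Purely as a hypothetical illustration of the reward decomposition described above, here is what an outcome term plus a capped exploration bonus could look like. The marker-counting heuristic and all constants are my own toy assumptions; the paper's actual reward shaping is more principled than this.

```python
def rlsp_style_reward(trace: str, answer: str, gold: str) -> float:
    """Toy sketch of an RLSP-style reward: a verified-outcome term plus
    a small bonus for exploratory behaviors (backtracking, verification,
    considering alternatives). Illustrative only - not the paper's reward."""
    outcome = 1.0 if answer.strip() == gold.strip() else 0.0
    exploration_markers = ("wait", "alternatively", "let me check",
                           "on second thought")
    bonus = 0.1 * sum(trace.lower().count(m) for m in exploration_markers)
    # Cap the bonus so the policy cannot farm reward by spamming markers.
    return outcome + min(bonus, 0.3)

trace = "Try x=2... wait, that fails. Alternatively, x=3. Let me check: yes."
print(rlsp_style_reward(trace, answer="3", gold="3"))  # 1.3
```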

  • Kuldeep Singh Sidhu

    Senior Data Scientist @ Walmart | BITS Pilani

    16,023 followers

    Excited to share groundbreaking research on DeepRAG – a novel framework that revolutionizes how Large Language Models (LLMs) interact with external knowledge.

    >> Key Innovation
    DeepRAG models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic and adaptive retrieval by decomposing complex queries into atomic subqueries. The framework introduces two game-changing components:
    - Retrieval Narrative: ensures a structured retrieval flow by generating subqueries informed by previously retrieved information.
    - Atomic Decisions: dynamically determines, for each subquery, whether to retrieve external knowledge or rely on parametric knowledge. (A minimal sketch of this decision loop follows the post.)

    >> Technical Implementation
    The system employs a binary tree search to explore the impact of atomic decisions on reasoning outcomes. It synthesizes training data through imitation learning, capturing the "subquery generation – atomic decision – intermediate answer" pattern.

    >> Performance Highlights
    - 21.99% improvement in answer accuracy while optimizing retrieval efficiency
    - Superior performance across multiple QA datasets, including HotpotQA, 2WikiMultihopQA, PopQA, and WebQuestions
    - Remarkable capability on time-sensitive QA tasks

    This work comes from researchers at the Chinese Academy of Sciences and Tencent, marking a significant advancement in making LLMs more efficient and accurate in knowledge retrieval.
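    A minimal sketch of the atomic-decision loop the post describes, with every callable a placeholder for a model component. The names (`decompose`, `confident`, and so on) are my own illustrative stand-ins, not DeepRAG's API.

```python
from typing import Callable, Optional

def deeprag_style_answer(
    question: str,
    decompose: Callable,          # yields the next atomic subquery, or None
    confident: Callable,          # trust parametric knowledge for this one?
    answer_from_memory: Callable,
    retrieve_and_answer: Callable,
) -> list:
    """Sketch of DeepRAG's loop: decompose the question into atomic
    subqueries and, for each, make a binary decision between answering
    from parametric knowledge and retrieving externally."""
    trace = []
    while (sub := decompose(question, trace)) is not None:
        if confident(sub):
            trace.append((sub, "parametric", answer_from_memory(sub)))
        else:
            trace.append((sub, "retrieve", retrieve_and_answer(sub)))
    return trace

# Toy demo with canned components, just to show the control flow:
subqs = iter(["Who wrote Hamlet?", "When was he born?"])
print(deeprag_style_answer(
    "When was the author of Hamlet born?",
    decompose=lambda q, t: next(subqs, None),
    confident=lambda s: "Hamlet" in s,
    answer_from_memory=lambda s: "Shakespeare",
    retrieve_and_answer=lambda s: "1564 (retrieved)",
))
```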

  • Elvis S.

    Founder at DAIR.AI | Angel Investor | Advisor | Prev: Meta AI, Galactica LLM, Elastic, Ph.D. | Serving 7M+ learners around the world

    85,577 followers

    NEW research from FAIR at Meta, Cornell, and CMU. This paper is a bigger deal than it seems. Apparently, you don't need billions of parameters to teach an AI model to reason.

    The default approach to post-training language models for reasoning today remains finetuning millions or even billions of parameters. But what if the signal needed for reasoning is far sparser than we assume? This new research introduces TinyLoRA, a method that scales low-rank adapters down to as few as a single trainable parameter. Using TinyLoRA with RL, they trained Qwen2.5-7B to 91% accuracy on GSM8K with only 13 parameters in bf16. That's 26 total bytes. (A toy adapter in this spirit is sketched after this post.)

    So what's the idea? RL and SFT require fundamentally different amounts of model capacity. SFT must absorb the full demonstration, encoding both task-relevant structure and irrelevant noise into the update. RL receives a sparser, cleaner signal: the reward separates what matters from what doesn't, so resampling amplifies useful information while noise cancels out.

    Here are the results: On GSM8K, models trained with GRPO reach 90% accuracy with fewer than 100 parameters. Models of the same capacity trained with SFT barely outperform the base model. On harder benchmarks like MATH500, AIME, and AMC, finetuning just 196 parameters retains 87% of the absolute performance improvement averaged across six benchmarks. The trend scales with model size, too: larger models need proportionally smaller updates, suggesting trillion-scale models may be trainable for many tasks with just a handful of parameters.

    The key takeaway is that reasoning may already live inside pretrained models. RL doesn't inject new knowledge; it surfaces what's already there, and it can do so with almost no parameter change at all.
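    To see how an adapter can shrink to a single trainable parameter, here is a sketch in the same spirit: a frozen base layer plus a rank-1 update whose direction vectors are fixed at random, leaving only one trained scalar. This is my reading of the scale of the idea, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class Rank1Adapter(nn.Module):
    """Sketch in the spirit of TinyLoRA: freeze the base weight W and
    train only a rank-1 update W + s * (a b^T), where the directions
    a, b are fixed random buffers and s is the sole trained scalar.
    (Illustrative - the paper's actual parameterization may differ.)"""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)      # frozen pretrained layer
        out_f, in_f = base.weight.shape
        self.register_buffer("a", torch.randn(out_f) / out_f ** 0.5)
        self.register_buffer("b", torch.randn(in_f) / in_f ** 0.5)
        self.s = nn.Parameter(torch.zeros(1))       # the ONE trained scalar

    def forward(self, x):
        # base(x) + s * (x . b) a  == base(x) + s * (a b^T) x
        return self.base(x) + self.s * (x @ self.b).unsqueeze(-1) * self.a

layer = Rank1Adapter(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 1
```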

  • Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks, but we found their fundamental limitations are more severe than expected. In our latest work, we compared each “thinking” LRM with its “non-thinking” LLM twin. Unlike most prior work that only measures final performance, we analyzed their actual reasoning traces, looking inside their long "thoughts". Our study reveals several interesting results:

    1 - Three distinct performance regimes 🟡🔵🔴: Under equal inference-compute budgets, standard LLMs outperform reasoning models on low-complexity tasks, reasoning models excel at medium complexity, and both collapse to zero accuracy on high-complexity tasks.

    2 - Counterintuitive scaling limits 🔄: As problems get more difficult, reasoning models initially think more (good!) but then START THINKING LESS despite having plenty of token budget left. They give up right when they should work harder!

    3 - Looking inside the "thoughts" 🔍: By replaying every intermediate move in our simulators (a minimal replay sketch follows this post), our puzzle setup shows that LRMs find answers early but then "overthink" on simple problems, reach correct solutions only after exploring wrong paths on medium problems, and completely fail on hard problems.

    4 - Catastrophic failure on exact computation ⚠️: Even when given explicit solution algorithms, reasoning models still collapse at the same complexity thresholds, revealing fundamental limits in symbolic manipulation and erratic performance across tasks: Claude 3.7 flawlessly handles ~100 Tower of Hanoi moves yet flounders after just four steps in the River Crossing puzzle.

    5 - Scaling compute is helpful, but not enough to close the reasoning gaps 🧠: Our findings challenge assumptions about LRM capabilities. Despite sophisticated self-reflection mechanisms from RL training, our results suggest that these models can't follow algorithmic steps and, importantly, can't generalize algorithmic reasoning beyond certain complexity thresholds.

    #Paper: https://lnkd.in/g3XJC-cX

    Work done with my colleagues at Apple: Parshin Shojaee, keivan alizadeh vahid, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar.

    #AI #MachineLearning #LLM #reasoning
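    The replay methodology in point 3 is easy to picture with a small simulator. This sketch (not the paper's code) replays a proposed Tower of Hanoi move list step by step and reports the index of the first illegal move:

```python
def replay_hanoi(n, moves):
    """Replay a model's proposed move list, checking each intermediate
    move for legality. Returns the index of the first illegal move,
    len(moves) if the sequence is legal but incomplete, or -1 if the
    goal state is reached."""
    pegs = [list(range(n, 0, -1)), [], []]   # disks n..1, largest at bottom
    for i, (src, dst) in enumerate(moves):
        if not pegs[src] or (pegs[dst] and pegs[dst][-1] < pegs[src][-1]):
            return i                          # empty source or big-on-small
        pegs[dst].append(pegs[src].pop())
    return -1 if pegs[2] == list(range(n, 0, -1)) else len(moves)

print(replay_hanoi(2, [(0, 1), (0, 2), (1, 2)]))  # -1: valid full solution
print(replay_hanoi(2, [(0, 2), (0, 2)]))          # 1: big disk onto small
```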

  • Shivani Virdi

    AI Engineering | Founder @ NeoSage | ex-Microsoft • AWS • Adobe | Teaching 70K+ How to Build Production-Grade GenAI Systems

    85,032 followers

    Chain-of-thought ≠ Reasoning. The Hierarchical Reasoning Model (HRM) might be the next frontier in reasoning models.

    Large Language Models (LLMs) are powerful, but when it comes to reasoning, they hit a wall. That's because of a fundamental architectural limitation: transformers are "paradoxically shallow". Even with billions of parameters, LLMs rely on Chain-of-Thought (CoT), breaking problems into step-by-step text. This works, but it's fragile:
    • A single mistake in decomposition derails the reasoning.
    • It needs massive training data.
    • It's slow, since every reasoning step = more tokens.

    Enter the Hierarchical Reasoning Model (HRM), a new architecture inspired by how the brain handles reasoning across different timescales.

    🔹 How HRM works
    Instead of forcing shallow token-by-token reasoning, HRM introduces two coupled recurrent modules (a toy version of this loop is sketched after this post):
    • High-level module (H): thinks slowly, plans abstractly.
    • Low-level module (L): thinks fast, executes detailed computations.
    The two work in cycles:
    • L repeatedly refines computations within a local step.
    • H updates only after L stabilizes, guiding the next phase.
    • Together, this creates hierarchical convergence → deep, multi-stage reasoning within a single forward pass.

    🔹 Architectural breakthroughs
    • No Backpropagation Through Time (BPTT): HRM uses a one-step gradient approximation, making training efficient (O(1) memory).
    • Adaptive computation: inspired by "thinking fast and slow," HRM can decide when to stop reasoning vs. continue refining.
    • Inference-time scaling: give it more compute at inference and it naturally gets better at deeper reasoning, without retraining.

    🔹 Why it matters
    With just 27M parameters and 1K training examples, HRM solved tasks where even advanced LLMs with CoT failed:
    • Sudoku-Extreme (9x9) → near-perfect accuracy (LLMs: 0%).
    • Maze-Hard (30x30) → optimal pathfinding from scratch.
    • ARC-AGI benchmark → 40.3% accuracy, beating Claude 3.7 (21.2%) and o3-mini-high (34.5%).

    LLMs simulate reasoning with text. HRMs perform reasoning in latent space.

    Read the full paper: https://lnkd.in/gR72MDrw

    Over to you: do you think HRMs will see as wide adoption as today's SOTA LLMs?

    ♻️ Reposting this helps everyone in your network keep up with the breakthroughs
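    A toy rendering of the H/L cycle as I read it from the post: the fast module L iterates for several inner steps conditioned on H's current plan, then the slow module H updates once on L's settled state, all inside one forward pass. GRU cells stand in for the actual modules, and all sizes are arbitrary assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

DIM = 32
L_cell = nn.GRUCell(DIM, DIM)   # fast module: detailed computation
H_cell = nn.GRUCell(DIM, DIM)   # slow module: abstract planning

def hrm_forward(x, n_cycles=4, t_inner=8):
    """One forward pass with hierarchical cycles: L refines t_inner
    times per cycle, conditioned on H's state; H updates once per cycle
    after L has settled."""
    zH = torch.zeros(DIM)
    zL = torch.zeros(DIM)
    for _ in range(n_cycles):
        for _ in range(t_inner):
            zL = L_cell(x + zH, zL)   # L iterates under H's guidance
        zH = H_cell(zL, zH)           # H updates only after L stabilizes
    return zH                         # deep multi-stage compute, one pass

print(hrm_forward(torch.randn(DIM)).shape)
```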

  • Andre Saraiva

    AI Researcher at OpenAI, ex-DeepMind

    10,959 followers

    I’m excited to share some of the work we’ve been doing at OpenAI on applying large reasoning models to competitive programming.

    When we first started testing large language models on platforms like Codeforces, they struggled even with the basics. The big turning point was training them not just to predict text, but to reason, using reinforcement learning to encourage coherent chains of thought. That shift took us from roughly the 11th to the 89th percentile on unseen Codeforces contests.

    We then pushed further. By specializing one of these models (o1) for coding, with a bit of extra RL training and some hand-coded test-time tactics, we created “o1-ioi.” Under official International Olympiad in Informatics (IOI) constraints (50 submissions per problem, fixed time limits), o1-ioi finished around the 49th percentile. Given more submissions, it even earned a gold medal.

    The next generation, o3, took an even more exciting step. Without any hand-engineered strategies at test time, o3 achieved IOI gold under the same official constraints. Inspecting its reasoning, we found it had invented its own sensible tactics, for example, writing a simple brute-force solution to check the correctness of a more optimized approach (a minimal harness for that tactic is sketched below).

    These improvements carry over to new Codeforces contests: o3 now ranks in the 99.8th percentile, roughly #175 globally, on uncontaminated competitions. And while competitive programming is just one facet of coding, these advances hint at what reinforcement-learned reasoning could bring to broader software engineering tasks.

    If you’d like to dive into the details, check out the full report I co-authored with my colleagues on the reasoning team: https://lnkd.in/dZbcrwfn
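    The brute-force cross-checking tactic o3 rediscovered is a competitive-programming staple and easy to show in miniature. The two max-subarray solvers below are illustrative stand-ins; the point is the random-testing harness that compares a slow, obviously correct solution against a fast one.

```python
import random

def brute_force(xs):
    """O(n^2) max subarray sum: slow but obviously correct."""
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def optimized(xs):
    """Kadane's algorithm, O(n): the 'clever' solution to be verified."""
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# Random stress test: any disagreement pinpoints a failing input.
for trial in range(1000):
    xs = [random.randint(-10, 10) for _ in range(random.randint(1, 12))]
    assert brute_force(xs) == optimized(xs), (trial, xs)
print("optimized solution agrees with brute force on 1000 random tests")
```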
