Improving Accuracy With Self-Critique Tuning in LLMs

Summary

Improving accuracy with self-critique tuning in large language models (LLMs) means teaching these AI models to review and revise their own answers over multiple attempts, much like how people learn from their mistakes. This process, often called recursive introspection or self-critique, helps LLMs produce more reliable, accurate results, especially on complex reasoning tasks.

  • Encourage iterative revision: Train AI models to refine their answers over several steps, allowing them to catch and correct errors along the way.
  • Utilize feedback mechanisms: Incorporate reward-based or feedback-driven systems so the model learns from both successes and mistakes during training.
  • Prioritize multi-turn approaches: Shift from single-shot prompts to multi-turn processes, enabling models to revisit previous responses and improve them with each pass (a minimal loop sketch follows below).
Summarized by AI based on LinkedIn member posts
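
A minimal sketch of the multi-turn revision loop described above, in Python. Everything here is illustrative: `generate` stands in for any LLM completion call, and the critique/revise prompts and the DONE stopping rule are assumptions, not taken from any of the papers below.

```python
from typing import Callable

def self_critique(question: str,
                  generate: Callable[[str], str],
                  max_turns: int = 3) -> str:
    """Draft an answer, then alternate critique and revision until the
    model reports no remaining errors or the turn budget runs out."""
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(max_turns):
        critique = generate(
            f"Question: {question}\nProposed answer: {answer}\n"
            "List any errors in the answer, or reply DONE if it is correct."
        )
        if critique.strip() == "DONE":
            break  # the model found nothing left to fix
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite a corrected answer:"
        )
    return answer
```
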
  • View profile for Aishwarya Naresh Reganti

    Founder & CEO @ LevelUp Labs | Ex-AWS | Consulting, Training & Investing in AI

    123,782 followers

    🤔 What if, instead of relying on prompts, you could fine-tune LLMs to incorporate self-feedback and improvement mechanisms more effectively? Self-feedback and improvement have been shown to be highly beneficial for LLMs and agents, allowing them to reflect on their behavior and reasoning and correct their mistakes as more computational resources or interactions become available. The authors note that commonly used test-time methods for self-improvement, such as prompt tuning and few-shot learning, often fail to enable models to correct their mistakes on complex reasoning tasks.

    ⛳ The paper introduces RISE: Recursive Introspection, an approach to improve LLMs by teaching them how to introspect and improve their responses iteratively.

    ⛳ RISE leverages principles from online imitation learning and reinforcement learning to develop a self-improvement mechanism within LLMs. By treating each prompt as part of a multi-turn Markov decision process (MDP), RISE allows models to learn from their previous attempts and refine their answers over multiple turns, ultimately improving their problem-solving capabilities.

    ⛳ It models the fine-tuning process as a multi-turn Markov decision process, where the initial state is the prompt and subsequent states involve recursive improvements.

    ⛳ It employs a reward-weighted regression (RWR) objective to learn from both high- and low-quality rollouts, enabling models to improve over turns (see the sketch below). The approach uses data generated by the learner itself or by more capable models to supervise improvements iteratively.

    RISE significantly improves the performance of LLMs like LLaMa2, LLaMa3, and Mistral on math reasoning tasks, outperforming single-turn strategies with the same computational resources.

    Link: https://lnkd.in/e2JDQr8M
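
For the reward-weighted regression objective mentioned above, a rough PyTorch sketch of the core idea, assuming you already have per-rollout sequence log-probabilities and scalar rewards; the temperature and normalization details are assumptions, not the paper's exact formulation.

```python
import torch

def rwr_loss(seq_logprobs: torch.Tensor,
             rewards: torch.Tensor,
             tau: float = 1.0) -> torch.Tensor:
    """Weight each rollout's log-likelihood by exp(reward / tau), so
    high-reward revisions dominate the fine-tuning signal while
    low-reward rollouts still contribute a little."""
    weights = torch.exp(rewards / tau)
    weights = weights / weights.sum()        # normalize over the batch
    return -(weights * seq_logprobs).sum()   # negative weighted likelihood
```
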

  • View profile for Sohrab Rahimi

    Director, AI/ML Lead @ Google

    23,608 followers

    Despite the impressive capabilities of LLMs, developers still face challenges in getting the most out of these systems. LLMs often need a lot of fine-tuning and prompt adjustment to produce the best results. First, LLMs currently lack the ability to refine and improve their own responses autonomously, and second, they have limited research capabilities. It would be highly beneficial if LLMs could conduct their own research, equipped with a powerful search engine to access and integrate a broader range of resources. In the past couple of weeks, several studies have taken on these challenges:

    1. Recursive Introspection (RISE): RISE introduces a novel fine-tuning approach where LLMs are trained to introspect and correct their responses iteratively. By framing the process as a multi-turn Markov decision process (MDP) and employing strategies from online imitation learning and reinforcement learning, RISE has shown significant performance improvements in models like LLaMa2 and Mistral. RISE enhanced LLaMa3-8B's performance by 8.2% and Mistral-7B's by 6.6% on specific reasoning tasks.

    2. Self-Reasoning Framework: This framework enhances the reliability and traceability of retrieval-augmented language models (RALMs) by introducing a three-stage self-reasoning process encompassing relevance-aware processing, evidence-aware selective processing, and trajectory analysis. Evaluations across multiple datasets demonstrated that this framework outperforms existing state-of-the-art models, achieving 83.9% accuracy on the FEVER fact-verification dataset and improving the model's ability to judge when external knowledge augmentation is necessary.

    3. Meta-Rewarding with LLM-as-a-Meta-Judge: The Meta-Rewarding approach incorporates a meta-judge role into the LLM's self-rewarding mechanism, allowing the model to critique its own judgments as well as evaluate its responses (a sketch of this pattern follows below). This self-supervised approach mitigates rapid saturation in self-improvement processes, as evidenced by an 8.5% improvement in the length-controlled win rate for models like LLaMa2-7B over multiple iterations, surpassing traditional self-rewarding methods.

    4. Multi-Agent Framework for Complex Queries: It mimics human cognitive processes by decomposing complex queries into sub-tasks using dynamic graph construction. It employs multiple agents (WebPlanner and WebSearcher) that work in parallel to retrieve and integrate information from large-scale web sources. This approach led to significant improvements in response quality when compared to existing solutions like ChatGPT-Web and Perplexity.ai.

    The combination of these four studies would create a highly powerful system: it would self-improve through recursive introspection, continuously refining its responses; accurately assess its performance and learn from evaluations to prevent saturation; and efficiently acquire additional information as needed through dynamic and strategic search planning. How do you think a system with these capabilities would reshape the future?
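
A toy sketch of the meta-judge pattern from study 3 (judging the judgment): the prompts, 0-to-10 scale, and score parsing below are invented for illustration, and the real method builds preference pairs over judgments rather than averaging scores.

```python
from typing import Callable

def meta_reward(question: str, response: str,
                generate: Callable[[str], str]) -> float:
    """Score a response, then score the scoring itself, and average the
    two. Assumes each reply starts with a number, e.g. '7 because ...'."""
    judgment = generate(
        f"Rate this answer to '{question}' from 0 to 10, then explain:\n{response}"
    )
    meta_judgment = generate(
        f"Rate the quality and fairness of this evaluation from 0 to 10:\n{judgment}"
    )
    score = float(judgment.split()[0])        # fragile parse, illustration only
    meta_score = float(meta_judgment.split()[0])
    return (score + meta_score) / 2
```
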

  • View profile for Sachin Kumar

    Senior Data Scientist III at LexisNexis | Experienced Agentic AI and Generative AI Expert

    8,693 followers

    Recursive Introspection: an LLM fine-tuning approach to teach models how to self-improve.

    LLMs usually do not exhibit the ability to continually improve their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, the authors present RISE: Recursive IntroSpEction, an iterative fine-tuning procedure that attempts to teach the model how to alter its response after previously unsuccessful attempts to solve a hard test-time problem, optionally with additional environment feedback.

    𝗥𝗜𝗦𝗘 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝗢𝘃𝗲𝗿𝘃𝗶𝗲𝘄
    i) Problem formulation: convert single-turn problems into multi-turn Markov decision processes (MDPs).
    - The state is given by the prompt, the history of prior attempts, and optional feedback from the environment.
    - An action is a response generated by the LLM given the state of the multi-turn interaction so far.
    ii) Data collection: collect data by unrolling the current model 𝑘 − 1 times, followed by an improved version of the response, obtained by either (1) self-distillation: sample multiple responses from the current model and use the best one, or (2) distillation: obtain oracle responses by querying a more capable model. In either case, RISE then trains on the generated data.

    𝗥𝗜𝗦𝗘 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗮𝘁 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗧𝗶𝗺𝗲
    i) With an oracle: each time the model improves its response, it is allowed to check its answer against the environment and terminate early as soon as a correct answer is found.
    ii) Without an oracle: ask the model to sequentially revise its own responses 𝑗 times, then perform majority voting over all candidate outputs from the different turns to obtain the final response (see the sketch below).
    - If the turn number 𝑗 is larger than the training iteration number 𝑘, the agent keeps only the most recent 𝑘 interactions of history to avoid test-time distribution shift.

    𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻
    i) Metrics used:
    - With an oracle, "p1@t5": the rollout terminates as soon as the response is correct.
    - Without an oracle, "m1@t5": the rollout is not terminated before five turns, and maj@1 performance is computed over the candidates produced in each turn.
    ii) Results:
    - RISE attains the biggest performance improvement between 1-turn (m5@t1) and 5-turn (m1@t5) performance without an oracle on both GSM8K and MATH.
    - Prompting-only self-refinement largely degrades performance across the board.
    - Using RISE on top of Mistral-7B exceeds even state-of-the-art math models such as Eurus-7B-SFT.

    𝗟𝗶𝗺𝗶𝘁𝗮𝘁𝗶𝗼𝗻𝘀
    - Improving with self-generated supervision will likely require more computation and more iterations, since it is slower than using an off-the-shelf expert model.
    - RISE requires running manual iterations, so a more "online" variant of RISE is likely the long-run solution.

    𝗕𝗹𝗼𝗴: https://lnkd.in/eAcCi99S
    𝗣𝗮𝗽𝗲𝗿: https://lnkd.in/eP8VwHrz
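
A compact sketch of the oracle-free deployment loop described above (sequential revision plus majority voting, with history truncated to the training horizon). `generate`, the prompt wording, and exact-string voting are stand-ins; the paper votes over extracted final answers.

```python
from collections import Counter
from typing import Callable

def rise_infer(problem: str,
               generate: Callable[[str], str],
               j: int = 5, k: int = 3) -> str:
    """Revise sequentially for j turns, keep only the last k attempts in
    the visible history (avoiding test-time distribution shift when
    j > k), then majority-vote over all candidates (maj@1)."""
    candidates = []
    for _ in range(j):
        history = "\n".join(
            f"Previous attempt: {c}" for c in candidates[-k:]
        )
        answer = generate(f"{problem}\n{history}\nImproved answer:")
        candidates.append(answer)
    return Counter(candidates).most_common(1)[0][0]
```
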

  • View profile for Giovanni Sisinna

    Program Director | PMO & Portfolio Governance | AI & Digital Transformation

    6,686 followers

    How Can LLMs and Agents Self-Improve? Exploring Recursive Introspection!

    In the ever-evolving landscape of AI, a new challenge emerges: can Large Language Models (LLMs) and agents learn to self-improve? The recent study on "Recursive Introspection" delves into this, exploring how these models can iteratively refine their responses, even when initially incorrect. Let's dive into the fascinating findings and implications!

    🔹 Research Focus
    The study introduces RISE, an innovative approach to enable LLMs to introspect and enhance their outputs. Unlike traditional models that rely on single-shot responses, RISE uses a multi-turn Markov Decision Process (MDP) to train LLMs to recognize and correct their mistakes over several iterations.

    🔹 Recursive Introspection
    RISE allows LLMs to refine their answers through feedback and multiple attempts, similar to how humans learn from mistakes. This self-improvement capability is particularly effective in complex tasks like math problem-solving, where accuracy often requires iterative corrections.

    🔹 Data Collection and Training
    The training process involves collecting data from the model's own outputs and refining them using a reward-based system (a sketch follows below). This approach leverages both successful and unsuccessful attempts, ensuring the model learns from a diverse set of scenarios.

    🔹 Performance Improvements
    The study shows significant improvements in models like Llama2 and Mistral when using RISE. The models not only become better at initial responses but also show an enhanced ability to correct mistakes over multiple turns, outperforming baseline models and even proprietary ones like GPT.

    📌 Significance and Future Implications
    This research marks a critical advancement in AI, demonstrating that LLMs can achieve self-improvement. This capability is pivotal for developing autonomous systems capable of learning and adapting in real time. The next steps could involve integrating this approach into broader AI applications, enhancing decision-making and problem-solving capabilities across various industries.

    👉 How do you think this self-improvement capability will transform AI applications in your field? Share your thoughts or questions below! Let's discuss how these advancements can be leveraged for innovation. 👈

    #LLM #LLMs #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NLP #DataScience #AIethics #FutureOfWork #TechInnovation #TechTrends #Innovation #TechNews
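
For the data collection step this post describes, a small sketch of the self-distillation variant: sample several candidates from the current model and keep the best-scoring one as the supervision target. `generate`, `reward`, and the record format are hypothetical stand-ins.

```python
from typing import Callable

def collect_example(problem: str,
                    generate: Callable[[str], str],
                    reward: Callable[[str], float],
                    n_samples: int = 8) -> dict:
    """Sample candidate responses from the current model, score each
    (e.g. by checking the final answer against a label), and keep the
    highest-reward one as the fine-tuning target for this prompt."""
    candidates = [generate(problem) for _ in range(n_samples)]
    best = max(candidates, key=reward)
    return {"prompt": problem, "target": best, "reward": reward(best)}
```
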

  • View profile for Terezija Semenski, MSc

    Helping 300,000+ people master AI and Math fundamentals faster | LinkedIn [in]structor 15 courses | Author @ Math Mindset newsletter

    31,115 followers

    MIT just released a paper about recursive language models (RLMs) that shows why most LLM "failures" aren't what you think. And the results are surprising.

    Most LLM failures are not knowledge failures. They're first-draft failures.

    The paper measured what happens when a model is allowed to revise its own output multiple times instead of scaling parameters. The result: on multi-step reasoning tasks, adding just 2–4 recursive passes improved correctness by 10–25%. On longer planning problems, error rates drop even more sharply, because early logical mistakes get corrected in later passes instead of being propagated forward.

    The authors compared:
    - a larger non-recursive model
    - a smaller recursive model using multiple passes

    The smaller recursive model reached comparable or even better accuracy while using fewer parameters and fewer total tokens in the final answer.

    The takeaways:
    - hallucination reduction
    - better reasoning comes from iterative self-correction, not from dumping more intermediate tokens
    - you get more reasoning per unit of compute by looping than by scaling

    Paper link in comments.

    ♻️ Repost to help someone learn about RLMs
