Large Language Diffusion Models (LLaDA) proposes a diffusion-based approach that can match or beat leading autoregressive LLMs on many tasks. If it holds up, this could open a new path for large-scale language modeling beyond autoregression. More on the paper:

- Questioning autoregressive dominance: While almost all large language models (LLMs) use next-token prediction, the authors argue that key capabilities (scalability, in-context learning, instruction following) derive from general generative principles rather than from autoregressive modeling specifically.
- Masked diffusion + Transformers: LLaDA is built on a masked diffusion framework that progressively masks tokens and trains a Transformer to recover the original text. This yields a non-autoregressive generative model, potentially lifting the left-to-right constraint of standard LLMs. (A toy training-step sketch follows this post.)
- Strong scalability: Trained on 2.3T tokens at 8B parameters, LLaDA performs competitively with top LLaMA-based LLMs across math (GSM8K, MATH), code (HumanEval), and general benchmarks (MMLU), demonstrating that the diffusion paradigm scales comparably to autoregressive baselines.
- Breaks the "reversal curse": LLaDA shows balanced forward/backward reasoning, outperforming GPT-4 and other AR models on reversal tasks (e.g., reversing a poem line). Because diffusion does not enforce left-to-right generation, it stays robust on backward completions.
- Multi-turn dialogue and instruction following: After supervised fine-tuning, LLaDA carries on multi-turn conversations with instruction adherence and fluency similar to chat-based AR LLMs, further evidence that advanced LLM traits do not necessarily rely on autoregression.

https://lnkd.in/eYp9Hi5y
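The masked-diffusion objective the post describes can be sketched in a few lines. This is a hedged toy reading of the training step (random per-sequence masking ratio, cross-entropy on masked positions with 1/t weighting), not LLaDA's exact recipe; `model` and `mask_id` are placeholders.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, tokens: torch.Tensor, mask_id: int) -> torch.Tensor:
    """tokens: (batch, seq) int64; model maps token ids to (batch, seq, vocab) logits."""
    # Per-sequence masking ratio t ~ U(0, 1), clamped to avoid empty masks.
    t = torch.rand(tokens.size(0), 1, device=tokens.device).clamp(min=1e-3)
    mask = torch.rand(tokens.shape, device=tokens.device) < t
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)
    # Cross-entropy on masked positions only, weighted by 1/t; this is our
    # reading of the masked-diffusion objective, not the paper's exact code.
    ce = F.cross_entropy(logits[mask], tokens[mask], reduction="none")
    weights = (1.0 / t).expand_as(tokens)[mask]
    return (weights * ce).mean()
```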
Pathways Approach to Large Language Model Training
Summary
The pathways approach to large language model training is a structured multi-stage process that guides AI models from learning basic language patterns to mastering complex reasoning and aligning with human values. This method breaks down training into clear steps, making models more reliable and adaptable for different tasks.
- Build foundational skills: Start by teaching the model to recognize and generate language using vast text datasets, laying the groundwork for future learning.
- Customize and specialize: Adapt the model to follow instructions and tackle domain-specific tasks through targeted fine-tuning and the injection of specialized knowledge.
- Align with human intent: Use reinforcement learning and human feedback to ensure the model responds safely, helpfully, and thoughtfully in real-world scenarios.
If you’re an AI engineer, understanding how LLMs are trained and aligned is essential for building high-performance, reliable AI systems. Most large language models follow a three-step training procedure:

Step 1: Pretraining
→ Goal: Learn general-purpose language representations.
→ Method: Self-supervised learning on massive unlabeled text corpora (e.g., next-token prediction).
→ Output: A pretrained LLM, rich in linguistic and factual knowledge but not grounded in human preferences.
→ Cost: Extremely high (billions of tokens, trillions of FLOPs).
→ Pretraining remains centralized in a few labs due to the scale required (e.g., Meta, Google DeepMind, OpenAI), but open-weight models like LLaMA 4, DeepSeek V3, and Qwen 3 are making it more accessible.

Step 2: Finetuning (two common approaches)
→ 2a: Full-parameter finetuning
- Updates all weights of the pretrained model.
- Requires significant GPU memory and compute.
- Best when the model needs deep adaptation to a new domain or task.
- Used for: instruction following, multilingual adaptation, industry-specific models.
- Cons: expensive and storage-heavy.
→ 2b: Parameter-efficient finetuning (PEFT)
- Only a small set of added parameters is updated (e.g., via LoRA, Adapters, or IA³); the base model stays frozen.
- Much cheaper, and ideal for rapid iteration and deployment. (A minimal LoRA sketch follows this post.)
- Multi-LoRA architectures (e.g., in Fireworks AI or Hugging Face PEFT) host multiple finetuned adapters on the same base model, drastically reducing serving cost and latency.

Step 3: Alignment (usually via RLHF)
Pretrained and task-tuned models can still produce unsafe or incoherent outputs; alignment ensures they follow human intent. RLHF (Reinforcement Learning from Human Feedback) involves:
→ Step 1: Supervised fine-tuning (SFT)
- Human labelers craft ideal responses to prompts.
- The model is fine-tuned on this dataset to mimic helpful behavior.
- Limitation: costly and not scalable on its own.
→ Step 2: Reward modeling (RM)
- Humans rank multiple model outputs per prompt.
- A reward model is trained to predict human preferences.
- This provides a scalable, learnable signal of what "good" looks like.
→ Step 3: Reinforcement learning (e.g., PPO, DPO)
- The LLM is trained using the reward model's feedback.
- Algorithms like Proximal Policy Optimization (PPO) or the newer Direct Preference Optimization (DPO) iteratively improve model behavior.
- DPO is gaining popularity over PPO for being simpler and more stable, since it does not need sampled trajectories.

Key takeaways:
→ Pretraining = general knowledge (expensive)
→ Finetuning = domain or task adaptation (customize cheaply via PEFT)
→ Alignment = make it safe, helpful, and human-aligned (still labor-intensive but improving)

Save the visual reference, and follow me (Aishwarya Srinivasan) for more no-fluff AI insights ❤️
PS: Visual inspiration: Sebastian Raschka, PhD
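For step 2b, here is a minimal LoRA sketch using Hugging Face PEFT, the library the post mentions. The model name, rank, and target modules are illustrative choices, not recommendations from the post.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; swap in whatever open-weight checkpoint you use.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)       # base weights stay frozen
model.print_trainable_parameters()         # typically well under 1% of the base model
```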
-
Exciting New Research: Injecting Domain-Specific Knowledge into Large Language Models

I just came across a fascinating, comprehensive survey on enhancing Large Language Models (LLMs) with domain-specific knowledge. While LLMs like GPT-4 have shown remarkable general capabilities, they often struggle with specialized domains such as healthcare, chemistry, and legal analysis that require deep expertise.

The researchers (Song, Yan, Liu, and colleagues) have systematically categorized knowledge injection methods into four key paradigms:

1. Dynamic knowledge injection: retrieves information from external knowledge bases in real time during inference, combining it with the input for enhanced reasoning. It offers flexibility and easy updates without retraining, though it depends heavily on retrieval quality and can slow inference. (A retrieval sketch follows this post.)

2. Static knowledge embedding: embeds domain knowledge directly into model parameters through fine-tuning. PMC-LLaMA, for instance, extends LLaMA 7B by pretraining on 4.9 million PubMed Central articles. While offering faster inference without retrieval steps, it requires costly updates when knowledge changes.

3. Modular knowledge adapters: introduce small, trainable modules that plug into the base model while keeping original parameters frozen. This parameter-efficient approach preserves general capabilities while adding domain expertise, striking a balance between flexibility and computational efficiency.

4. Prompt optimization: rather than retrieving external knowledge, this technique focuses on crafting prompts that guide LLMs to leverage their internal knowledge more effectively. It requires no training but depends on careful prompt engineering.

The survey also highlights impressive domain-specific applications across biomedicine, finance, materials science, and human-centered domains. In biomedicine, for example, domain-specific models like PMC-LLaMA-13B outperform general models like LLaMA2-70B by over 10 points on the MedQA dataset, despite having far fewer parameters.

Looking ahead, the researchers identify key challenges, including maintaining knowledge consistency when integrating multiple sources and enabling cross-domain knowledge transfer between distinct fields with different terminologies and reasoning patterns.

This research provides a valuable roadmap for developing more specialized AI systems that combine the broad capabilities of LLMs with the precision and depth required for expert domains. As we continue to advance AI systems, this balance between generality and specialization will be crucial.
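As a concrete illustration of paradigm 1 (dynamic knowledge injection), here is a hedged sketch of inference-time retrieval: embed the query, pull the top-k most similar passages, and prepend them to the prompt. The `embed` stand-in and knowledge-base format are placeholders, not from the survey.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real sentence-embedding model; deterministic per string.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def inject_knowledge(query: str, kb: list[str], k: int = 3) -> str:
    """Retrieve the k most similar passages and assemble an augmented prompt."""
    doc_vecs = np.stack([embed(d) for d in kb])
    scores = doc_vecs @ embed(query)          # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]
    context = "\n".join(kb[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

At query time the returned string goes straight to the LLM, so the knowledge base can be updated without touching model weights, which is exactly the trade-off the survey attributes to this paradigm.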
-
Ever wondered how Large Language Models (LLMs) like ChatGPT actually learn to talk like humans? It comes down to a multi-stage training process, from raw-data learning to human-feedback fine-tuning. Here's a quick breakdown (Stage 0 is the starting point; the four training stages follow):

Stage 0: Untrained LLM. The model produces random outputs; it has no understanding of language yet.

Stage 1: Pre-training. The model learns from massive text datasets, recognizing language patterns and structure, but it is still not conversational. (A sketch of the next-token objective follows this post.)

Stage 2: Instruction fine-tuning. The model is trained on question-answer pairs to follow instructions and provide more useful, context-aware responses.

Stage 3: Reinforcement Learning from Human Feedback (RLHF). The model learns from human preference rankings over responses, improving response quality and helpfulness.

Stage 4: Reasoning fine-tuning. Finally, the model is trained on reasoning and logic tasks, refining its ability to produce factual and well-structured answers.

Understanding how LLMs evolve helps you build, prompt, and use them better.
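For intuition on what Stage 1 optimizes, here is a generic next-token prediction loss in PyTorch: shift the sequence by one position and score the model's prediction of each following token. This is textbook pretraining code, not tied to any specific model.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq, vocab) from the model; tokens: (batch, seq) ids."""
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..seq-2
    shift_labels = tokens[:, 1:]       # the tokens that actually came next
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```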
-
Reinforcement learning (RL) is becoming a core strategy for improving how language models reason. Historically, RL in LLMs was used at the final stage of training to align model behavior with human preferences. That helped models sound more helpful or polite, but it did not expand their ability to solve complex problems. RL is now being applied earlier and more deeply, not just to tune outputs, but to help models learn how to think, adapt, and generalize across different kinds of reasoning challenges. Here are 3 papers that stood out.

𝗣𝗿𝗼𝗥𝗟 (𝗡𝗩𝗜𝗗𝗜𝗔) applies RL over longer time horizons using strategies like entropy control, KL regularization, and reference-policy resets. A 1.5B model trained with this setup outperforms much larger models on tasks like math, code, logic, and scientific reasoning. What is more interesting is that the model begins to solve problems it had not seen before, suggesting that RL, when structured and sustained, can unlock new reasoning capabilities that pretraining alone does not reach.

𝗥𝗲𝗳𝗹𝗲𝗰𝘁, 𝗥𝗲𝘁𝗿𝘆, 𝗥𝗲𝘄𝗮𝗿𝗱 𝗯𝘆 𝗪𝗿𝗶𝘁𝗲𝗿 𝗜𝗻𝗰. introduces a lightweight self-improvement loop: when the model fails a task, it generates a reflection, retries, and is rewarded only if the new attempt succeeds (sketched after this post). Over time, the model learns to write better reflections and improves even on first-try accuracy. Because it relies only on a binary success signal and needs no human-labeled data, it provides a scalable way for models to self-correct.

𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗣𝗿𝗲-𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗳𝗿𝗼𝗺 𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁 𝗮𝗻𝗱 𝗣𝗲𝗸𝗶𝗻𝗴 𝗨𝗻𝗶𝘃𝗲𝗿𝘀𝗶𝘁𝘆 proposes a new way to pretrain models. Rather than treating next-token prediction as passive pattern completion, each token prediction becomes a decision with a reward tied to correctness. This turns next-token prediction into a large-scale RL process. It results in better generalization, especially on hard tokens, and shows that reasoning ability can be built into the model from the earliest stages of training without relying on curated prompts or handcrafted rewards.

Taken together, these papers show that RL is becoming a mechanism for growing the model's ability to reflect, generalize, and solve problems it was not explicitly trained on. They also reveal that the most meaningful improvements come from training at moments of uncertainty. Instead of compressing more knowledge into a frozen model, we are beginning to train systems that can learn how to improve mid-process and build reasoning as a capability. This changes how we think about scaling: the next generation of progress may come not from larger models, but from models that are better at learning through feedback, self-reflection, and structured trial and error.
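Here is a hedged sketch of the Reflect, Retry, Reward loop as described above. `generate` and the task's `check` are placeholder callables; the actual paper additionally trains the reflection tokens with RL using the binary reward, which this inference-time loop does not show.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]   # binary success signal; no human labels needed

def reflect_retry(generate: Callable[[str], str], task: Task, max_retries: int = 2):
    """Run the failure -> reflection -> retry loop; reward is 1.0 only on success."""
    attempt = generate(task.prompt)
    reward = float(task.check(attempt))
    for _ in range(max_retries):
        if reward == 1.0:
            break
        reflection = generate(
            f"{task.prompt}\nPrevious answer: {attempt}\n"
            "That was incorrect. Briefly reflect on what went wrong:"
        )
        attempt = generate(f"{task.prompt}\nReflection: {reflection}\nTry again:")
        reward = float(task.check(attempt))
    return attempt, reward
```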
-
The paper introduces RHO-1, a new language model that takes a smarter approach to training by focusing only on the most important data. Using a method called Selective Language Modeling (SLM), RHO-1 targets tokens aligned with a desired distribution instead of treating all tokens in a dataset equally.

The researchers dug into how language models learn, analyzing token-level training dynamics and grouping tokens by their loss trends. From there, they developed SLM, which scores tokens with a reference model and prioritizes the higher-scoring ones during training (a sketch of this selection follows).

And the results? Pretty impressive. Pre-trained on a math-focused dataset, RHO-1 showed major improvements in few-shot accuracy on math tasks; on a general dataset, it excelled at general tasks too. After fine-tuning, it even achieved state-of-the-art results on the MATH dataset.

This study highlights how narrowing the focus during training can make models more efficient and powerful. By training smarter, not harder, this approach could lead to language models that perform better while needing less data and compute.
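A hedged sketch of the token selection idea as the post describes it: score each token by its excess loss (training model minus reference model) and backpropagate only through the top-scoring fraction. Shapes and the keep ratio are illustrative, not RHO-1's exact settings.

```python
import torch
import torch.nn.functional as F

def slm_loss(train_logits, ref_logits, tokens, keep_ratio: float = 0.6):
    """train_logits/ref_logits: (batch, seq, vocab); tokens: (batch, seq).
    ref_logits are assumed to be computed under torch.no_grad()."""
    labels = tokens[:, 1:].reshape(-1)                      # next-token targets
    tr = train_logits[:, :-1].reshape(-1, train_logits.size(-1))
    rf = ref_logits[:, :-1].reshape(-1, ref_logits.size(-1))
    train_ce = F.cross_entropy(tr, labels, reduction="none")
    ref_ce = F.cross_entropy(rf, labels, reduction="none")
    excess = train_ce - ref_ce          # high = informative and not yet learned
    k = max(1, int(keep_ratio * excess.numel()))
    idx = torch.topk(excess, k).indices # keep only the top-scoring tokens
    return train_ce[idx].mean()         # gradient flows through selected tokens only
```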
-
𝘛𝘩𝘦 𝘈𝘐 𝘵𝘳𝘢𝘪𝘯𝘪𝘯𝘨 𝘱𝘭𝘢𝘺𝘣𝘰𝘰𝘬 𝘳𝘦𝘭𝘪𝘦𝘴 𝘰𝘯 𝘮𝘢𝘴𝘴𝘪𝘷𝘦 𝘥𝘢𝘵𝘢𝘴𝘦𝘵𝘴 𝘢𝘯𝘥 𝘤𝘰𝘮𝘱𝘭𝘦𝘹 𝘱𝘪𝘱𝘦𝘭𝘪𝘯𝘦𝘴. 𝘞𝘩𝘢𝘵 𝘪𝘧 𝘪𝘵'𝘴 𝘧𝘶𝘯𝘥𝘢𝘮𝘦𝘯𝘵𝘢𝘭𝘭𝘺 𝘪𝘯𝘦𝘧𝘧𝘪𝘤𝘪𝘦𝘯𝘵? 𝘕𝘦𝘸 𝘳𝘦𝘴𝘦𝘢𝘳𝘤𝘩 𝘧𝘳𝘰𝘮 Princeton University 𝘴𝘶𝘨𝘨𝘦𝘴𝘵𝘴 𝘵𝘩𝘢𝘵 𝘧𝘰𝘳𝘤𝘪𝘯𝘨 𝘮𝘰𝘥𝘦𝘭𝘴 𝘵𝘰 "𝘵𝘩𝘪𝘯𝘬 𝘰𝘶𝘵 𝘭𝘰𝘶𝘥" 𝘪𝘴 𝘢 𝘴𝘩𝘰𝘳𝘵𝘤𝘶𝘵 𝘵𝘰 𝘦𝘭𝘪𝘵𝘦 𝘱𝘦𝘳𝘧𝘰𝘳𝘮𝘢𝘯𝘤𝘦, 𝘵𝘶𝘳𝘯𝘪𝘯𝘨 𝘢𝘯 8𝘉 𝘮𝘰𝘥𝘦𝘭 𝘪𝘯𝘵𝘰 𝘢 𝘎𝘗𝘛-4𝘰 𝘤𝘰𝘮𝘱𝘦𝘵𝘪𝘵𝘰𝘳 𝘸𝘪𝘵𝘩 𝘢 𝘧𝘳𝘢𝘤𝘵𝘪𝘰𝘯 𝘰𝘧 𝘵𝘩𝘦 𝘥𝘢𝘵𝘢.

The current approach to building top-tier models is incredibly resource-intensive and often opaque. Finding a more efficient path to high-level reasoning could democratize AI development and lead to more capable, transparent systems.

A new paper, "𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐭𝐡𝐚𝐭 𝐓𝐡𝐢𝐧𝐤, 𝐂𝐡𝐚𝐭 𝐁𝐞𝐭𝐭𝐞𝐫," tackles a core issue: models trained for verifiable reasoning (like math) don't generalize well to creative, open-ended conversation. Their solution is RLMT (Reinforcement Learning with Model-rewarded Thinking). It forces a language model to generate a detailed chain-of-thought plan before its final response, and the entire process is optimized with online reinforcement learning against a preference reward model, effectively teaching the model how to reason on any topic, not just what answer to give. (A rollout sketch follows this post.)

The results:
- Their Llama-3.1-8B model trained with RLMT outperforms GPT-4o on the WildBench chat benchmark.
- A base Llama-3.1-8B model trained with RLMT on just 7,000 prompts surpassed the official Llama-3.1-8B-Instruct model, which used a complex pipeline with over 25 million examples.

#AI #MachineLearning #LLM #Research #DeepLearning #ReinforcementLearning
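A minimal sketch of the RLMT rollout structure, assuming placeholder `generate` and `reward_model` callables and `<think>` delimiters: plan in a chain of thought first, answer second, then score the answer with a preference reward model. The online RL update the paper uses is omitted.

```python
from typing import Callable

def rollout_with_thinking(
    generate: Callable[[str], str],
    reward_model: Callable[[str, str], float],
    prompt: str,
):
    """Plan first, answer second, then score with a preference reward model."""
    plan = generate(f"{prompt}\n<think>")                    # free-form reasoning plan
    answer = generate(f"{prompt}\n<think>{plan}</think>\n")  # reply conditioned on plan
    return plan, answer, reward_model(prompt, answer)        # reward drives online RL
```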
-
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach: leverage smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical obstacles: (O1) lack of comprehensive evaluation, (O2) untested viability for scaling, and (O3) lack of empirical guidelines.

To tackle O1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting. Our findings reveal that a depthwise stacking operator, called G_stack, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance on eight standard NLP benchmarks compared to strong baselines.

Motivated by these promising results, we conduct extensive experiments to delve deeper into G_stack to address O2 and O3. For O2 (untested scalability), our study shows that G_stack is scalable and consistently performs well, with experiments up to 7B LLMs after growth and pre-training LLMs with 750B tokens. For example, compared to a conventionally trained 7B model using 300B tokens, our G_stack model converges to the same loss with 194B tokens, resulting in a 54.6% speedup. We further address O3 (lack of empirical guidelines) by formalizing guidelines to determine the growth timing and growth factor for G_stack, making it practical in general LLM pre-training. We also provide in-depth discussions and comprehensive ablation studies of G_stack.
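For intuition, here is a toy depthwise growth operator in the spirit of G_stack: duplicate a trained small model's block list to build a deeper model, then continue pre-training. The choice of growth timing and growth factor follows the paper's guidelines, not this sketch.

```python
import copy
import torch.nn as nn

def grow_depthwise(blocks: nn.ModuleList, growth_factor: int = 2) -> nn.ModuleList:
    """Stack `growth_factor` copies of the trained block list to grow depth."""
    grown = []
    for _ in range(growth_factor):
        grown.extend(copy.deepcopy(b) for b in blocks)
    return nn.ModuleList(grown)

# Usage sketch: model.layers = grow_depthwise(model.layers), then resume
# pre-training the grown model instead of training a deep model from scratch.
```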
-
𝗗𝗲𝗲𝗽 𝗗𝗶𝘃𝗲 𝗶𝗻𝘁𝗼 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗟𝗮𝗿𝗴𝗲 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹𝘀

A very enlightening paper by a team of researchers specializing in computer vision and NLP. This survey underscores that pretraining, while fundamental, only sets the stage for LLM capabilities. It then highlights 𝗽𝗼𝘀𝘁-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗺𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺𝘀 (𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴, 𝗿𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴, 𝗮𝗻𝗱 𝘁𝗲𝘀𝘁-𝘁𝗶𝗺𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴) as the real game-changer for aligning LLMs with complex real-world needs. It offers:
◼️ A structured taxonomy of post-training techniques
◼️ Guidance on challenges such as hallucinations, catastrophic forgetting, reward hacking, and ethics
◼️ Future directions in model alignment and scalable adaptation

In essence, it's a playbook for making LLMs truly robust and user-centric.

𝗞𝗲𝘆 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀

𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗶𝗻𝗴 𝗕𝗲𝘆𝗼𝗻𝗱 𝗩𝗮𝗻𝗶𝗹𝗹𝗮 𝗠𝗼𝗱𝗲𝗹𝘀: While raw pretrained LLMs capture broad linguistic patterns, they may lack domain expertise or the ability to follow instructions precisely. Targeted fine-tuning methods, like instruction tuning and chain-of-thought tuning, unlock more specialized, high-accuracy performance for tasks ranging from creative writing to medical diagnostics.

𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗳𝗼𝗿 𝗔𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁: The authors show how RL-based methods (e.g., RLHF, DPO, GRPO) turn human or AI feedback into structured reward signals, nudging LLMs toward higher-quality, less toxic, or more logically sound outputs. This structured approach helps mitigate hallucinations and ensures models better reflect human values or domain-specific best practices.

⭐ 𝗜𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗶𝗻𝗴 𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀
◾ 𝗥𝗲𝘄𝗮𝗿𝗱 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴 𝗜𝘀 𝗞𝗲𝘆: Rather than using absolute numerical scores, ranking-based feedback (e.g., pairwise preferences or partial orderings of responses) often gives LLMs a crisper, more nuanced way to learn from human annotations (see the pairwise loss sketch after this post).
◾ Process vs. Outcome Rewards: It's not just about the final answer; rewarding each step in a chain of thought fosters transparency and better explainability.
◾ 𝗠𝘂𝗹𝘁𝗶-𝗦𝘁𝗮𝗴𝗲 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴: The paper discusses iterative techniques that combine RL, supervised fine-tuning, and model distillation. This multi-stage approach lets a single strong "teacher" model pass on its refined skills to smaller, more efficient architectures, democratizing advanced capabilities without requiring massive compute.
◾ 𝗣𝘂𝗯𝗹𝗶𝗰 𝗥𝗲𝗽𝗼𝘀𝗶𝘁𝗼𝗿𝘆: The authors maintain a GitHub repo tracking the rapid developments in LLM post-training, great for staying up to date on the latest papers and benchmarks.

Source: https://lnkd.in/gTKW4Jdh
☃ To continue getting such interesting Generative AI content/updates: https://lnkd.in/gXHP-9cW
#GenAI #LLM #AI RealAIzation
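The ranking-based feedback mentioned in the insights typically reduces to a pairwise Bradley-Terry objective: the reward model should score the human-preferred response above the rejected one. A generic sketch, not specific to this survey:

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen, r_rejected: (batch,) scalar rewards for the preferred and
    rejected responses. Minimizing this pushes r_chosen above r_rejected."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```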
-
CMU Researchers Introduce PAPRIKA: A Fine-Tuning Approach that Enables Language Models to Develop General Decision-Making Capabilities Not Confined to a Particular Environment

This method is designed to endow language models with general decision-making capabilities that are not limited to any single environment. Rather than relying on traditional training data, PAPRIKA leverages synthetic interaction data generated across a diverse set of tasks, ranging from classic guessing games like twenty questions to puzzles such as Mastermind and even simulated customer-service scenarios. By training on these varied trajectories, the model learns to adjust its behavior based on contextual feedback from its environment, without the need for additional gradient updates. This encourages a more flexible, in-context learning strategy that can be applied to a range of new tasks.

PAPRIKA's methodology is built on a two-stage fine-tuning process. The first stage exposes the LLM to a large set of synthetic trajectories generated using min-p sampling (sketched after this post), which keeps the training data both diverse and coherent. This step allows the model to experience a wide spectrum of interaction strategies, including both successful and less effective decision-making behaviors. The second stage refines the model using a blend of supervised fine-tuning (SFT) and a direct preference optimization (DPO) objective: pairs of trajectories are compared, and the model gradually learns to favor those that lead more directly to task success.

Read the full article: https://lnkd.in/gbqaxhzz
Paper: https://lnkd.in/g7yrkpdb
GitHub Page: https://lnkd.in/gNdpvK85
Model on Hugging Face: https://lnkd.in/gQvd_Vc4
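For reference, a hedged sketch of min-p sampling, the decoding scheme the post says PAPRIKA uses to keep synthetic trajectories diverse yet coherent: keep only tokens whose probability is at least min_p times the top token's probability, then renormalize and sample.

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """logits: (vocab,). Returns a sampled token id as a 1-element tensor."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max()           # dynamic cutoff scaled by top prob
    filtered = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum()      # renormalize over surviving tokens
    return torch.multinomial(filtered, 1)
```

Because the cutoff scales with the model's confidence, min-p prunes aggressively when one token dominates and permissively when the distribution is flat, which is why it tends to yield diverse but still coherent samples.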