An interesting new paper on LLM-JEPA from Hai Huang, Yann LeCun, and Randall Balestriero. 💡

Previously I wrote about the JEPA approach applied to videos (V-JEPA and V-JEPA 2) and time series (CHARM). Now the JEPA approach is finally applied to LLMs! This work bridges a major gap between AI for vision and language, offering a potential leap forward in how we train language models.

Instead of just predicting the next word, LLM-JEPA teaches models to capture underlying meaning by predicting abstract representations (as the JEPA approach does): for instance, grasping the essence of a code snippet from its natural language description. The paper introduces a hybrid objective that combines standard next-token prediction with a Joint Embedding Predictive Architecture (JEPA) loss, a technique highly successful in computer vision.

The empirical results are compelling: LLM-JEPA consistently boosts performance, accelerates parameter-efficient fine-tuning (PEFT), and shows remarkable resistance to overfitting. The method doesn't just improve scores; it produces more structured and transferable representations. While the current computational overhead remains a challenge, the paper opens a promising new direction beyond traditional LLM training. 🚀

Review: https://lnkd.in/eC4Jte_r
Paper: https://lnkd.in/erZJadb3
Code: https://lnkd.in/ethXT7sX
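To make the hybrid objective concrete, here is a minimal sketch of how such a loss could be wired up. It assumes a Hugging Face-style causal LM, a small learnable predictor, the last-token hidden state as each view's embedding, and a cosine-distance JEPA term; the weight `lam` and the embedding choice are illustrative assumptions, not the paper's verified implementation.

```python
# Hedged sketch of a hybrid next-token + JEPA-style objective.
# Assumptions: Hugging Face-style causal LM, last-token hidden state
# as the view embedding, cosine distance as the JEPA term, and a
# hypothetical weight `lam`.
import torch
import torch.nn.functional as F

def hybrid_loss(model, predictor, text_batch, code_batch, lam=1.0):
    # Standard next-token prediction loss on the text view.
    out_text = model(**text_batch, labels=text_batch["input_ids"],
                     output_hidden_states=True)
    ntp_loss = out_text.loss

    # Encode the code view with the same model (no generation needed).
    out_code = model(**code_batch, output_hidden_states=True)

    # Final hidden state of the last token as each view's embedding.
    z_text = out_text.hidden_states[-1][:, -1, :]
    z_code = out_code.hidden_states[-1][:, -1, :]

    # The predictor maps the text embedding toward the code embedding;
    # cosine distance encourages the two views to align in latent space.
    jepa_loss = (1.0 - F.cosine_similarity(predictor(z_text), z_code)).mean()

    return ntp_loss + lam * jepa_loss
```

The second forward pass over the code view is what drives the extra compute the post mentions, which is what the v2 update below targets.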
LLM-JEPA for Transferable AI Representations
Summary
LLM-JEPA is a new training method for large language models that uses a Joint Embedding Predictive Architecture (JEPA) to create more general and transferable AI representations, going beyond simply predicting the next word. This approach helps language models understand underlying concepts and abstract relationships, making them more adaptable and structured for a wider range of tasks.
- Try JEPA-based training: Experiment with the JEPA loss during pretraining and fine-tuning to see improvements in accuracy and resilience against overfitting.
- Explore broad applications: Apply LLM-JEPA across varied datasets and tasks, such as question answering, reasoning, and generation, to achieve more reliable results.
- Utilize open-source tools: Take advantage of available code and resources to test LLM-JEPA on your own projects and contribute to this evolving AI technology.
🚀 v2 of our paper "LLM-JEPA" is out on arXiv!

🔍 What's new?
✅ Significantly lower computational overhead: reduced from 200% to 25% using a simple yet effective random JEPA-loss dropout.
✅ Broader applications: extended beyond symmetric 2-view datasets to NQ-Open (open-domain Natural Questions) and HellaSwag (sentence completion), and tested on reasoning models like Qwen3 and DeepSeek-R1-Distilled.
✅ Rigorous ablations: the JEPA loss design outperforms alternatives including L2, MSE, prepended [PRED] tokens, Code→Text, and InfoNCE variants.

🧩 What is LLM-JEPA? If you're seeing this for the first time: LLM-JEPA introduces the Joint Embedding Predictive Architecture (JEPA), a self-supervised learning paradigm proven in vision, as a regularization loss for LLMs. Combined with next-token prediction, it enables models to:
🎯 Boost fine-tuning accuracy
🧠 Resist overfitting
🌱 Work in pretraining via paraphrase-based JEPA
🌀 Induce structured latent representations unseen in either base or normally fine-tuned models

🧪 The v1 workshop version (accepted to NeurIPS 2025 UniReps + DL4C) received valuable feedback highlighting high compute cost, limited applications, and missing ablations, all fully addressed in this release. Huge thanks to the UniReps and DL4C reviewers for their constructive and insightful comments that helped shape v2.

It's been a privilege to collaborate with Yann LeCun (NYU) and Randall Balestriero (Brown); few experiences are more inspiring than working alongside the pioneers of modern deep and self-supervised learning.

The code is open-sourced, and we warmly invite others to experiment with it and help explore this emerging frontier between JEPA and LLMs.

💻 Code: https://lnkd.in/eUX2b8iE
📄 Paper: https://lnkd.in/ers8_yzm

Together with Yann and Randall, we're already exploring new variants and applications, and look forward to sharing more soon. Stay tuned!
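The post does not spell out how the random JEPA-loss dropout works; one natural reading is that the JEPA term (and the extra forward pass it requires) is computed only on a randomly chosen fraction of training steps, which would shrink the average overhead roughly in proportion to the keep probability. Below is a hedged sketch under that assumption, reusing the hypothetical `hybrid_loss` from the first post above; the keep probability is illustrative, not the authors' reported setting.

```python
# Hedged sketch of one possible "random JEPA-loss dropout": skip the
# second view's forward pass (and the JEPA term) on most steps, so the
# average extra cost drops in proportion to the keep probability.
# `jepa_keep_prob` and the helpers are assumptions, not the paper's code.
import random

def training_step(model, predictor, text_batch, code_batch,
                  jepa_keep_prob=0.125, lam=1.0):
    if random.random() < jepa_keep_prob:
        # Occasionally pay for the full hybrid objective:
        # next-token prediction + JEPA alignment (two forward passes).
        return hybrid_loss(model, predictor, text_batch, code_batch, lam=lam)
    # Most steps: plain next-token prediction, single forward pass.
    out = model(**text_batch, labels=text_batch["input_ids"])
    return out.loss
```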
LLM-JEPA enhances LLM training by combining standard next-token prediction with a joint embedding predictive architecture, yielding superior reasoning and generative performance. A JEPA-based objective preserves generative capacity while strengthening abstraction by aligning embeddings across different views, such as text and code. Built from encoder, predictor, and similarity-based loss components, it integrates efficiently with existing architectures.

Extensive experiments across model families like Llama, Gemma, OpenELM, and OLMo, and datasets including NL-RX, GSM8K, Spider, and RottenTomatoes, show consistent improvements in fine-tuning accuracy and training efficiency. Pretraining with LLM-JEPA further enhances downstream performance, demonstrating benefits even when JEPA is only applied during pretraining. Empirical analysis confirms that standard next-token loss does not minimize JEPA objectives, underscoring the necessity of explicitly including the JEPA term. Results indicate state-of-the-art gains, reduced error, and robust generative fidelity, positioning LLM-JEPA as a foundational step toward JEPA-centric language model development.

https://lnkd.in/gbNYpe3N
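The last point, that plain next-token training does not implicitly minimize the JEPA objective, suggests a simple diagnostic: embed paired text and code views with a trained model and measure how well they align. A rough sketch follows, assuming a Hugging Face-style model and tokenizer (with a pad token set, e.g. `tokenizer.pad_token = tokenizer.eos_token`) and the last non-padding token's hidden state as the embedding; none of these choices are taken from the paper itself.

```python
# Hedged diagnostic sketch: mean cosine alignment between paired text
# and code embeddings, as a proxy for the JEPA objective. Embedding
# choice (last non-padding token, final layer) is an assumption.
import torch
import torch.nn.functional as F

@torch.no_grad()
def view_alignment(model, tokenizer, texts, codes, device="cpu"):
    def embed(strings):
        batch = tokenizer(strings, return_tensors="pt", padding=True).to(device)
        hidden = model(**batch, output_hidden_states=True).hidden_states[-1]
        # Index of each sequence's last non-padding token (right padding).
        last = batch["attention_mask"].sum(dim=1) - 1
        return hidden[torch.arange(hidden.size(0)), last]

    z_text, z_code = embed(texts), embed(codes)
    # Higher mean cosine similarity = better aligned views.
    return F.cosine_similarity(z_text, z_code).mean().item()
```

Comparing this score for a base model, a model fine-tuned with next-token loss only, and a model fine-tuned with the JEPA term would be one way to probe the claim above.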