Retrieval-Augmented Generation Guide
Explore top LinkedIn content from expert professionals.
-
Meta delivered a RAG rethink, and they called it REFRAG.
Traditional Retrieval-Augmented Generation (RAG) has a scaling problem. Most of the context we feed into LLMs during RAG is irrelevant. Worse, we process it anyway, token by token, blowing up memory and latency for minimal gain. The new Superintelligence team at Meta just proposed a fix: REFRAG.
REFRAG does something deceptively simple and profoundly effective: instead of feeding the full retrieved text to the decoder, it compresses the retrieved chunks into compact embeddings before decoding. Think of it as skipping the small talk and jumping straight to the point.
Why it matters:
1/ Up to 30x faster time-to-first-token than standard RAG pipelines.
2/ No loss in perplexity (a rarity with this kind of optimization).
3/ Works across multi-turn conversations, summarization, and standard RAG, all without retraining the base model.
And perhaps the most interesting part? It uses a lightweight RL policy to learn which chunks need full text and which don't: dynamic, adaptive compression at inference time (a rough conceptual sketch follows below).
This isn't just a speed hack. It's a shift in how we architect context for LLMs. More context no longer means slower models. That changes how we design systems and what we expect from them.
Link to the paper: https://lnkd.in/gwsrS-H8
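To make the compression idea concrete, here is a rough conceptual sketch, not the paper's implementation: each retrieved chunk is either kept as full text or replaced by a single chunk embedding, with a policy deciding which. The encoder model, the `expand_policy` stub, and the mixed-context format are illustrative assumptions; a real REFRAG-style decoder would consume the embedding slots directly.

```python
# Conceptual sketch of chunk-level context compression with selective expansion.
# Illustrative only: model name, policy, and context format are assumptions.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in chunk encoder

def compress_context(query, chunks, expand_policy):
    """Return a mixed context: full text for chunks the policy keeps,
    one compact embedding per chunk for everything else."""
    chunk_embs = encoder.encode(chunks)              # one vector per chunk
    context = []
    for chunk, emb in zip(chunks, chunk_embs):
        if expand_policy(query, chunk):              # learned RL policy in the paper; a stub here
            context.append({"type": "text", "value": chunk})
        else:
            context.append({"type": "embedding", "value": emb})
    return context

# Toy stand-in for the learned policy: expand only chunks sharing several words with the query.
naive_policy = lambda q, c: len(set(q.lower().split()) & set(c.lower().split())) > 2
```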
-
If you’re an AI engineer trying to understand and build with GenAI, RAG (Retrieval-Augmented Generation) is one of the most essential components to master. It’s the backbone of any LLM system that needs fresh, accurate, and context-aware outputs.
Let’s break down how RAG works, step by step, from an engineering lens, not a hype one (a minimal end-to-end sketch follows at the end of this post):
🧠 How RAG Works (Under the Hood)
1. Embed your knowledge base
→ Start with unstructured sources: docs, PDFs, internal wikis, etc.
→ Convert them into semantic vector representations using embedding models (e.g., OpenAI, Cohere, or Hugging Face models)
→ Output: N-dimensional vectors that preserve meaning across contexts
2. Store in a vector database
→ Use a vector store like Pinecone, Weaviate, or FAISS
→ Index embeddings to enable fast similarity search (cosine, dot-product, etc.)
3. Query comes in: embed that too
→ The user prompt is embedded using the same embedding model
→ Perform a top-k nearest-neighbor search to fetch the most relevant document chunks
4. Context injection
→ Combine retrieved chunks with the user query
→ Format this into a structured prompt for the generation model (e.g., Mistral, Claude, Llama)
5. Generate the final output
→ The LLM uses both the query and retrieved context to generate a grounded, context-rich response
→ Minimizes hallucinations and improves factuality at inference time
📚 What changes with RAG?
Without RAG: 🧠 “I don’t have data on that.”
With RAG: 🤖 “Based on [retrieved source], here’s what’s currently known…”
Same model, drastically improved quality.
🔍 Why this matters
You need RAG when:
→ Your data changes daily (support tickets, news, policies)
→ You can’t afford hallucinations (legal, finance, compliance)
→ You want your LLMs to access your private knowledge base without retraining
It’s the most flexible, production-grade approach to bridge static models with dynamic information.
🛠️ Arvind and I are kicking off a hands-on workshop on RAG
This first session is designed for beginner-to-intermediate practitioners who want to move beyond theory and actually build. Here’s what you’ll learn:
→ How RAG enhances LLMs with real-time, contextual data
→ Core concepts: vector DBs, indexing, reranking, fusion
→ Build a working RAG pipeline using LangChain + Pinecone
→ Explore no-code/low-code setups and real-world use cases
If you're serious about building with LLMs, this is where you start.
📅 Save your seat and join us live: https://lnkd.in/gS_B7_7d
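To make steps 1-5 concrete, here is a minimal pipeline sketch using sentence-transformers and FAISS. The embedding model, the toy documents, and the final generation call are placeholders, not a recommended production stack.

```python
# Minimal RAG pipeline sketch (steps 1-5 above). Names are illustrative.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "Refund policy: customers can return items within 30 days.",
    "Support hours are 9am-5pm CET, Monday to Friday.",
    "Enterprise plans include a dedicated account manager.",
]

# 1-2. Embed the knowledge base and index it for similarity search
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vecs.shape[1])   # inner product == cosine on normalized vectors
index.add(doc_vecs)

# 3. Embed the query and run a top-k nearest-neighbor search
query = "How long do customers have to return a product?"
query_vec = embedder.encode([query], normalize_embeddings=True)
scores, ids = index.search(query_vec, k=2)
retrieved = [docs[i] for i in ids[0]]

# 4. Context injection: build the structured prompt
prompt = ("Answer using only the context below.\n\nContext:\n"
          + "\n".join(f"- {c}" for c in retrieved)
          + f"\n\nQuestion: {query}")

# 5. Generate: hand `prompt` to whichever LLM client you use, e.g.
# answer = llm.generate(prompt)   # placeholder call
print(prompt)
```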
-
In the world of Generative AI, 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹-𝗔𝘂𝗴𝗺𝗲𝗻𝘁𝗲𝗱 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 (𝗥𝗔𝗚) is a game-changer. By combining the capabilities of LLMs with domain-specific knowledge retrieval, RAG enables smarter, more relevant AI-driven solutions. But to truly leverage its potential, we must follow some essential 𝗯𝗲𝘀𝘁 𝗽𝗿𝗮𝗰𝘁𝗶𝗰𝗲𝘀:
1️⃣ 𝗦𝘁𝗮𝗿𝘁 𝘄𝗶𝘁𝗵 𝗮 𝗖𝗹𝗲𝗮𝗿 𝗨𝘀𝗲 𝗖𝗮𝘀𝗲
Define your problem statement. Whether it’s building intelligent chatbots, document summarization, or customer support systems, clarity on the goal ensures efficient implementation.
2️⃣ 𝗖𝗵𝗼𝗼𝘀𝗲 𝘁𝗵𝗲 𝗥𝗶𝗴𝗵𝘁 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗕𝗮𝘀𝗲
- Ensure your knowledge base is 𝗵𝗶𝗴𝗵-𝗾𝘂𝗮𝗹𝗶𝘁𝘆, 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱, 𝗮𝗻𝗱 𝘂𝗽-𝘁𝗼-𝗱𝗮𝘁𝗲.
- Use vector embeddings (e.g., pgvector in PostgreSQL) to represent your data for efficient similarity search.
3️⃣ 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗠𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺𝘀
- Use hybrid search techniques (semantic + keyword search) for better precision; a small sketch follows this post.
- Tools like 𝗽𝗴𝗔𝗜, 𝗪𝗲𝗮𝘃𝗶𝗮𝘁𝗲, or 𝗣𝗶𝗻𝗲𝗰𝗼𝗻𝗲 can enhance retrieval speed and accuracy.
4️⃣ 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗲 𝗬𝗼𝘂𝗿 𝗟𝗟𝗠 (𝗢𝗽𝘁𝗶𝗼𝗻𝗮𝗹)
- If your use case demands it, fine-tune the LLM on your domain-specific data for improved contextual understanding.
5️⃣ 𝗘𝗻𝘀𝘂𝗿𝗲 𝗦𝗰𝗮𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆
- Architect your solution to scale. Use caching, indexing, and distributed architectures to handle growing data and user demands.
6️⃣ 𝗠𝗼𝗻𝗶𝘁𝗼𝗿 𝗮𝗻𝗱 𝗜𝘁𝗲𝗿𝗮𝘁𝗲
- Continuously monitor performance using metrics like retrieval accuracy, response time, and user satisfaction.
- Incorporate feedback loops to refine your knowledge base and model performance.
7️⃣ 𝗦𝘁𝗮𝘆 𝗦𝗲𝗰𝘂𝗿𝗲 𝗮𝗻𝗱 𝗖𝗼𝗺𝗽𝗹𝗶𝗮𝗻𝘁
- Handle sensitive data responsibly with encryption and access controls.
- Ensure compliance with industry standards (e.g., GDPR, HIPAA).
With the right practices, you can unlock RAG's full potential to build powerful, domain-specific AI applications. What are your top tips or challenges?
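As a rough illustration of point 3, here is a small hybrid-retrieval sketch that blends BM25 keyword scores with dense cosine similarity. The 50/50 weighting, the model choice, and the toy documents are arbitrary assumptions, not tuned recommendations.

```python
# Hedged sketch of hybrid retrieval: blend BM25 keyword scores with dense similarity.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "GDPR requires explicit consent for processing personal data.",
    "Our vector database stores one embedding per document chunk.",
    "Hybrid search combines keyword matching with semantic similarity.",
]
query = "how does hybrid search work"

# Keyword side: BM25 over whitespace-tokenized text
bm25 = BM25Okapi([d.lower().split() for d in docs])
kw_scores = bm25.get_scores(query.lower().split())

# Semantic side: cosine similarity of normalized embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)
q_vec = model.encode([query], normalize_embeddings=True)[0]
sem_scores = doc_vecs @ q_vec

# Normalize each score list to [0, 1] and blend with equal (arbitrary) weights
def minmax(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * minmax(kw_scores) + 0.5 * minmax(sem_scores)
for i in np.argsort(-hybrid):
    print(f"{hybrid[i]:.3f}  {docs[i]}")
```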
-
𝐑𝐀𝐆 𝐥𝐨𝐨𝐤𝐬 𝐬𝐢𝐦𝐩𝐥𝐞 𝐮𝐧𝐭𝐢𝐥 𝐲𝐨𝐮 𝐬𝐞𝐞 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧 𝐬𝐲𝐬𝐭𝐞𝐦𝐬.
Here is the 7-stage blueprint that makes high-accuracy retrieval possible:
𝟏. 𝐐𝐮𝐞𝐫𝐲 𝐓𝐫𝐚𝐧𝐬𝐥𝐚𝐭𝐢𝐨𝐧. The system rewrites the user question into forms that are easier for retrieval. It can reframe, decompose, expand, or turn questions into hypothetical documents. This improves search quality from the first step.
𝟐. 𝐑𝐨𝐮𝐭𝐢𝐧𝐠. The system decides where the query should go. Logical routing lets the model choose the right database. Semantic routing embeds the question and selects the best prompt or route based on similarity (see the sketch after this post).
𝟑. 𝐐𝐮𝐞𝐫𝐲 𝐂𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧. Different databases need different query formats. Text-to-SQL for relational DBs, text-to-Cypher for graph DBs, and metadata-based self-query for vector DBs help build optimized queries automatically.
𝟒. 𝐈𝐧𝐝𝐞𝐱𝐢𝐧𝐠. Information is prepared for retrieval. Semantic splitting creates meaningful chunks. Multi-representation indexing stores both original and summarized content. Specialized embeddings handle domain-specific meaning. Hierarchical indexing builds summary trees across multiple abstraction levels.
𝟓. 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥. The system ranks, filters, and refines documents. Re-ranking, RankGPT, and RAG-Fusion improve relevance. CRAG combines active retrieval and refinement to discard weak results and fetch new data if needed.
𝟔. 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧. The model synthesizes the final answer using the retrieved context. Self-RAG and similar techniques check answer quality, rewrite questions, and re-retrieve missing info.
𝟕. 𝐅𝐞𝐞𝐝𝐛𝐚𝐜𝐤 𝐋𝐨𝐨𝐩. If the answer lacks depth or relevance, the system loops back to retrieval or query rewriting. This ensures the final output is complete and grounded.
This blueprint shows how modern RAG systems deliver accuracy, reliability, and factual responses at scale.
𝐖𝐡𝐢𝐜𝐡 𝐬𝐭𝐚𝐠𝐞 𝐨𝐟 𝐭𝐡𝐢𝐬 𝐑𝐀𝐆 𝐩𝐢𝐩𝐞𝐥𝐢𝐧𝐞 𝐬𝐡𝐨𝐮𝐥𝐝 𝐈 𝐜𝐨𝐧𝐯𝐞𝐫𝐭 𝐢𝐧𝐭𝐨 𝐚 𝐝𝐞𝐞𝐩𝐞𝐫 𝐬𝐭𝐞𝐩 𝐛𝐲 𝐬𝐭𝐞𝐩 𝐠𝐮𝐢𝐝𝐞 𝐧𝐞𝐱𝐭?
♻️ Repost this to help your network get started
➕ Follow Anurag (Anu) Karuparti for more
PS: If you found this valuable, join my weekly newsletter where I document the real-world journey of AI transformation.
✉️ Free subscription: https://lnkd.in/esF52fm5
#RAG #AIagents #LLMEngineering #AIsystems
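For stage 2, a minimal semantic-routing sketch might look like the following; the route names and descriptions are invented examples, and a production router would usually add a fallback route and score thresholds.

```python
# Hedged sketch of semantic routing: embed the question and pick the route
# whose description is most similar. Route names are made-up examples.
import numpy as np
from sentence_transformers import SentenceTransformer

routes = {
    "sql_db": "questions about structured sales figures, revenue, and orders",
    "vector_db": "questions about product documentation and how-to guides",
    "graph_db": "questions about relationships between people, teams, and projects",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
route_names = list(routes)
route_vecs = model.encode(list(routes.values()), normalize_embeddings=True)

def route(question: str) -> str:
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    sims = route_vecs @ q_vec        # cosine similarity via dot product on normalized vectors
    return route_names[int(np.argmax(sims))]

print(route("Who reports to the head of platform engineering?"))  # likely "graph_db"
```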
-
𝐑𝐀𝐆 𝐢𝐬 𝐬𝐢𝐦𝐩𝐥𝐞—𝐮𝐧𝐭𝐢𝐥 𝐲𝐨𝐮 𝐭𝐫𝐲 𝐭𝐨 𝐛𝐮𝐢𝐥𝐝 𝐢𝐭.
Here's how I'd learn it from zero again (minus the rabbit holes):
🧠 𝑺𝒕𝒂𝒓𝒕 𝒘𝒊𝒕𝒉 𝒕𝒉𝒆 𝒘𝒉𝒚
RAG = Retrieval-Augmented Generation. It connects an LLM to real-time, external knowledge (your knowledge base) so answers are grounded in retrieved facts instead of hallucinated.
🔧 𝑳𝒆𝒂𝒓𝒏 𝒕𝒉𝒆 𝒄𝒐𝒓𝒆 𝒃𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒃𝒍𝒐𝒄𝒌𝒔
• Retriever → finds the most relevant chunks of data.
• Generator → crafts a smart answer using those chunks.
• Vector DB → stores your knowledge in a searchable, semantic way.
Understanding these 3 roles early = 50% of the game.
⚙️ 𝑷𝒊𝒄𝒌 𝒕𝒐𝒐𝒍𝒔 𝒕𝒉𝒂𝒕 𝒉𝒆𝒍𝒑 𝒚𝒐𝒖 𝒕𝒉𝒊𝒏𝒌, 𝒏𝒐𝒕 𝒋𝒖𝒔𝒕 𝒃𝒖𝒊𝒍𝒅
• LangChain & Haystack for structure.
• FAISS or Pinecone for vector search.
• Sentence Transformers for embeddings.
The tools are less important than understanding what each part is doing.
📚 𝑫𝒐𝒏’𝒕 𝒄𝒐𝒍𝒍𝒆𝒄𝒕 𝒅𝒂𝒕𝒂. 𝑪𝒖𝒓𝒂𝒕𝒆 𝒊𝒕.
• Chunk long docs: smaller chunks usually retrieve better (a simple chunking sketch follows this post).
• Embed with care: garbage in, garbage vectors out.
• Store smart: test your indexing early.
✍️ 𝑷𝒓𝒐𝒎𝒑𝒕𝒊𝒏𝒈 𝒊𝒔 𝒘𝒉𝒆𝒓𝒆 𝒓𝒆𝒕𝒓𝒊𝒆𝒗𝒂𝒍 𝒃𝒆𝒄𝒐𝒎𝒆𝒔 𝒂𝒖𝒈𝒎𝒆𝒏𝒕𝒂𝒕𝒊𝒐𝒏
Once you retrieve context, you frame the question.
• Bad prompt = wasted context.
• Good prompt = real augmentation.
🧪 𝑻𝒆𝒔𝒕 𝒐𝒃𝒔𝒆𝒔𝒔𝒊𝒗𝒆𝒍𝒚. 𝑹𝒆𝒃𝒖𝒊𝒍𝒅 𝒎𝒆𝒓𝒄𝒊𝒍𝒆𝒔𝒔𝒍𝒚.
You'll break things, and your results will be weird. But with every mistake, your mental model sharpens.
• Use relevant metrics like Context Precision and Context Recall.
• Monitor your RAG pipeline with LangSmith or Opik.
I'm not learning RAG to build flashy demos. I’m learning it to build systems that know things I care about.
Here are a few free courses you can use to boost your RAG learning:
👉 𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 𝐟𝐨𝐫 𝐋𝐋𝐌 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐦𝐞𝐧𝐭: https://lnkd.in/ddyyTcJU
👉 𝐋𝐞𝐚𝐫𝐧 𝐑𝐀𝐆 𝐅𝐫𝐨𝐦 𝐒𝐜𝐫𝐚𝐭𝐜𝐡 (𝐟𝐫𝐞𝐞𝐂𝐨𝐝𝐞𝐂𝐚𝐦𝐩.𝐨𝐫𝐠 – 𝐘𝐨𝐮𝐓𝐮𝐛𝐞 𝐯𝐢𝐝𝐞𝐨): https://lnkd.in/diWyhtRQ
👉 𝐈𝐧𝐭𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧 𝐭𝐨 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥 𝐀𝐮𝐠𝐦𝐞𝐧𝐭𝐞𝐝 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 (𝐑𝐀𝐆): https://lnkd.in/d-TMR2kf
👉 𝐊𝐧𝐨𝐰𝐥𝐞𝐝𝐠𝐞 𝐆𝐫𝐚𝐩𝐡𝐬 𝐟𝐨𝐫 𝐑𝐀𝐆: https://lnkd.in/dREckUmB
👉 𝐑𝐀𝐆++ : 𝐅𝐫𝐨𝐦 𝐏𝐎𝐂 𝐭𝐨 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧: https://lnkd.in/gK6nBp8M
👉 𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 𝐀𝐜𝐚𝐝𝐞𝐦𝐲: https://lnkd.in/d5wwsJPK
👉 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫 𝐌𝐨𝐝𝐞𝐥𝐬 𝐚𝐧𝐝 𝐁𝐄𝐑𝐓 𝐌𝐨𝐝𝐞𝐥: https://lnkd.in/dHP2kUrK
👉 𝐑𝐀𝐆-𝐓𝐨-𝐊𝐧𝐨𝐰: https://lnkd.in/gQqqQd2a
I hope this helps!
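As a starting point for the chunking advice above, here is a simple overlapping character-based splitter; the 500/100 sizes are illustrative defaults, and real systems often split on sentences or tokens instead.

```python
# Simple overlapping-chunk splitter for the "curate your data" step.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into character-based chunks that overlap slightly so that
    sentences cut at a boundary still appear intact in a neighboring chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "RAG systems retrieve relevant chunks before generation. " * 40
for i, chunk in enumerate(chunk_text(doc)):
    print(i, len(chunk))
```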
-
RAG just got smarter.
If you’ve been working with Retrieval-Augmented Generation (RAG), you probably know the basic setup: an LLM retrieves documents based on a query and uses them to generate better, grounded responses. But as use cases get more complex, we need more advanced retrieval strategies, and that’s where these four techniques come in:
1. Self-Query Retriever
Instead of relying on static prompts, the model creates its own structured query based on metadata. Let’s say a user asks: “What are the reviews with a score greater than 7 that say bad things about the movie?” This technique breaks that down into query + filter logic, letting the model interact directly with structured data (like Chroma DB) using the right filters.
2. Parent Document Retriever
Here, retrieval happens in two stages:
1. Identify the most relevant chunks
2. Pull in their parent documents for full context
This ensures you don’t lose meaning just because information was split across small segments.
3. Contextual Compression Retriever (Reranker)
Sometimes the top retrieved documents are close, but not quite right. This reranker pulls the top K (say 4) documents, then uses a transformer-based reranker (like Cohere's) to compress and re-rank the results based on both query and context, keeping only the most relevant bits. (A minimal cross-encoder reranking sketch follows this post.)
4. Multi-Vector Retrieval Architecture
Instead of matching a single vector per document, this method breaks both queries and documents into multiple token-level vectors using models like ColBERT. Retrieval happens across all vectors, giving you higher recall and more precise results for dense, knowledge-rich tasks.
These aren’t just fancy tricks. They solve real-world problems like:
• “My agent’s answer missed part of the doc.”
• “Why is the model returning irrelevant data?”
• “How can I ground this LLM more effectively in enterprise knowledge?”
As RAG continues to scale, these kinds of techniques are becoming foundational. So if you’re building search-heavy or knowledge-aware AI systems, it’s time to level up beyond basic retrieval.
Which of these approaches are you most excited to experiment with?
#ai #agents #rag #theravitshow
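Here is a minimal reranking sketch using an open cross-encoder in place of a hosted reranker such as Cohere's; the model name, the toy candidates, and the keep-top-2 cutoff are assumptions for illustration.

```python
# Hedged sketch of a rerank step: score (query, chunk) pairs with a cross-encoder
# and keep only the best ones.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], keep: int = 2) -> list[str]:
    """Re-order first-stage retrieval results by cross-encoder relevance."""
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:keep]]

candidates = [
    "The movie received mixed reviews, with critics praising the score.",
    "Reviews above 7 generally mention strong acting but weak pacing.",
    "Ticket prices increased by 5% last quarter.",
]
print(rerank("reviews with a score greater than 7 that criticize the movie", candidates))
```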
-
Your RAG app is NOT going to be usable in production (especially at large enterprises) if you overlook these evaluation steps:
- Before anything else, FIRST create a comprehensive evaluation dataset by writing queries that match real production use cases.
- Evaluate retriever performance with non-rank metrics like Recall@k (how many relevant chunks are found in the top-k results) and Precision@k (what fraction of retrieved chunks are actually relevant). These show whether the right content is being found, regardless of order. (A small sketch of these metrics follows this post.)
- Assess retriever ranking quality with rank-based metrics including:
1. MRR (position of the first relevant chunk)
2. MAP (considers all relevant chunks and their ranks)
3. NDCG (compares the actual ranking to the ideal ranking)
These measure how well your relevant content is prioritized.
- Measure generator citation performance by designing prompts that request explicit citations like [1], [2] or source sections. Calculate citation Recall@k (relevant chunks that were actually cited) and citation Precision@k (cited chunks that are actually relevant).
- Evaluate response quality with quantitative metrics like token-level F1 score by tokenising both the generated and ground-truth responses.
- Apply qualitative assessment across key dimensions including completeness (fully answers the query), relevancy (answer matches the question), harmfulness (potential for harm through errors), and consistency (aligns with the provided chunks).
Finally, with your learnings from the eval results, you can implement systematic optimisation in three sequential stages:
1. pre-processing (chunking, embeddings, query rewriting)
2. processing (retrieval algorithms, LLM selection, prompts)
3. post-processing (safety checks, formatting)
With the right evaluation strategies and metrics in place, you can drastically enhance the performance and reliability of RAG systems :)
Link to the brilliant article by Ankit Vyas from neptune.ai on how to implement these steps: https://lnkd.in/guDnkdMT
#RAG #AIAgents #GenAI
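A small sketch of the retrieval metrics named above, computed per query; the chunk ids and ground-truth set are made-up examples, and in practice you would average these over the whole evaluation dataset.

```python
# Per-query retrieval metrics. `retrieved` is the ranked list of chunk ids the
# retriever returned, `relevant` is the ground-truth set from the eval dataset.
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    hits = len(set(retrieved[:k]) & relevant)
    return hits / k

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["c7", "c2", "c9", "c4"]   # retriever output, best first
relevant = {"c2", "c4"}                # ground truth for this query
print(recall_at_k(retrieved, relevant, 3),     # 0.5
      precision_at_k(retrieved, relevant, 3),  # ~0.33
      mrr(retrieved, relevant))                # 0.5
```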
-
RAG Systems Under Fire: New Research Exposes Critical Query Robustness Issues
Retrieval-Augmented Generation (RAG) systems have become the go-to solution for grounding large language models in external knowledge, but research from the Technical University of Munich and Intel Labs reveals a concerning vulnerability that could impact production deployments worldwide.
>> The Hidden Weakness
The study demonstrates that RAG systems exhibit significant performance degradation when faced with seemingly minor query variations: something as simple as a typo or slight rewording can dramatically impact retrieval accuracy and final answer quality.
>> Technical Deep Dive
The research team conducted over 1,092 experiments across multiple components.
Retriever Analysis: Dense retrievers like BGE-base-en-v1.5 and Contriever showed superior robustness against redundant information compared to sparse methods like BM25, but struggled more with typographical errors. The study revealed that BM25's token-based matching actually provided better resilience to character-level perturbations.
Generator Robustness: The team evaluated three 7-8B parameter models (Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.2, and Qwen2.5-7B-Instruct) under two critical scenarios: "closed-book" (parametric knowledge only) and "oracle" (perfect retrieval). Interestingly, models showed different sensitivities in RAG contexts compared to standalone evaluation.
Pipeline Correlation Analysis: Using Pearson correlation coefficients, the researchers discovered that performance bottlenecks shift between retriever and generator depending on perturbation type and dataset domain. For domain-specific datasets like BioASQ, generator limitations became more pronounced with ambiguous queries.
>> Under the Hood: The Evaluation Framework
The methodology introduces five perturbation categories:
- Redundancy insertion via GPT-4o prompting
- Formal tone changes
- Ambiguity introduction
- Typo simulation at 10% and 25% word-corruption levels using TextAttack's QWERTY keyboard-proximity model (a simplified typo-injection sketch follows this post)
Each original query generated five perturbed variants, tested across different corpus sizes (2.68M to 14.91M documents) and question types (single-hop, multi-hop, domain-specific).
>> Key Technical Findings
The research reveals that retriever performance trends predominantly drive end-to-end RAG outcomes, particularly for general-domain datasets. However, domain-specific scenarios show increased generator sensitivity, especially with redundant information causing "drastic performance drops" in biomedical contexts.
Internal LLM representation analysis using PCA visualization showed that query perturbations scatter hidden states even when golden documents are provided, indicating fundamental challenges in query-understanding robustness.
The work establishes crucial benchmarks for evaluating RAG robustness and offers a systematic approach for identifying vulnerable components in existing pipelines.
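To give a feel for the typo perturbation, here is a simplified QWERTY-neighbor injector. It is not TextAttack's implementation; the tiny neighbor map, the word-fraction parameter, and the corruption logic are toy assumptions.

```python
# Simplified QWERTY-neighbor typo injector, illustrating the perturbation idea.
import random

QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp", "u": "yhji",
    "s": "awedxz", "t": "rfgy", "n": "bhjm", "r": "edft", "l": "kop",
}

def perturb(query: str, word_fraction: float = 0.25, seed: int = 0) -> str:
    """Corrupt roughly `word_fraction` of the words by swapping one character
    for a QWERTY-adjacent key."""
    rng = random.Random(seed)
    words = query.split()
    n_corrupt = max(1, int(len(words) * word_fraction))
    for idx in rng.sample(range(len(words)), k=min(n_corrupt, len(words))):
        word = words[idx]
        positions = [i for i, ch in enumerate(word) if ch.lower() in QWERTY_NEIGHBORS]
        if not positions:
            continue
        i = rng.choice(positions)
        replacement = rng.choice(QWERTY_NEIGHBORS[word[i].lower()])
        words[idx] = word[:i] + replacement + word[i + 1:]
    return " ".join(words)

print(perturb("what causes antibiotic resistance in bacteria"))
```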
-
Your RAG pipeline is only as good as what it retrieves. And that’s exactly where most RAG chatbots quietly fail.
You’re in a GenAI discussion, and someone asks: “Why does traditional RAG sometimes give confident but wrong answers?”
RAG (Retrieval-Augmented Generation) assumes that the retrieved context is relevant and sufficient. But in reality, retrieval can be noisy, incomplete, or just plain wrong. And once bad context enters the pipeline, the LLM doesn’t question it. It just builds on top of it.
That’s where Corrective RAG (CRAG) changes the game.
What goes wrong in traditional RAG?
📍 Retrieval returns low-quality or irrelevant documents
📍 No mechanism to validate context before generation
📍 LLM blindly trusts retrieved chunks
Result → hallucinations with high confidence
What CRAG does differently 👇
CRAG introduces a correction layer between retrieval and generation. Instead of assuming retrieval is correct, it asks:
👉 “Is this context actually useful?”
It does this through:
1. Retrieval Evaluation
A lightweight evaluator (often a smaller model) scores the quality of retrieved documents.
2. Conditional Flow
If retrieval is good → proceed as usual
If retrieval is bad → trigger corrective actions
3. Corrective Actions
- Re-retrieve using refined queries
- Perform web search or external lookup
- Filter out noisy chunks
- Decompose the query for better context
Traditional RAG is retrieve → generate.
CRAG is retrieve → evaluate → correct → generate. (A minimal control-flow sketch follows this post.)
#ai #rag #chatbot #retrieval #vectorsearch #aisystems #aiengineering
Follow Sneha Vijaykumar for more... 😊
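A minimal control-flow sketch of that retrieve → evaluate → correct → generate loop; every component (`retrieve`, `grade`, `refine_query`, `web_search`, `llm`) is a placeholder you would wire to your own stack, and the 0.6 threshold and correction budget are arbitrary assumptions.

```python
# Sketch of a corrective RAG loop. All callables are placeholders for your own
# retriever, evaluator, query rewriter, web-search tool, and LLM client.
def corrective_rag(question, retrieve, grade, refine_query, web_search, llm,
                   threshold=0.6, max_corrections=2):
    docs = retrieve(question)
    for _ in range(max_corrections):
        score = grade(question, docs)      # lightweight evaluator returning 0..1
        if score >= threshold:             # retrieval looks good: stop correcting
            break
        # Corrective actions: rewrite the query, re-retrieve, fall back to web search
        question = refine_query(question)
        docs = retrieve(question) or web_search(question)
    context = "\n".join(docs)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```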
-
We’ve all heard the term RAG (Retrieval-Augmented Generation) tossed around as the secret sauce behind grounded LLMs. In essence, it’s like giving an LLM a library card: before answering your question, it goes and fetches the most relevant documents (or graph facts), then uses them to reason and generate a reply.
But here’s the catch: the retriever and the generator don’t really talk to each other. The retriever decides what’s relevant. The generator tries to make sense of whatever it’s given. If the retriever grabs noisy or incomplete data, the generator can’t correct it. And if the generator struggles, it can’t tell the retriever how to do better.
That’s the “broken conversation” D-RAG (Differentiable Retrieval-Augmented Generation) sets out to fix.
Think of the retriever as a spotlight scanning a huge knowledge graph (like Freebase or Wikidata) for the most useful facts to answer a question. Normally, that spotlight’s movements are controlled by rough heuristics: you can’t teach it through gradients because its decisions are discrete (“select this fact, skip that one”).
D-RAG changes that by adding a soft switch. It uses a clever mathematical trick called Gumbel-Softmax, which lets the retriever make selections that are almost discrete but still smooth enough for gradients to flow through (a tiny sketch of this relaxation follows this post). This means the system can now learn end-to-end: the generator’s success or failure directly tunes how the retriever behaves next time.
The retriever is powered by a Graph Neural Network that encodes not just words but the structure of the knowledge graph: who is connected to whom, through what relationship. Then, instead of just handing over a list of triples, D-RAG builds a neural prompt, a text-plus-structure hybrid that the LLM can understand while still preserving graph context.
The result? A pipeline where the retriever and generator evolve together, reducing noise, keeping the reasoning chain intact, and boosting both precision and recall on benchmarks like WebQSP and CWQ.
This may sound technical, but it points toward something big: models that don’t just retrieve knowledge but learn what kind of knowledge helps reasoning. In a way, D-RAG teaches machines a subtle human skill: learning how to look things up better, based on how well you understood them last time.
Imagine RAG systems that self-improve their “research habits,” or question-answering agents that adapt their retrieval strategy depending on how confident they are. That’s the frontier this paper hints at.
Full-length paper: https://lnkd.in/g9VHGGA9
#ArtificialIntelligence #MachineLearning #DeepLearning #NaturalLanguageProcessing #GenerativeAI #LLMs #RetrievalAugmentedGeneration #RAG #DifferentiableRAG #KnowledgeGraphs #KnowledgeGraphQA #GraphNeuralNetworks #GraphAI #NeuralRetrieval #EndToEndLearning #ReasoningSystems #AIResearch #EMNLP2025 #AIInnovation #FutureOfAI #ComputationalLinguistics #AIEvolution
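For readers curious about the Gumbel-Softmax trick itself, here is a tiny generic demo in PyTorch showing how an almost-discrete selection stays differentiable. It illustrates the relaxation only, not D-RAG's actual retriever; the logits and the downstream loss are made-up.

```python
# Generic Gumbel-Softmax demo: "pick a fact" becomes differentiable.
import torch
import torch.nn.functional as F

# Retriever scores ("logits") for 5 candidate knowledge-graph facts
logits = torch.tensor([1.2, 0.3, -0.5, 2.0, 0.1], requires_grad=True)

# Soft sample: nearly one-hot, but gradients flow back into the logits
soft_pick = F.gumbel_softmax(logits, tau=0.5, hard=False)

# Straight-through version: exactly one-hot in the forward pass,
# still differentiable in the backward pass
hard_pick = F.gumbel_softmax(logits, tau=0.5, hard=True)

# Any downstream loss on the selection can now tune the retriever's scores
loss = (soft_pick * torch.arange(5.0)).sum()
loss.backward()
print(soft_pick, hard_pick, logits.grad)
```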