How to Improve RAG Retrieval Methods


Summary

Retrieval-Augmented Generation (RAG) is an AI technique where a language model retrieves relevant documents and uses them to produce more accurate and grounded responses. Recent discussions focus on advanced ways to improve how RAG systems find and use information, making them more reliable and suitable for real-world applications.

  • Refine retrieval paths: Combine multiple search approaches—like vector, graph, and SQL—so your system can handle different types of questions and data structures.
  • Upgrade indexing and reranking: Use specialized models and rerankers to pinpoint the most relevant information and ensure your retrieval engine returns meaningful results every time.
  • Fine-tune models together: Train both your language model and your retriever side-by-side so they learn to work as a team, leading to more accurate answers without requiring massive amounts of extra data.
  • Brij kishore Pandey
    AI Architect & Engineer | AI Strategist · 720,785 followers

Stop building RAG like it's 2023. We all know the basic recipe: Chunk → Embed → Retrieve → Generate. It works great… until it doesn't. The moment you go from weekend prototype to enterprise production, that simple pipeline falls apart. I mapped out what a truly robust RAG system actually looks like under the hood. Here's what most teams are missing:

1. Query Construction ≠ Just Vector Search
Real queries need multiple backends:
↳ Graph DBs for relationship-heavy questions
↳ SQL for structured/numerical data
↳ Vector search for semantic meaning
One retrieval path can't handle all three.

2. Intelligent Routing
Before you even retrieve, you need to decide:
↳ Semantic route or logical route?
↳ Single-hop or multi-hop?
↳ Which data source to hit first?
This one decision layer saves you from 80% of bad retrievals. (A minimal routing sketch follows this post.)

3. Advanced Indexing
If you're still doing naive chunking, you're leaving accuracy on the table.
↳ RAPTOR → recursive abstractive processing for hierarchical understanding
↳ ColBERT → token-level semantic matching for precision retrieval
↳ Multi-representation indexing → different views of the same data

4. The Evaluation Loop (Non-Negotiable)
You can't improve what you can't measure.
↳ Ragas for end-to-end RAG evaluation
↳ DeepEval for component-level testing
↳ Continuous monitoring, not one-time benchmarks

Here's the hard truth: RAG isn't a feature anymore. It's a full engineering system. And the teams treating it like a quick integration are the ones wondering why their AI "hallucinates." The gap between a demo and production RAG? It's these 4 layers.
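Below is a minimal, hedged sketch of what such a routing layer can look like. The route names and keyword heuristics are illustrative assumptions (production systems usually delegate this decision to a small LLM classifier), not a prescribed design.

```python
# Sketch of a pre-retrieval routing layer: classify each query before it
# hits a backend. Heuristics here are toy assumptions for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str                          # e.g. "graph", "sql", "vector"
    matches: Callable[[str], bool]     # decision predicate for this route

ROUTES = [
    # Relationship-heavy questions -> graph backend
    Route("graph", lambda q: any(k in q.lower()
          for k in ("depend", "connected to", "related to", "who reports"))),
    # Structured / numerical questions -> SQL backend
    Route("sql", lambda q: any(k in q.lower()
          for k in ("how many", "average", "total", "greater than"))),
    # Default: semantic questions -> vector search
    Route("vector", lambda q: True),
]

def route_query(query: str) -> str:
    """Return the first backend whose heuristic matches the query."""
    return next(r.name for r in ROUTES if r.matches(query))

print(route_query("How many orders shipped last month?"))        # sql
print(route_query("Which services depend on the auth module?"))  # graph
print(route_query("Summarize our refund policy"))                # vector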

  • Andreas Kretz
    I teach Data Engineering and create data & AI content | 10+ years of experience | 3x LinkedIn Top Voice | 230k+ YouTube subscribers · 157,786 followers

I thought my RAG project was solid until I saw how random the results really were...

When I first released my new RAG project in the Learn Data Engineering Academy, I was pretty happy with it. It ran end-to-end, gave answers, looked smart. But after testing it more, I realized something was off. The retrieval felt random. Sometimes we'd get exactly the right document; other times, something completely irrelevant. And once I saw it, I couldn't unsee it.

So I spent the weekend digging into what was going on, found the major mistakes, and fixed them. Those fixes completely changed the project's behavior. Now retrieval isn't luck anymore, it's reliable.

Here's what I fixed after release:
➡️ Switched to a proper embedding model (BGE) instead of a general-purpose one
➡️ Normalized embeddings to make similarity scores meaningful
➡️ Configured Elasticsearch for cosine similarity
➡️ Added a cross-encoder reranker to detect truly relevant chunks
(A sketch of the embedding and reranking fixes follows this post.)

It was a great reminder: even in GenAI, Data Engineering fundamentals make all the difference. Retrieval quality doesn't come from prompts. It comes from architecture, indexing, and evaluation.

If you want to build a practical local RAG system with Elasticsearch, LlamaIndex, Ollama (Mistral), and understand what really makes it perform well, this project walks you through everything step by step. 👉 Check it out via the link in the comments!

And if you'd like to see how I fixed it in detail, I recorded a livestream where I walk through the debugging process, show before/after examples, and explain the improvements. 🎥 Watch the recording via the link in the comments!
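As a rough illustration of two of those fixes (a retrieval-specific embedding model with normalized vectors, plus a cross-encoder reranker), here is a hedged sketch using sentence-transformers. The model names are common public checkpoints chosen as examples, and Elasticsearch is replaced by an in-memory dot product for brevity.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "Elasticsearch supports dense_vector fields with cosine similarity.",
    "Ollama runs open-weight models such as Mistral locally.",
    "Cross-encoders score a query and a document jointly.",
]
query = "How do I configure cosine similarity in Elasticsearch?"

# 1) Embed with a retrieval-specific model; normalize so the dot product
#    equals cosine similarity (which is what the index should be set to).
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
q_vec = embedder.encode(query, normalize_embeddings=True)
candidates = np.argsort(doc_vecs @ q_vec)[::-1][:2]   # top-k by cosine

# 2) Rerank candidates with a cross-encoder, which reads the query and
#    each chunk together and judges true relevance far more reliably.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, docs[i]) for i in candidates])
best = candidates[int(np.argmax(scores))]
print(docs[best])
```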

  • Aishwarya Srinivasan
    628,006 followers

If you're an AI engineer working on RAG, or building advanced retrieval-augmented systems, you need to know about RAFT: Retrieval-Augmented Fine-Tuning. Let's break it down 👇

→ Closed-Book Models (SFT Only)
The model learns everything at train time and answers purely from its internal weights. Fast, but brittle: hallucinations spike when the model faces unfamiliar queries.

→ Open-Book Models (Standard RAG)
At inference time, the model retrieves top-k documents and answers using them as context. But the model has never seen these docs during training, so it treats relevant and irrelevant documents the same way, often leading to noisy outputs.

→ RAFT: Retrieval + Fine-Tuning Combined
RAFT, proposed by UC Berkeley, merges RAG and fine-tuning. During training, the model is explicitly taught how to use retrieved documents: rewarded for grounding answers in the right document and ignoring distractors.

Here's how RAFT works:
→ Take a query
→ Pair it with a golden doc (the correct reference)
→ Add sampled negative docs (distractors)
→ Train the model to generate an answer that quotes only from the golden doc
This makes the model retrieval-aware during generation: it learns to differentiate between helpful and irrelevant documents. (A data-construction sketch follows this post.)

Why RAFT matters 🤔
→ Reduces hallucinations by grounding answers in relevant context
→ Boosts accuracy in domain-specific applications like legal, medical, and scientific QA
→ Works with smaller open-weight models like LLaMA 2 and Mistral 7B
→ Outperforms vanilla RAG on benchmarks like HotpotQA and PubMedQA

How to train with RAFT 🛠️
→ Build training triples: (query, golden doc, distractor docs)
→ Use your existing retrieval setup and corpus
→ Fine-tune using LoRA or full SFT with these inputs
→ At inference, continue to use top-k retrieval; the model will now handle noise better

When to use RAFT ⁉️
→ When your application requires faithfulness and traceability (e.g., legal, healthcare)
→ When your retrieval corpus includes overlapping or ambiguous docs
→ When you want smaller models to reason better over external documents

RAFT doesn't replace retrieval; it enhances it by teaching the model how to reason over retrieved content. Instead of hoping your model figures it out at runtime, RAFT prepares it during training. If you're working on GenAI systems or retrieval pipelines, this is one method you can't afford to ignore.

Arvind and I are doing a free RAG lightning session on 4th April. If you want to learn more about RAG, do join us: https://lnkd.in/gHFmmfR2
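A hedged sketch of the data-construction step described above, building (query, golden doc, distractors) training records. The field names and prompt template are illustrative assumptions, not the paper's exact format.

```python
import json
import random

def make_raft_example(query, golden_doc, corpus, answer, n_distractors=3):
    """Assemble one RAFT-style training record: golden doc + distractors."""
    distractors = random.sample(
        [d for d in corpus if d != golden_doc], n_distractors)
    context = [golden_doc] + distractors
    random.shuffle(context)  # don't let position leak which doc is golden
    prompt = "\n\n".join(f"[DOC {i}] {d}" for i, d in enumerate(context))
    return {
        "prompt": f"{prompt}\n\nQuestion: {query}\nAnswer:",
        # Supervision target: an answer grounded in the golden doc only.
        "completion": answer,
    }

corpus = ["Aspirin inhibits COX enzymes.",
          "Paris is the capital of France.",
          "Llama 2 is an open-weight LLM.",
          "HotpotQA is a multi-hop QA benchmark."]
ex = make_raft_example("What does aspirin inhibit?",
                       corpus[0], corpus, "Aspirin inhibits COX enzymes.")
print(json.dumps(ex, indent=2))
```

Records like this are then used for LoRA or full SFT, exactly as the post's training checklist describes.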

  • Ravit Jain
    Founder & Host of "The Ravit Show" | Influencer & Creator | LinkedIn Top Voice | Startups Advisor | Gartner Ambassador | Data & AI Community Builder | Influencer Marketing B2B | Marketing & Media | (Mumbai/San Francisco) · 169,179 followers

RAG just got smarter. If you've been working with Retrieval-Augmented Generation (RAG), you probably know the basic setup: an LLM retrieves documents based on a query and uses them to generate better, grounded responses. But as use cases get more complex, we need more advanced retrieval strategies, and that's where these four techniques come in:

Self-Query Retriever
Instead of relying on static prompts, the model creates its own structured query based on metadata. Say a user asks: "What are the reviews with a score greater than 7 that say bad things about the movie?" This technique breaks that down into query + filter logic, letting the model interact directly with structured data (like Chroma DB) using the right filters.

Parent Document Retriever
Here, retrieval happens in two stages:
1. Identify the most relevant chunks
2. Pull in their parent documents for full context
This ensures you don't lose meaning just because information was split across small segments. (A minimal sketch of this pattern follows this post.)

Contextual Compression Retriever (Reranker)
Sometimes the top retrieved documents are… close, but not quite right. This reranker pulls the top K (say 4) documents, then uses a transformer + reranker (like Cohere) to compress and re-rank the results based on both query and context, keeping only the most relevant bits.

Multi-Vector Retrieval Architecture
Instead of matching a single vector per document, this method breaks both queries and documents into multiple token-level vectors using models like ColBERT. Retrieval happens across all vectors, giving you higher recall and more precise results for dense, knowledge-rich tasks.

These aren't just fancy tricks. They solve real-world problems like:
• "My agent's answer missed part of the doc."
• "Why is the model returning irrelevant data?"
• "How can I ground this LLM more effectively in enterprise knowledge?"

As RAG continues to scale, these kinds of techniques are becoming foundational. So if you're building search-heavy or knowledge-aware AI systems, it's time to level up beyond basic retrieval. Which of these approaches are you most excited to experiment with? #ai #agents #rag #theravitshow
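Here is a minimal, hedged sketch of the parent-document pattern using plain in-memory structures. LangChain packages this as ParentDocumentRetriever, but the two-stage logic is simple enough to show directly; the model name and data are illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Parent docs provide full context; small chunks provide precise matching.
parents = {
    "doc1": "Full text of the refund policy ... (several paragraphs)",
    "doc2": "Full text of the shipping policy ... (several paragraphs)",
}
chunks = [  # (parent_id, small chunk used for embedding and matching)
    ("doc1", "Refunds are issued within 14 days of a return."),
    ("doc2", "Standard shipping takes 3-5 business days."),
]

model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = model.encode([c for _, c in chunks], normalize_embeddings=True)

def retrieve_parent(query: str) -> str:
    q = model.encode(query, normalize_embeddings=True)
    best = int(np.argmax(chunk_vecs @ q))   # stage 1: most relevant chunk
    return parents[chunks[best][0]]         # stage 2: return its parent doc

print(retrieve_parent("How long do refunds take?"))
```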

  • Zain Hasan
    I build and teach AI | AI/ML @ Together AI | EngSci ℕΨ/PhD @ UofT | Previously: Vector DBs, Data Scientist, Lecturer & Health Tech Founder | 🇺🇸🇨🇦🇵🇰 · 19,611 followers

Can we fine-tune our LLM and retriever together to improve RAG performance? This paper proposes a technique to do exactly that!

RAG Basics: When you prompt an LLM, RAG supplies relevant documents. A separate retrieval model computes the probability of each text chunk being relevant and provides the top chunks to the LLM. The LLM generates tokens based on the chunks, the prompt, and previous tokens.

In Short: LLMs aren't exposed to retrieval-augmented inputs during pretraining, limiting their ability to use retrieved text effectively. Fine-tuning the LLM and the retrieval model together improves performance without extensive data processing, enabling better retrieval-augmented generation.

How it Works: Authors from Meta fine-tuned LLaMA (65B parameters) and DRAGON+, a retriever, to create RA-DIT 65B. They fine-tuned LLaMA on prompts with retrieved text and questions, and fine-tuned DRAGON+ to retrieve more relevant chunks. Fine-tuning was supervised for tasks like question answering and self-supervised for text-chunk completion. (A sketch of the retriever-side objective follows this post.)

Results: RA-DIT 65B achieved 49.1% accuracy on average across four question datasets, outperforming LLaMA 65B with DRAGON+ (45.1%) and LLaMA 65B alone (32.9%). With five example inputs, RA-DIT 65B reached 51.8% accuracy. RA-DIT offers an efficient way to enhance LLM performance with RAG, making it a valuable technique for developers.

Details: RA-DIT fine-tunes LLaMA and DRAGON+ to work together effectively, leveraging the strengths of both models to generate better output. By fine-tuning the LLM to better use retrieved knowledge and the retrieval model to select more relevant text, RA-DIT achieves improved performance without requiring extensive data processing. https://lnkd.in/gf4fGVkC
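As a rough illustration of the retriever-side objective (the paper's LM-supervised retrieval idea), this hedged sketch pushes retriever scores toward the distribution implied by how much each chunk helps the LLM predict the gold answer. The numeric inputs are toy stand-ins for real per-chunk scores from the retriever and the language model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lsr_loss(retriever_scores, lm_answer_loglik, temperature=1.0):
    """KL(target || retriever), where the target distribution favors
    chunks that raise the LM's likelihood of the correct answer."""
    p_retriever = softmax(np.asarray(retriever_scores) / temperature)
    p_target = softmax(np.asarray(lm_answer_loglik) / temperature)
    return float(np.sum(p_target * (np.log(p_target) - np.log(p_retriever))))

# Chunk 0 helps the LM most, but the retriever currently prefers chunk 1,
# so the loss is large and a gradient step would correct the retriever.
print(lsr_loss(retriever_scores=[0.2, 1.5, 0.1],
               lm_answer_loglik=[-1.0, -4.0, -6.0]))
```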

  • Santiago Valdarrama
    Computer scientist and writer. I teach hard-core Machine Learning at ml.school. · 121,955 followers

This makes your RAG application 10x better. Most people I know split their documents and generate embeddings for those chunks. But generating good chunks is hard. There's no perfect solution, but there's a simple trick to make those chunks much better: augment each chunk with additional metadata.

For example, say you're chunking research papers. Each chunk might be just a paragraph, but that paragraph by itself is often too vague. Instead of using the paragraph alone, I add the following information to each chunk:
• The paper title
• The page number
• The section heading where the paragraph is
• Any relevant keywords or tags in that paragraph
• A one-sentence summary of the paragraph

This extra context makes the embedding richer and way more useful at retrieval time. You can either infer this additional metadata or use an LLM to generate it. (A small sketch follows this post.)

This is an extra step. Don't worry about it if you are just starting with your RAG implementation, but as soon as you have a working solution, spend the time building this. You'll never go back.
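A small, hedged sketch of that augmentation step. The field set mirrors the post's list; summarize() is a hypothetical hook where an LLM call would go.

```python
def summarize(paragraph: str) -> str:
    # Hypothetical placeholder: in practice, call an LLM to produce
    # a one-sentence summary of the paragraph.
    return paragraph.split(".")[0] + "."

def augment_chunk(paragraph, title, page, section, keywords):
    """Prepend structured metadata so the embedding carries context."""
    header = (f"Title: {title}\nPage: {page}\nSection: {section}\n"
              f"Keywords: {', '.join(keywords)}\n"
              f"Summary: {summarize(paragraph)}\n---\n")
    return header + paragraph  # embed this string, not the raw paragraph

chunk = augment_chunk(
    "We evaluate retrieval quality using claim-level precision and recall.",
    title="Evaluating RAG Systems", page=4, section="4.1 Metrics",
    keywords=["evaluation", "precision", "recall"],
)
print(chunk)
```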

  • Sohrab Rahimi
    Director, AI/ML Lead @ Google · 23,608 followers

Many companies have started experimenting with simple RAG systems, probably as their first use case, to test the effectiveness of generative AI in extracting knowledge from unstructured data like PDFs, text files, and PowerPoint files. If you've used basic RAG architectures with tools like LlamaIndex or LangChain, you might have already encountered three key problems:

1. Inadequate Evaluation Metrics: Existing metrics fail to catch subtle errors like unsupported claims or hallucinations, making it hard to accurately assess and enhance system performance.

2. Difficulty Handling Complex Questions: Standard RAG methods often struggle to find and combine information from multiple sources effectively, leading to slower responses and less relevant results.

3. Struggling to Understand Context and Connections: Basic RAG approaches often miss the deeper relationships between information pieces, resulting in incomplete or inaccurate answers that don't fully meet user needs.

In this post I will introduce three useful papers that address these gaps:

1. RAGChecker: introduces a new framework for evaluating RAG systems with a focus on fine-grained, claim-level metrics. It proposes a comprehensive set of metrics: claim-level precision, recall, and F1 score to measure the correctness and completeness of responses; claim recall and context precision to evaluate the effectiveness of the retriever; and faithfulness, noise sensitivity, hallucination rate, self-knowledge reliance, and context utilization to diagnose the generator's performance. Consider using these metrics to help identify errors, enhance accuracy, and reduce hallucinations in generated outputs. (A toy claim-level scoring sketch follows this post.)

2. EfficientRAG: uses a labeler and filter mechanism to identify and retain only the most relevant parts of retrieved information, reducing the need for repeated large language model calls. This iterative approach refines search queries efficiently, lowering latency and costs while maintaining high accuracy for complex, multi-hop questions.

3. GraphRAG: by leveraging structured data from knowledge graphs, GraphRAG methods enhance the retrieval process, capturing complex relationships and dependencies between entities that traditional text-based retrieval methods often miss. This approach enables the generation of more precise and context-aware content, making it particularly valuable in domains that require a deep understanding of interconnected data, such as scientific research, legal documentation, and complex question answering. For example, in tasks such as query-focused summarization, GraphRAG demonstrates substantial gains by effectively leveraging graph structures to capture local and global relationships within documents.

It's encouraging to see how quickly gaps are identified and improvements are made in the GenAI world.
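To make the claim-level idea concrete, here is a toy, hedged sketch of claim precision/recall in the spirit of RAGChecker. entails() is a hypothetical placeholder using exact match; the actual framework uses an LLM-based entailment check over extracted claims.

```python
def entails(claim: str, claims: list[str]) -> bool:
    # Toy stand-in for an LLM/NLI entailment judgment.
    return claim in claims

def claim_f1(answer_claims, reference_claims):
    """Precision: answer claims supported by the reference.
    Recall: reference claims covered by the answer."""
    precision = sum(entails(c, reference_claims)
                    for c in answer_claims) / len(answer_claims)
    recall = sum(entails(c, answer_claims)
                 for c in reference_claims) / len(reference_claims)
    return 2 * precision * recall / (precision + recall + 1e-9)

answer = ["RAG retrieves documents.", "RAG was invented in 1990."]
reference = ["RAG retrieves documents.",
             "Retrieved text grounds the answer."]
# Low score: one hallucinated claim, one missed reference claim.
print(claim_f1(answer, reference))
```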

  • Aishwarya Naresh Reganti
    Founder & CEO @ LevelUp Labs | Ex-AWS | Consulting, Training & Investing in AI · 123,791 followers

🤔 A big issue with standard RAG retrieval is that it doesn't fully understand relationships in the data. What if we treated the retrieval system like an LLM and used prompts to guide it?

While this might not always cause problems, it can lead to suboptimal results on queries that require some form of relational understanding for retrieval, say "Name all sci-fi authors who wrote books before 2010." A semantic retriever will struggle with reasoning based on dates. In such cases, graph-based RAG could help, but this paper, "Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models", proposes a different solution.

👉 The idea is to train retrieval models that follow natural language instructions, much like language models.
👉 Instead of relying on semantic similarity alone, the pipeline adds query-specific prefixes to guide the retrieval model, ensuring it returns the most relevant responses based on the given instructions. (A minimal sketch follows this post.)
👉 Because it's trained on a new instruction-based dataset, the model can handle complex reasoning, improving performance in scenarios where traditional retrieval struggles, like temporal constraints or specific relevance criteria.

It seems like a solid idea, but it does make the pipeline more expensive compared to just using embeddings. That said, smaller, specifically trained retriever models could be a smart way to test this approach. Definitely worth exploring! Link: https://lnkd.in/eWNP26s6
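Mechanically, the interface is just prepending the instruction to the query before encoding, as in this hedged sketch. Note the caveat: the encoder below is an ordinary off-the-shelf model used only to show the shape of the pipeline; it has not been instruction-trained, so it would largely ignore the prefix. The instruction text is an illustrative assumption, not the paper's training format.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

query = "Name all sci-fi authors who wrote books before 2010."
instruction = ("Relevant documents must mention a publication year "
               "earlier than 2010; ignore books published later.")

# Plain semantic retrieval embeds only the query...
plain_vec = model.encode(query, normalize_embeddings=True)
# ...while a Promptriever-style retriever scores documents against the
# instruction-prefixed query, letting the prompt express constraints.
prompted_vec = model.encode(f"{instruction} Query: {query}",
                            normalize_embeddings=True)
```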

  • Jaimin Shah
    Software Engineer II - GenAI @ Bank of America | ex ML Engineer @ LLE | Fine-Tuned LLMs · RAG Pipelines · Multi-Agent Systems · 6,591 followers

❌ Stop expecting retrieval to work without cleaning your data → Garbage in = hallucinations out.
❌ Stop ignoring metadata in retrieval → A little filtering goes a long way when you're juggling hundreds of files.
❌ Stop acting like tables, images, and equations don't matter → Your model won't "just get it" if you drop structured data as flat text.

It's time we talk about the most common, and most mishandled, problems in RAG pipelines:

🔥 1. Convert PDFs to Markdown (Yes, Really)
If you're not doing supervised fine-tuning, Markdown is your best friend. It preserves structure, context, and traceability. Tools I swear by:
• Marker by DataLab: clean markdown with metadata
• Docling (via LangChain): especially solid with tabular data
• Nougat by Meta: OCR + LaTeX + image-aware, great for scientific PDFs
💡 Pro tip: No GPU? Use Mistral OCR: fast, efficient, and impressively accurate.

🧠 2. Handling Images in PDFs
Images ≠ noise. In reports, research, or medical docs, they often carry the context. Two smart options:
• Convert to image embeddings (when visual layout matters)
• Or do what I do: run a multimodal model to generate textual descriptions and enrich your chunks with image context

✂️ 3. Stop Using Arbitrary Chunk Sizes
If you're still using chunk_size=1000, chunk_overlap=100, you're leaving performance on the table.
✅ Go semantic + hierarchical:
• Break parent docs into paragraphs
• Group semantically similar paragraphs into mini-chunks
• Map each mini-chunk back to its parent using something like ParentDocumentRetriever
It's smarter. Cleaner. Way more context-aware.

🧠 4. Smarter Retrieval Starts with Smarter Queries
i) Use chat history to understand and rewrite the query: resolve vague references, inject clarity, and give ambiguous terms proper names.
ii) Use an LLM to reformulate the query: generate 4–5 follow-up or sub-questions, then use the answers to those to reason better and form a stronger, more accurate final response.
Let your retriever think, not just fetch. (A query-rewriting sketch follows this post.)

📌 5. Accurate Referencing Builds Trust
Citations aren't optional; they're essential. Markdown headers help, but if your PDF is scanned or messy, they often get lost. Here's what I do:
• Run a 7B model to extract the main topic or section name from each chunk
• Use this as the source label during generation
Clean, readable, and traceable. Exactly what you want in a production-grade chatbot.

⚡ RAG is not about gluing together a retriever and a generator. It's about:
✅ Understanding your data
✅ Structuring it semantically
✅ Retrieving wisely
✅ Citing clearly
If you're doing that, now you're building RAG right.

What's the biggest challenge you've hit while working on a RAG system? Let's trade notes ↓
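A hedged sketch of the query-rewriting step (point 4 above). llm() is a hypothetical placeholder for whatever chat-completion client you use, and the prompt wording is illustrative.

```python
def llm(prompt: str) -> str:
    # Hypothetical hook: plug in your chat-completion client here.
    raise NotImplementedError("wire this to your LLM provider")

def rewrite_query(history: list[str], query: str) -> str:
    """Make the query standalone using the conversation so far."""
    return llm(
        "Rewrite the final question so it stands alone. Resolve pronouns "
        "and vague references using the chat history.\n"
        f"History: {history}\nQuestion: {query}\nRewritten:"
    )

def expand_query(query: str, n: int = 4) -> list[str]:
    """Fan the query out into sub-questions for broader retrieval."""
    out = llm(f"Write {n} sub-questions whose answers together answer: "
              f"{query}\nOne per line.")
    return [q.strip() for q in out.splitlines() if q.strip()]

# Typical flow: retrieve for the rewritten query plus each sub-question,
# then merge and rerank the union of chunks before generation.
```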

  • Umair Ahmad
    Senior Data & Technology Leader | Omni-Retail Commerce Architect | Digital Transformation & Growth Strategist | Leading High-Performance Teams, Driving Impact · 11,161 followers

Most teams think RAG is solved. It's not. What if the real breakthrough is not bigger models… but smarter retrieval?

Here are 12 advanced RAG techniques reshaping how AI reasons, verifies, and scales.

→ Mindscape Aware RAG
• Builds a high-level summary before retrieval
• Connects scattered evidence like a human reader

→ Bidirectional RAG
• Writes verified answers back into the corpus
• Expands knowledge safely without hallucination drift

→ Graph O1
• Agent-based GraphRAG with MCTS and reinforcement learning
• Reasons efficiently over large graphs within context limits

→ QuCo RAG
• Triggers retrieval using pretraining statistics
• Detects rare or suspicious entities early

→ MegaRAG
• Uses multimodal knowledge graphs for long documents
• Enables global reasoning across text and images

→ Hybrid RAG for Multilingual QA
• Handles noisy historical and OCR-heavy documents
• Grounds answers despite language drift

→ Multi-Step RAG with Hypergraph Memory
• Stores facts as structured hypergraphs
• Supports deep multi-step reasoning

→ TV RAG
• Time-aware retrieval for long videos
• Aligns visuals, audio, and subtitles

→ SignRAG
• Zero-shot road sign recognition
• Combines vision with retrieval

→ HiFi RAG
• Multi-stage document filtering
• Reduces noise before generation

→ AffordanceRAG
• Multimodal RAG for robotics
• Selects actions grounded in physical reality

→ RAGPart and RAGMask
• Lightweight protection against corpus poisoning
• Defends systems without changing the LLM

RAG is no longer just retrieval. It is reasoning architecture. Follow Umair Ahmad for more insights.
