Tips for Improving Retrieval with AI Agents


Summary

Agentic AI systems don't just pull information; they reason through data, connecting facts and working out the best way to answer complex questions. Improving retrieval with these agents means designing systems that can find, manage, and use the right information at each step, making AI more reliable and trustworthy for demanding tasks.

  • Structure your memory: Build systems that separate short-term and long-term memories, using clear rules so agents keep only the most important facts and recent interactions handy.
  • Refine your queries: Rewrite questions, rerank search results, and trim down context so agents only work with the most relevant, up-to-date information.
  • Use smart tools: Combine traditional search, knowledge graphs, and APIs so the agent can explore relationships and fill in gaps, rather than guessing or hallucinating answers.
Summarized by AI based on LinkedIn member posts
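
The "refine your queries" tip above can be sketched as one tiny retrieval-refinement pass. Everything here is illustrative: the keyword-overlap reranker stands in for a trained re-ranker, and the rewrite rule stands in for an LLM query rewriter.

```python
# Toy rewrite → rerank → trim pass; real systems would use an LLM rewriter
# and a trained re-ranker instead of keyword overlap.

def rewrite_query(query: str, history: list[str]) -> str:
    """Naive rewrite: fold the last conversational turn into the query."""
    return f"{history[-1]} {query}" if history else query

def rerank(query: str, chunks: list[str], keep: int = 3) -> list[str]:
    """Score chunks by keyword overlap with the query; keep only the top few."""
    q_terms = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_terms & set(c.lower().split())),
                    reverse=True)
    return scored[:keep]

chunks = [
    "Refund policy: refunds are issued within 14 days.",
    "Our office is located in Berlin.",
    "Refunds spiked in Q3 due to a billing bug.",
]
query = rewrite_query("why did refunds spike", history=["We are discussing Q3"])
context = rerank(query, chunks, keep=2)  # trimmed context for the model
```

The agent then works only with `context`, not the whole corpus.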
  • View profile for Om Nalinde

    Building & Teaching AI Agents to Devs | CS @IIIT

    158,331 followers

I used this guide to build 10+ AI agents. Here are my 10 actionable items:

1. Turn your agent into a note-taking machine
→ Dump plans, decisions, and results into state objects outside the context window
→ Use scratchpad files or runtime state that persists during sessions
→ Stop cramming everything into messages; treat state like external storage

2. Be ridiculously picky about what gets into context
→ Use embeddings to grab only memories that matter for current tasks
→ Keep simple rules files (like CLAUDE.md) that always load
→ Filter tool descriptions with RAG so agents aren't confused by irrelevant tools

3. Build a memory system that remembers useful stuff
→ Create semantic, episodic, and procedural memory buckets for facts, experiences, and instructions
→ Use knowledge graphs when embeddings fail for relationship-based retrieval
→ Avoid ChatGPT's mistake of pulling random location data into unrelated requests

4. Compress like your context window costs $1,000 per token
→ Set auto-summarization at 95% context capacity, with no exceptions
→ Trim old messages with simple heuristics: keep recent, dump the middle
→ Post-process heavy tool outputs immediately; search results don't live forever

5. Split your agent into specialized mini-agents
→ Give each sub-agent one job and its own isolated context window
→ Hand off context with quick summaries, not full message histories
→ Run sub-agents in parallel when possible for isolated exploration

6. Sandbox the heavy stuff away from your LLM
→ Execute code in environments that isolate objects from context
→ Store images, files, and complex data outside the context window
→ Only pull summary info back; full objects stay in the sandbox

7. Make summarization smart, not just chronological
→ Train models specifically for agent context compression
→ Preserve critical decision points while compressing routine chatter
→ Use different strategies for conversations vs. tool outputs

8. Prune context like you're editing a novel
→ Implement trained pruners that understand relevance, not just recency
→ Filter based on task relevance while maintaining conversational flow
→ Adjust pruning aggressiveness based on task complexity

9. Monitor token usage like a hawk
→ Track exactly where tokens burn in your agent pipeline
→ Set real-time alerts when context utilization hits dangerous levels
→ Build dashboards correlating context management with success rates

10. Test everything, or admit you're just guessing
→ A/B test different context strategies and measure performance differences
→ Create evaluation frameworks testing before/after context-engineering changes
→ Set up continuous feedback loops that auto-adjust context parameters

Last but not least, be open to new ideas and keep learning. Check out 50+ AI agent tutorials on my profile 👋
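
Item 4's "keep recent, dump the middle" compression rule can be sketched in a few lines. The token counter, budget, and summary stub are stand-ins; a real agent would use the model's tokenizer and an LLM-generated summary.

```python
# Hedged sketch: trigger compression at 95% of an illustrative budget,
# keep the plan message and the most recent turns, summarize the middle.

CONTEXT_LIMIT = 100   # illustrative token budget
TRIGGER = 0.95        # compress at 95% capacity, per the post

def count_tokens(messages):
    """Crude whitespace token count; swap in the model's tokenizer."""
    return sum(len(m.split()) for m in messages)

def compress(messages, keep_head=1, keep_tail=3):
    """Keep the head (plan) and recent tail; replace the middle with a stub."""
    if count_tokens(messages) < TRIGGER * CONTEXT_LIMIT:
        return messages
    middle = messages[keep_head:-keep_tail]
    stub = f"[summary of {len(middle)} earlier messages]"  # LLM summary stand-in
    return messages[:keep_head] + [stub] + messages[-keep_tail:]

history = ["plan: fix the billing bug"] + \
          [f"tool output {i} " + "x " * 10 for i in range(12)]
trimmed = compress(history)
```

Below the trigger, messages pass through untouched; above it, only the plan, a summary stub, and the last three turns survive.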

  • View profile for Brij kishore Pandey

    AI Architect & Engineer | AI Strategist

    720,891 followers

RAG has quietly evolved through three generations. Most teams are still running Gen 1 and wondering why their chatbot hallucinates on anything beyond a FAQ. Here's what actually separates them:

Classic RAG → Retrieval
→ Query → Embed → Vector DB → Top-K chunks → LLM
→ Fast, simple, single-hop
→ Great for: "What's our refund policy?"
→ Breaks on: "Why did refunds spike in Q3 vs Q2?"

Graph RAG → Relationships
→ Query → Entity extraction → Knowledge graph → Connected context → LLM
→ Entity-rich, multi-source, relational
→ Great for: "How is customer X connected to the fraud ring we flagged last quarter?"
→ Requires: investment in graph construction and entity resolution

Agentic RAG → Reasoning
→ Query → Reasoning agent → Vector DB + Knowledge graph + Tools → Self-evaluation loop → Final answer
→ Adaptive, multi-step, self-correcting
→ The agent decides what to retrieve, when to retrieve again, and whether the answer is actually good
→ Great for: complex investigations, research workflows, enterprise support with ambiguous intent

The honest take: you don't always need Agentic RAG. Classic RAG still wins on latency and cost for 70% of real use cases. But when your users start asking questions that require reasoning across disconnected facts, the retrieval pattern has to evolve with them.

The mistake I keep seeing: teams jump straight to "let's add an agent" before they've fixed their chunking strategy. Walk before you run.

What's blocking your team from moving beyond Classic RAG — evaluation complexity, latency budgets, or stakeholder trust in agentic systems?
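
The Gen-3 loop described above (the agent decides what to retrieve, self-evaluates, and retrieves again) might look like this in miniature. The two sources and the substring-matching "retrieval" are toy stand-ins for a vector DB and a knowledge graph.

```python
# Toy Agentic RAG loop: pick a source, check whether retrieval grounded
# the question, and try the next source if not. All data is illustrative.

SOURCES = {
    "vector_db": {"refund policy": "Refunds are issued within 14 days."},
    "knowledge_graph": {"refund spike": "Q3 refunds spiked after the v2 billing release."},
}

def retrieve(source: str, query: str):
    """Return the first entry whose key appears in the query, else None."""
    return next((v for k, v in SOURCES[source].items()
                 if k in query.lower()), None)

def agentic_answer(query: str) -> str:
    for source in SOURCES:              # the agent decides what to try next
        evidence = retrieve(source, query)
        if evidence is not None:        # self-evaluation: grounded yet?
            return evidence
    return "insufficient evidence"      # better than hallucinating
```

A single-hop Classic RAG call would stop after the first lookup; the loop is what lets the "refund spike" question reach the second source.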

  • View profile for Vignesh Kumar

    AI Product & Engineering | Start-up Mentor & Advisor | TEDx & Keynote Speaker | LinkedIn Top Voice ’24 | Building AI Community Pair.AI | Director - Orange Business, Cisco, VMware | Cloud - SaaS & IaaS | kumarvignesh.com

    21,037 followers

🚀 Why RAG alone won't get us there—and how Agentic RAG helps

I've used RAG systems in multiple products—especially in knowledge-heavy contexts. They help LLMs stay grounded by retrieving supporting documents. But there's a point where they stop being useful.

Let me give you a simple example. Let's say you ask:
👉 "Which medical researchers have published on long COVID, what clinical trials they were part of, and what other conditions those trials studied?"

A classical RAG system would:
1️⃣ Look for text chunks that match "long COVID"
2️⃣ Return some papers or abstracts
3️⃣ And leave the LLM to guess or hallucinate the rest

And here is the problem: you're not just looking for one passage. You're asking for a chain of connected facts:
🔹 Authors → 🔹 Publications → 🔹 Clinical trials → 🔹 Other conditions

RAG systems were never built to follow that trail. They do top-k lookup and feed static chunks to the LLM. No planning. No reasoning. No ability to explore relationships between entities.

That's where Agentic RAG with knowledge graphs comes in. Instead of dumping search results, the system:
✅ Breaks the question into steps
✅ Uses structured data to navigate relationships (e.g., author–trial–condition)
✅ Assembles the answer using small, verifiable hops
✅ Uses tools for hybrid search, graph queries, and concept mapping

You can think of it like this: classical RAG is like searching through a pile of papers with a highlighter, while Agentic RAG is like giving the job to a smart analyst who understands the question, walks through your research database, and explains how each part connects.

I am attaching a paper I read recently that demonstrated this well—they used a mix of Neo4j for knowledge graphs, vector stores for retrieval, and a lightweight LLM to orchestrate the steps. The key wasn't the model size—it was the structure and reasoning behind it.

I believe this approach is far more suitable for domains where:
💠 Information lives across connected sources
💠 You need traceability
💠 And you can't afford vague or partial answers

I see this as a practical next step for research, healthcare, compliance, and enterprise decision-support.

#AI #LLM #AgenticRAG #KnowledgeGraph #productthinking #structureddata

I write about #artificialintelligence | #technology | #startups | #mentoring | #leadership | #financialindependence

PS: All views are personal. Vignesh Kumar
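
The author → publication → trial → condition chain above can be followed as small, verifiable hops over a graph. A toy in-memory adjacency map stands in for a real graph store like Neo4j, and every name is made up.

```python
# Toy knowledge-graph traversal: each hop is a checkable edge lookup,
# so the final answer is assembled from facts rather than guessed.

graph = {
    ("Dr. Lee", "authored"):    ["Paper A"],
    ("Paper A", "reports_on"):  ["Trial T1"],
    ("Trial T1", "studied"):    ["long COVID", "chronic fatigue"],
}

def hop(node: str, relation: str) -> list[str]:
    """One edge lookup; returns [] when the trail ends."""
    return graph.get((node, relation), [])

def conditions_studied_by(author: str) -> set[str]:
    """Author → publications → trials → conditions, hop by hop."""
    conditions = set()
    for paper in hop(author, "authored"):
        for trial in hop(paper, "reports_on"):
            conditions.update(hop(trial, "studied"))
    return conditions
```

Because each hop is an explicit edge, the system can also report *why* it returned an answer ("chronic fatigue" via Paper A and Trial T1), which is the traceability the post calls for.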

  • View profile for Himanshu J.

    Building Aligned, Safe and Secure AI

    29,467 followers

🚀 RAG isn't just about "retrieve & generate" anymore; it's becoming a design discipline.

I just finished reviewing Weaviate's Advanced RAG Techniques guide, and it's one of the clearest breakdowns of what actually improves real-world RAG performance. Here are the key shifts every AI builder, founder, and enterprise team should pay attention to:

🔹 Indexing is becoming engineering. Data cleaning, semantic chunking, LLM-based chunking, and OCR-free multimodal retrieval (e.g., ColPali, ColQwen) dramatically improve knowledge preparation.

🔹 Query understanding is evolving. Rewrite → Retrieve → Read is surpassing naive pipelines. Query decomposition + routing is quietly becoming core to Agentic RAG.

🔹 Retrieval is not just "vector search." Hybrid search, metadata filtering, distance-based cutoffs, and domain-tuned embeddings improve accuracy more than increasing top-k.

🔹 Post-retrieval is the biggest x-factor. Re-rankers, context compression, metadata-enhanced context windows, and smarter prompt engineering boost quality without touching the LLM.

🔹 And yes, LLM fine-tuning is back. Fine-tuning for domain specificity, terminology, and reasoning drastically improves grounding in specialized RAG systems.

🔥 RAG is shifting from "add context" to "design retrieval intelligence."

#RAG #RetrievalAugmentedGeneration #VectorSearch #AIEngineering #AgenticAI #LLMs #EnterpriseAI #Weaviate
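
Two of the retrieval shifts above, metadata filtering and distance-based cutoffs, can be sketched together. The records, fields, and scores are invented; the point is that a score threshold may return fewer than k results instead of padding the context with distant chunks.

```python
# Toy post-vector-search filter: require matching metadata AND a distance
# below a cutoff, rather than blindly taking a fixed top-k.

results = [
    {"text": "2024 pricing sheet", "year": 2024, "distance": 0.12},
    {"text": "2019 pricing sheet", "year": 2019, "distance": 0.15},
    {"text": "holiday schedule",   "year": 2024, "distance": 0.61},
]

def filtered_search(results, year: int, max_distance: float) -> list[str]:
    """Keep only fresh, close matches; distant chunks are dropped entirely."""
    return [r["text"] for r in results
            if r["year"] == year and r["distance"] <= max_distance]
```

With a cutoff of 0.3, the stale 2019 sheet and the off-topic holiday schedule both fall away even though a top-3 query would have included them.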

  • View profile for Shrey Shah

    AI @ Microsoft | I teach harness engineering | Cursor Ambassador | V0 Ambassador

    16,881 followers

When your AI sounds clever but spits nonsense, it's not hallucinating. It's missing the right context.

Most prompts throw in random or outdated info. That's like giving a chef spoiled ingredients and expecting a gourmet meal. Context isn't just decoration anymore. It's the whole recipe.

Today's best AI setups don't just dump data in. They carefully decide what to pull, when, and how to keep the context clean. Here's what a well-engineered context looks like:

• Working memory: recent interactions and tool outputs needed right now
• Long-term memory: user habits, past results, and facts stored smartly for later
• Dynamic retrieval: smart query tweaks, ranking, and trimming before feeding the model
• Tools as partners: APIs, search engines, and servers called only when needed

Example: a coding AI keeps track of recent errors and code changes in working memory. It stores project files and dependencies in long-term memory. When stuck, it fetches docs or runs web searches. The result? Faster, more reliable code without nonsense.

If you're building AI agents, focus on:

• Sharpening retrieval: rewrite queries, rerank results, compress context before the model sees it
• Splitting memories: keep short-term fresh and long-term lean with only key facts
• Using tools like sensors: call them when info is missing; don't expect the model to guess
• Defining clear context rules: schemas for tools and outputs that the system follows

Your current retrieval-augmented setup isn't useless. It's the base. The trick now is orchestration — giving the model just the right slice of context to do its job well.

So next time your AI messes up, don't tweak the prompt first. Fix the context. Then watch it finally get it right.

I'm Shrey Shah & I share daily guides on AI. If this helped, hit the ♻️ reshare button to help someone else build smarter AI.

  • View profile for Pablo Castro

    CVP & Distinguished Engineer at Microsoft

    8,926 followers

We just shipped an update to Agentic Retrieval in Azure AI Search that boosts result quality when grounding agents on external data, for elaborate scenarios where a single search against a single data source would not be enough. We're expanding Agentic Retrieval with the ability to target multiple indexes in a single operation. This is not just a fan-out query: the query planner uses the information it has about each source of data to intelligently decompose retrieval tasks into separate subtasks, issues the right queries to the right indexes, and then composes a unified result ready to be sent to the language model backing an agent.

The system is built and evaluated from the ground up with steerability in mind. Developers can provide descriptions and instructions for each data source, as well as overall instructions for the retrieval agent that oversees the operation.

We also added the option for answer synthesis in the same retrieval call. When answer synthesis is enabled, we not only return the raw grounding information but also an actual answer to the question or task, based on the various results from the knowledge sources the system selected.

With this capability, a single call to the /retrieve API replaces what would have been extensive context-engineering work to get routing and query planning right, several calls to a language model for query decomposition, several calls to separate search indexes to actually retrieve data, and subsequent work to stitch results together.

More in Matthew Gotteiner's blog post: https://lnkd.in/ggyF42Jw

  • View profile for Paolo Perrone

    No BS AI/ML Content | ML Engineer with a Plot Twist 🥷100M+ Views 📝

    128,928 followers

Are you struggling to build AI agents that work beyond the demo? I've spent the past year building and stress-testing agentic systems, and what I've found is that most of the pain can be solved with 7 principles:

1️⃣ Structured Workflows > Clever Prompts
Agents need a structured loop: reason → act → reflect → retry → escalate. Loose, one-off prompts won't sustain multi-step tasks.

2️⃣ Context Handling is Core Architecture
What the agent remembers — and how it recalls it — defines its range. Summaries, scoped retrieval, and structured files work. Dumping full context doesn't.

3️⃣ Planning is a Must
Agents need a built-in planning process to break down tasks and recover from failure. Plan → execute → review is the backbone of reliable behavior.

4️⃣ Real-World Agents Use Real Tools
Terminal access, Git, APIs — without system interaction, it's all talk. Execution turns intent into impact.

5️⃣ Reasoning Patterns Must Be Enforced in the System
Chain-of-Thought, ReAct — they only work when embedded in the system's logic. Prompting for "step-by-step" isn't enough on its own.

6️⃣ Autonomy Needs Boundaries
Without guardrails, agents can break things quickly. Scoped actions, fallback logic, and safety checks are essential.

7️⃣ The Magic is in Orchestration
Great agents aren't just smart — they manage memory, tools, decisions, and recovery. Orchestration is what makes scaling multi-agent systems possible.

If you're serious about building functional agents, these principles are non-negotiable. Building better agents shouldn't be gatekept. If this helped you, pass it on 💾♻️
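
Principle 1's loop combined with principle 6's boundaries might be sketched as a bounded retry wrapper that escalates instead of looping forever. The flaky tool here is simulated purely to exercise the retry path.

```python
# Toy reason → act → reflect → retry → escalate loop with a hard retry
# budget (the guardrail): the agent hands off rather than spinning.

def run_with_guardrails(task: str, act, max_retries: int = 2) -> str:
    """Try the action, reflect on the outcome, retry within budget,
    then escalate to a human instead of retrying indefinitely."""
    for attempt in range(max_retries + 1):
        ok, result = act(task, attempt)
        if ok:                              # reflect: did the action succeed?
            return result
    return "escalate: handing off to a human"

def flaky_tool(task: str, attempt: int):
    # Simulated tool that only succeeds on the second retry.
    return (attempt == 2, f"done: {task}")
```

With a budget of two retries the task completes; shrink the budget and the same task escalates cleanly instead of failing silently.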

  • View profile for Sohrab Rahimi

    Director, AI/ML Lead @ Google

    23,610 followers

Deliberate context engineering is what makes all the difference in an agent's behavior and performance. The paper Context Engineering Survey offers one of the most comprehensive analyses of this topic. It shows that models lose precision as more tokens are added, a problem described as context rot, and that attention must be treated as a limited resource. This is one of the most important papers to read for anyone working on agent design and performance optimization.

The authors suggest three main principles for designing effective context:

1. Context as a dynamic system. The context should evolve during the task instead of remaining static. The model's view must update as new data, outputs, or user instructions appear.

2. Quality over quantity. Adding tokens does not guarantee better reasoning or consistency. The real task is to select only the most relevant, high-signal information that supports the next step.

3. Automation for scalability. Manual curation of context does not scale. Systems must automatically decide what to keep, compress, or retrieve.

The paper then outlines the components of context engineering that practitioners can refine:

• System prompts should read like structured briefs with clear sections for background, instructions, tools, and expected outputs.
• Tools should serve one purpose each and return compact, clear outputs.
• Few-shot examples work best when limited to diverse, representative samples instead of exhaustive lists.
• Message history should be summarized to retain decisions and remove redundant exchanges.
• Just-in-time data should replace preloading entire datasets, allowing agents to fetch what they need at the right moment.

In benchmarks such as the Needle in a Haystack test, curated context windows improved retrieval accuracy by up to 30 percent and reduced latency by nearly half. Anthropic's internal trials also showed that smaller, purpose-built contexts allowed agents to recover information faster and act more reliably on long tasks. These findings confirm that performance is determined by the quality of tokens rather than their volume.

The next iteration of agentic platforms will have to include automated context managers, integrated long-term memory, and standard APIs for tool and data access. These capabilities will define how scalable and sustainable next-generation systems become. Building them is complex and time-intensive, involving coordination across model design, data retrieval, and infrastructure layers. Yet this is what will allow AI systems to reason over time, learn from experience, and remain consistent across tasks. Context engineering will be the backbone of how intelligent agents think, remember, and adapt.
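
The "just-in-time data" component described above can be sketched as fetching a record only at the moment a step needs it, so the context window holds one record instead of the whole dataset. The dataset and class names are illustrative.

```python
# Toy just-in-time context: the full dataset lives outside the model;
# only explicitly requested records ever enter the context window.

DATASET = {f"doc-{i}": f"contents of doc-{i}" for i in range(1000)}

class JustInTimeContext:
    def __init__(self):
        self.window: list[str] = []     # what actually reaches the model

    def need(self, doc_id: str) -> str:
        """Fetch a record at the moment of use and admit it to the window."""
        record = DATASET[doc_id]
        self.window.append(record)
        return record

ctx = JustInTimeContext()
ctx.need("doc-7")   # one of 1000 records occupies the context window
```

Preloading would have spent the token budget on 999 irrelevant documents; the just-in-time window carries exactly what the current step required.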

  • View profile for Ravena O

    AI Researcher and Data Leader | Healthcare Data | GenAI | Driving Business Growth | Data Science Consultant | Data Strategy

    92,474 followers

RAG was introduced with a clear promise: to make LLMs smarter, give them memory, and anchor their responses in verifiable facts. But there's an important reality that often goes unspoken.

Most RAG implementations function like enhanced search engines. They retrieve documents, insert them into context, and expect the LLM to make sense of it. That isn't true intelligence — it's structured copy-and-paste.

What's emerging now represents a meaningful shift: agents are beginning to manage the entire retrieval and reasoning workflow. Platforms like Glean, Perplexity, and Harvey are not simply retrieving documents. They are reasoning before retrieval, after retrieval, and at times choosing not to retrieve at all. This changes the entire paradigm:

🔴 Instead of embedding every query by default, an agent evaluates: "What information is actually required here?"
🔴 Instead of flooding context with irrelevant chunks, it determines: "Which sources matter for this specific question?"
🔴 Instead of generating a single response, it reflects: "Did this answer fully address the user's intent?"
🔴 Memory becomes functional: short-term for the current interaction, long-term for patterns across sessions.
🔴 A broader toolset becomes accessible: search engines, APIs, databases — the agent selects the right tool for the task.
🔴 The LLM stops being an isolated generator and becomes an integral component in a coordinated reasoning system.

This is Agentic RAG. Not an incremental improvement in retrieval — but a fundamentally different architecture for enterprise intelligence. And once you see it working inside real, complex workflows, traditional RAG begins to feel… noticeably incomplete.

CC: Om Nalinde
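
The first shift above, deciding whether retrieval is needed at all, can be sketched as a gate in front of the retriever. The known-facts table is a toy stand-in for an LLM routing judgment.

```python
# Toy retrieval gate: answer from what the system already knows when it
# can, and only fall through to retrieval when it must.

KNOWN_FACTS = {"capital of france": "Paris"}

def answer(query: str):
    """Return (answer, route): either a known fact or a retrieval result."""
    key = query.lower().strip("?")
    if key in KNOWN_FACTS:                        # no retrieval required
        return KNOWN_FACTS[key], "no_retrieval"
    # Stand-in for the full embed → search → generate pipeline.
    return f"[retrieved context for: {query}]", "retrieved"
```

A default-retrieve pipeline would have embedded and searched even for the trivial question; the gate spends retrieval latency only where it buys grounding.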

  • Everyone claims "context is king," but Dropbox admits the messy reality: adding more tools to your agent actually makes it stupider. Most engineers think "Agentic AI" means giving an LLM access to every API in the stack. Dropbox found the opposite: giving agents more tools caused "analysis paralysis." To fix this "context rot," they implemented three architectural shifts. First, they collapsed dozens of granular retrieval APIs (Jira, Slack, GDocs) into a single "Universal Search" tool, drastically reducing the prompt's schema load. Second, they shifted relevance computation upstream, using a pre-built Knowledge Graph to rank and prune content before it ever touches the context window. Finally, they decoupled logic by offloading complex query construction to a specialized "Search Agent," leaving the main "Planner Agent" free to focus on orchestration. This proves that the bottleneck for production agents isn't model capability; it's the signal-to-noise ratio of the input. We don't need larger context windows; we need better context engineering. More information in the blog post: 🔗https://lnkd.in/egy4jTF4
