Context Is Everywhere. Relevance Is Not.
Everyone agrees context is the central problem of AI and agentic systems. Nobody agrees on what context actually is.
RAG pipelines. Semantic layers. Knowledge graphs. Ontologies. These are serious investments in a serious problem. Gartner declared in May 2025 that context engineering had replaced prompt engineering as the field’s primary design discipline. A 2025 survey covering more than 1,400 research papers formally defined context engineering as a new discipline.
But the word context is being used without a shared definition. Vendors with semantic layers are repositioning them as "context" layers. They are not.
Retrieval Is Not Context
Most AI context layers are retrieval layers in disguise. They find what is semantically similar to the query. They fill the context window with topically related information. They optimize for “what is this about?” and hope it produces “what matters for this decision?”
Retrieval expands. Context eliminates. Conflating them is a mistake.
Semantic similarity is not relevance. A doctor asks about a patient’s chest pain. The retrieval system surfaces cardiology textbooks, clinical trial summaries, and differential diagnosis lists. All topically related. All retrieved based on embedding distance.
What the doctor actually needs is the patient’s medication list. One of those medications has a known cardiac side effect. That fact is not semantically similar to chest pain. No vector search will rank it above the textbook. But it is the single most material fact for this decision.
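To make the gap concrete, here is a toy sketch in Python. The vectors are hand-made stand-ins for real embeddings and the documents are hypothetical, but any production vector store produces the same shape of failure: cosine similarity ranks the topically related material first and the materially decisive fact last.

```python
# Toy illustration: semantic similarity vs. materiality.
# The 3-d vectors below are hand-crafted stand-ins for real embeddings;
# an actual model yields different numbers but the same ranking failure.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [0.9, 0.1, 0.0]  # "patient presenting with chest pain"

candidates = {
    "cardiology textbook chapter on chest pain":  [0.85, 0.15, 0.05],
    "differential diagnosis list for chest pain": [0.80, 0.20, 0.10],
    "patient medication list":                    [0.20, 0.10, 0.90],  # the material fact
}

for doc, vec in sorted(candidates.items(), key=lambda kv: -cosine(query, kv[1])):
    print(f"{cosine(query, vec):.2f}  {doc}")

# The medication list scores lowest on similarity, yet it is the one
# document that would change the decision. Similarity retrieves
# candidates; it cannot decide materiality.
```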
This is not an edge case. It is the norm.
The volume problem compounds it. A 2024 Stanford study established that LLM performance follows a U-shaped curve across context position — accuracy drops by more than 30% when relevant information sits in the middle of a long context. A 2025 study tested 18 frontier models including GPT-4.1 and Claude Opus 4 and found that every single one degrades as context length grows. Researchers now call this context rot. More information is not better context. More is often worse.
Relevance is more important than volume. Similarity retrieves candidates. Context should eliminate what does not materially matter.
What Other Disciplines Already Know
The AI community is treating context as a new problem. It is not. Other disciplines have been working on it for decades — some for over a century — and the lessons are being almost entirely ignored.
Library science separated topical similarity from situated relevance a century ago. Ranganathan’s five laws, published in 1931, describe a perfect context engine: every reader their book, every book its reader, save the time of the reader. A librarian’s job was never to find everything related to a topic. It was to find the right thing for this person in this situation. A 2026 paper in Library Hi Tech News now examines these laws in direct relation to AI; the connection is no longer analogy. It is a live research question.
Clinical medicine built an entire infrastructure around context management. A medical record is a context architecture designed to give the next clinician enough surrounding information to make a sound decision. When a patient allergy is not surfaced or records do not transfer, the failure is a context failure. The same pattern appears in AI systems every day.
Legal reasoning operates almost entirely on context. The legal standard of totality of the circumstances is a formal framework for saying: context matters, and here is how we weigh it. Case law is a context relevance system built on precedent, situational interpretation, and constraint — not similarity.
Cognitive science, through the work of Suchman and Hutchins, showed that cognition is distributed across people, tools, and environments — and is always responsive to the evolving situation, not just a fixed plan. This is the theoretical argument for why static retrieval-augmented generation is architecturally insufficient. Context is dynamic and co-constructed through interaction, not retrieved from a database.
The common lesson across all of them: context is the process of selecting what is relevant.
What is context?
Here is my operational definition of context:
Context is the situation or conditions that constrain both what information is relevant and how it should be interpreted to support effective decisions and actions.
Constrain acknowledges that context works by elimination, not by accumulation. It makes you more effective by ruling out what does not apply.
Should be interpreted introduces directionality. There are more and less effective interpretations, and context is what separates them.
Decisions and actions grounds context in outcomes. Context that does not connect to the end goal is not context. It is noise with good metadata.
Constraining information and interpretation based on situational understanding is what leads to better outcomes.
How to Architect True Context Layers
Based on that definition, a true context layer should include three layers.
Layer 1: Information
Data, documents, records, signals. This is where most current investment sits — RAG, knowledge graphs, vector databases. It is necessary but not sufficient. The research is advancing fast here: Graph RAG implementations now outperform vector RAG by 90+ percentage points on schema-bound queries. The information layer is maturing. But it is still the information layer.
Layer 2: Situation
The dynamic structure of what is happening and why. Most current systems address the semantic dimension of this layer — they map relationships, resolve terminology, link concepts across silos. That is real and valuable progress.
What they do not address is causation. Semantics tells you what goes with what. Causation tells you what drives what. An associative knowledge graph answers “what is related to this?” A causal graph answers “what happens if this changes?” That is the difference between a reference book and an advisor — and the layer that answers the question organizations actually care about: if something changes tomorrow, what breaks?
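As a minimal sketch of that difference (the nodes and edges here are hypothetical), an associative graph can only answer "what is related?", while typed causal edges let a plain forward traversal answer the question that matters: what breaks if this changes?

```python
# Sketch: answering "what breaks if this changes?" over typed causal edges.
# Node names and edges are hypothetical; the point is the traversal shape.
from collections import deque

# (source, edge_type, target): the edge type carries direction of influence
edges = [
    ("supplier_outage",  "causes",     "parts_shortage"),
    ("parts_shortage",   "causes",     "production_delay"),
    ("production_delay", "causes",     "missed_shipments"),
    ("safety_audit",     "related_to", "production_line"),  # associative: no direction
]

def downstream_impact(change, edges):
    """Follow causal edges forward to find everything the change drives."""
    hit, frontier = set(), deque([change])
    while frontier:
        node = frontier.popleft()
        for src, etype, dst in edges:
            if src == node and etype == "causes" and dst not in hit:
                hit.add(dst)
                frontier.append(dst)
    return hit

print(downstream_impact("supplier_outage", edges))
# {'parts_shortage', 'production_delay', 'missed_shipments'}
```

An associative edge like related_to contributes nothing to the answer, because it says nothing about direction of influence. That is the whole difference.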
Layer 3: Interpretation
The process by which information becomes a basis for judgment. For humans, this includes expertise, cognitive state, goals, and bias — the same facts land differently depending on who receives them and under what conditions. For agents, the analogous process is reasoning over context to form a basis for action, and the relevant failure is not misreading but acting inconsistently with what was correctly understood.
The Materiality Filter
Everything above points to a missing mechanism.
We have systems that can retrieve information — RAG pipelines that find what is semantically similar to a query. We have systems that can relate information — knowledge graphs that map entities, concepts, and associations across domains. We have systems that can define information — ontologies that formalize what things are and how they connect.
What we do not have is systems that can reliably decide what matters.
Without a mechanism to distinguish what changes a decision from what is merely related to it, context systems degrade into high-volume retrieval systems. They expand the information surface area without improving the quality of decisions and actions. More information does not reduce uncertainty. It increases the probability of focusing on the wrong thing.
The materiality filter is the mechanism that closes that gap. It operates on two levels: filtering information and filtering actions.
Filtering information
For each piece of information connected to a decision, ask a single question: if this fact were different, would the decision change? Facts that pass that test are material. Facts that fail are deprioritized regardless of how topically related, semantically similar, or well-tagged they are. This is the legal concept of materiality applied to context selection.
The filter requires causal structure to work. You cannot ask "would this change the decision?" without a model of what drives the decision in the first place. That is what causal edges provide. They connect facts and conditions to outcomes, and answer the question: what drives what? Once those edges exist, materiality is not a judgment call. It is a traversal.
The causal edge types that matter most: causes and caused by for direct attribution; enables and requires for dependency chains; prevents and blocked by for constraints; amplifies and dampens for feedback dynamics; conflicts with for trade-offs.
Adding these to an existing knowledge graph does not require replacing it. It extends it. Most enterprise graphs use associative relationships: related to, tagged with, mentioned in. These tell you what goes with what. Causal edges tell you what drives what. The difference between a reference book and an advisor.
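Here is what that traversal can look like in practice, as a sketch: the node names, edge instances, and decision are hypothetical, and a production graph would use a query language rather than raw Python, but the logic is the same. A fact is material only if some chain of causal edges connects it to the decision.

```python
# Sketch: materiality as traversal. A fact passes the filter only if a
# chain of causal edges links it to the decision. All names are hypothetical.

# Forward-direction causal types; their inverses (caused_by, blocked_by)
# would be traversed in the opposite direction in a full implementation.
CAUSAL_TYPES = {"causes", "enables", "requires", "prevents",
                "amplifies", "dampens", "conflicts_with"}

edges = [
    ("medication_x",        "causes",      "cardiac_side_effect"),
    ("cardiac_side_effect", "causes",      "chest_pain_decision"),
    ("textbook_chapter",    "tagged_with", "chest_pain_decision"),  # associative only
]

def is_material(fact, decision, edges, seen=None):
    """True if some chain of causal edges connects the fact to the decision."""
    seen = seen if seen is not None else set()
    if fact in seen:
        return False
    seen.add(fact)
    for src, etype, dst in edges:
        if src == fact and etype in CAUSAL_TYPES:
            if dst == decision or is_material(dst, decision, edges, seen):
                return True
    return False

candidates = ["medication_x", "textbook_chapter"]
print([f for f in candidates if is_material(f, "chest_pain_decision", edges)])
# ['medication_x'] -- the associative tag never passes the filter
```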
Filtering actions
The same logic applies on the execution side. An agent operating in an enterprise system faces not just the question of what to know, but what to do — and what it is permitted to do. Without a mechanism to filter the action space, the agent reasons over everything it could attempt rather than everything it should. The result is the same failure mode as unfiltered information: more surface area, more noise, worse outcomes.
Recent research on bounded autonomy architectures confirms that agents operating under typed action contracts and permission-filtered constraints outperform unconstrained agents — completing more tasks with zero unsafe executions. Removing governance constraints made the system less useful, not more. The reason is the same as for the materiality filter: structured elimination outperforms unstructured freedom.
This is where governance and policy edges do the work that causal edges cannot. They answer a different question: what is permitted, under what conditions, for whom? They connect actions to authorization rules, approval requirements, scope constraints, and escalation triggers. A governance edge specifies that a particular action is available to a particular role. A policy edge specifies that above a certain consequence threshold, human approval is required before execution proceeds.
Together, causal edges and governance edges make the materiality filter complete. Causal edges filter what enters the context window. Governance edges filter what the agent is allowed to attempt. Both work by elimination — narrowing the surface to what materially matters rather than expanding it to everything that might be relevant.
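A minimal sketch of the action-side filter, with hypothetical roles, actions, and threshold: governance edges bound which actions a role may attempt at all, and a policy threshold inserts a human approval gate above a given consequence level.

```python
# Sketch: governance and policy edges filtering the action space before the
# agent reasons over it. Roles, actions, and the threshold are hypothetical.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    consequence: float  # estimated blast radius, normalized to 0..1

# Governance edges: action -> roles permitted to invoke it
governance = {
    "read_report":     {"analyst_agent", "ops_agent"},
    "update_forecast": {"analyst_agent"},
    "issue_refund":    {"ops_agent"},
}

APPROVAL_THRESHOLD = 0.7  # policy edge: above this, a human must approve

def allowed_actions(role, candidates):
    """Return only the actions this role may attempt, tagged with the gate."""
    out = []
    for a in candidates:
        if role not in governance.get(a.name, set()):
            continue  # governance edge eliminates the action outright
        gate = "needs human approval" if a.consequence > APPROVAL_THRESHOLD else "autonomous"
        out.append((a.name, gate))
    return out

candidates = [Action("read_report", 0.1), Action("update_forecast", 0.4),
              Action("issue_refund", 0.9)]
print(allowed_actions("ops_agent", candidates))
# [('read_report', 'autonomous'), ('issue_refund', 'needs human approval')]
```

Note that update_forecast never reaches the agent's reasoning at all: elimination happens before inference, not after.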
The deployment implication is narrower than it may sound. The capability exists in many knowledge graph technologies today; it is a matter of extending edge types already in production. The gap is not technical. It is conceptual: the field has not separated retrieval from context, or information filtering from action filtering. Once it does, the materiality filter is an architecture decision, not a research project.
Closing the Interpretation Layer
The materiality filter addresses what context the system receives and what actions it is permitted to take. That leaves the third layer, interpretation, only partially resolved.
For humans, the interpretation failure is well-documented and examined in depth in AI Risk Is Not a Model Problem. The short version: the brain defaults to correlation for biological reasons. When an AI system presents pattern-matching as confident situational understanding, it lands inside a cognitive environment already predisposed to accept it. The two defaults reinforce each other, and causal failure becomes invisible at exactly the moment it matters most.
The materiality filter helps here too. By narrowing the information surface area and making causal relationships explicit, it reduces the cognitive load on whoever receives the context. Fewer candidates. Clearer dependencies. Less room for the wrong thing to look like the right thing.
For agents, the failure is structurally different. Recent research on bounded autonomy confirms that governance constraints improve agent performance, not just safety. But the deeper question of how correct reasoning translates into consistent execution across complex multi-step tasks remains active in the research literature. The materiality filter is the prior problem. It determines what enters the reasoning process and what actions are available. How the agent then executes reliably across the steps of a complex task is the harder problem that follows.
What this means for practitioners
The three layers and the materiality filter are not a future architecture. They describe the gap between what most current systems do and what context engineering actually requires.
Layer 1: Information
This is where most investment sits, and it is maturing. The question is whether it is being built as a retrieval system or a context system. The difference is whether causal and governance edges are part of the design.
Layer 2: Situation
This is where the highest-leverage work is available right now. Causal edges on existing knowledge graphs. Governance edges that make the action space governable. Neither requires rebuilding what is already in production.
Layer 3: Interpretation
This is where the longest-horizon opportunity lives, for both humans and agents. The human side has frameworks. The agent side has early evidence. Both need more.
The field has the vocabulary. It has the technology. What it has not done is separate retrieval from context, or information filtering from action filtering. That separation is not a research project. It is a design decision. And it is available now.
Retrieval expands. Context eliminates. That distinction is the foundation for successful AI agents.
I Welcome Your Feedback
If you’re working on AI, data, or agent architectures, I’d welcome your perspective and feedback.
Appendix: Supporting Research
Context Engineering — AI Research
Mei et al. (July 2025) — A Survey of Context Engineering for Large Language Models — Survey of 1,400+ papers. Formally defines context engineering as systematic optimization of information payloads for LLMs.
Hua et al. (October 2025) — Context Engineering 2.0: The Context of Context Engineering — Situates context engineering within its historical and conceptual landscape from the early 1990s to present.
Zhang et al. (October 2025, revised March 2026) — Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models — Introduces ACE framework treating contexts as evolving playbooks with generation, reflection, and curation.
Anthropic Engineering (2025) — Effective Context Engineering for AI Agents — Practitioner-level treatment of context engineering in production agent systems.
Context Window & Performance Research
Liu et al. (Stanford / TACL 2024) — Lost in the Middle: How Language Models Use Long Contexts — Establishes the U-shaped performance curve across context position. 30%+ accuracy drop for information in the middle of long contexts.
Morph / Chroma (2025) — Context Rot: Why LLMs Degrade as Context Grows — Tests 18 frontier models including GPT-4.1 and Claude Opus 4. Every model degrades at every context length increment tested.
RAG & Knowledge Graph Research
RAGFlow (2025) — From RAG to Context: A 2025 Year-End Review — Comprehensive review of how the field moved from retrieval pipelines to context-centric architectures.
Towards Data Science (2025) — Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI — FalkorDB benchmark: Vector RAG scores near 0% on schema-bound queries where Graph RAG achieves 90%+.
Frontiers in AI (2025) — Advancing Engineering Research through Context-Aware and Knowledge Graph–Based RAG — Peer-reviewed analysis of context-aware retrieval in knowledge-intensive domains.
Agent Execution Research
Lau et al. (April 2026, Handshake AI) — BankerToolBench: Evaluating AI Agents in Investment Banking Workflows — Benchmark of 100 real investment banking workflows across 10 task categories. Best model fails ~50% of rubric criteria; 0% of outputs rated client-ready by practitioners.
Gu et al. (April 2026, Google DeepMind) — The Illusion of Stochasticity in LLMs — Demonstrates the knowing–doing gap: models can articulate correct reasoning but fail to execute consistently. Execution failures are a structural property of inference-time sampling, not random noise.
Sohail & Haider (April 2026) — Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution — Presents a bounded-autonomy architecture in which agents operate under typed action contracts and permission-filtered capability exposure. In a 25-scenario deployment, bounded agents completed 23/25 tasks with zero unsafe executions; unconstrained agents completed 17/25.
Cross-Disciplinary Foundations
Emerald Publishing (2026) — Artificial Intelligence and the Five Laws: A New Vision for Library Science — Examines Ranganathan’s 1931 five laws in direct relation to AI context and retrieval systems.
Cornell Law / Legal Information Institute — Totality of the Circumstances — Authoritative definition of the legal standard: decisions based on all available information rather than any single factor or bright-line rule. The formal legal framework for contextual interpretation.
Hutchins, E. (1995) — Cognition in the Wild — Foundational statement of distributed cognition: cognition is distributed across people, tools, and environments. Core theoretical grounding for the Interpretation layer.
Suchman, L. (1987) — Plans and Situated Actions — Establishes that intelligent action is dynamically adapted based on interaction with the physical and social environment, not executed from a fixed internal plan. Direct argument against static retrieval architectures.
Ranganathan, S.R. (1931) — The Five Laws of Library Science — The original purpose-built context engine: save the time of the reader; every reader their book. A century ahead of current AI retrieval practice.