Do agentic systems benefit from normalization?
I’ve been building many versions of agentic systems in various environments (Agent Builder, Copilot Studio, Microsoft Foundry, GitHub Copilot) with various front-end “sockets”, e.g. Microsoft Teams, Microsoft 365 Copilot, and Visual Studio Code. MCP and the CLI too.
All this in the pursuit of giving the best experience to the users of these agentic systems.
The reason for all these experiments, frameworks, tools and platforms is really the diversity of questions / prompts / use cases that the “system” is expected to cater for.
Software development, or shall I call it systems creation, these days can no longer cuddle in the comfort of a defined front-end, responding to the finite possibilities of a “form”.
One instance of a question that I was trying to solve for came from Richard Koh. In our work with partners, the prompt was:
“Top 5 partners in terms of technical capabilities for these 8 specializations [redacted] for each Area in the Asia region”
The other parts of this prompt justify the agentic part of this approach (which I might do another post on!) but the “for each Area” part is what I want to focus on in this post.
Area, from a database design point of view, is best kept as a 3NF normalized table. But while I was wrapping an MCP-ed sub-agent around this, I wasn’t sure whether that is best for agentic systems, so, of course, I asked Copilot.
The answer, it turns out, is a tour of the architectural differences between traditional OLTP databases and agentic systems. Here’s the conversation verbatim:
Q: Do agentic systems benefit from normalization, or is it better not to normalize?
Copilot:
Ah — this is where classical database theory and agentic/LLM system design start to diverge.
Short answer:
✅ Transactional agent systems benefit from normalization
✅ Reasoning‑heavy agent systems often work better with intentional denormalization
🚫 Fully normalized schemas are usually NOT optimal for agent orchestration layers
🧠 Traditional DB Goal vs Agentic System Goal
Normalization helps databases stay consistent under writes: no redundancy, no update anomalies.
Agents, on the other hand, operate on context windows, retrieved text chunks and embeddings, which behave very differently from relational algebra.
📌 Why Highly Normalized Schemas Hurt Agents
Let's say you modelled this in perfect 3NF:
Partner
--------
PartnerID
PartnerName
AreaID
Area
--------
AreaID
AreaName
Now a user asks your agent:
"Which ASEAN partners are Kubernetes specialized?"
The agent must now resolve AreaID → AreaName before it can even interpret “ASEAN”.
For a SQL engine? ✅ trivial, a single join.
For an LLM‑grounded reasoning agent? You’ve just created an extra level of indirection between the question and the data, which leads to extra lookups, extra tokens, and more chances to mis-ground the answer.
📉 The Agent Cost of Normalization
In agentic pipelines, agents reason better over:
✅ self-contained entity descriptions
Instead of:
JSON
{
"PartnerID": 101,
"AreaID": 3
}
Prefer:
JSON
{
"PartnerName": "ABC",
"Area": "ASEAN",
"Specializations": ["AKS", "AI Platform"]
}
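To make the contrast concrete, here is a minimal Python sketch (the partner data and the `to_agent_doc` helper are hypothetical, not from any framework) that resolves the foreign keys up front and emits the self-contained shape agents ground on more easily:

```python
import json

# Hypothetical 3NF-style rows, mirroring the Partner / Area split above.
partners = [
    {"PartnerID": 101, "PartnerName": "ABC", "AreaID": 3},
    {"PartnerID": 102, "PartnerName": "XYZ", "AreaID": 3},
]
areas = {3: "ASEAN"}
specializations = {101: ["AKS", "AI Platform"], 102: ["Kubernetes"]}

def to_agent_doc(p):
    # Resolve every foreign key up front so each chunk stands alone:
    # no joins needed at question time.
    return {
        "PartnerName": p["PartnerName"],
        "Area": areas[p["AreaID"]],
        "Specializations": specializations.get(p["PartnerID"], []),
    }

docs = [to_agent_doc(p) for p in partners]
print(json.dumps(docs[0]))
```

The agent-facing layer is just a projection of the normalized source, so it can be regenerated whenever the source tables change.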
📌 Agentic Design Pattern Emerging in 2025–2026
Modern agentic architectures now use:
✅ Hybrid Modeling
🏗️ Recommended Pattern for Agent‑Ready Data
1️⃣ Keep Source-of-Truth Normalized
For:
For example, in Fabric Data Warehouse / Azure SQL DB, keep the Partner table and the Area table separate.
2️⃣ Create an Agent Serving Layer
Materialize into:
PartnerAgentView
-----------------
PartnerName
AreaName
Or even persist it as denormalized JSON documents.
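A sketch of this two-layer pattern, using an in-memory SQLite database as a stand-in for Fabric Data Warehouse / Azure SQL DB (table and view names follow the example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Layer 1: normalized source of truth.
CREATE TABLE Area (AreaID INTEGER PRIMARY KEY, AreaName TEXT);
CREATE TABLE Partner (PartnerID INTEGER PRIMARY KEY, PartnerName TEXT,
                      AreaID INTEGER REFERENCES Area(AreaID));
INSERT INTO Area VALUES (3, 'ASEAN');
INSERT INTO Partner VALUES (101, 'ABC', 3);

-- Layer 2: agent serving layer. One self-describing row per partner;
-- the join happens here, not at question time.
CREATE VIEW PartnerAgentView AS
SELECT p.PartnerName, a.AreaName
FROM Partner p JOIN Area a ON p.AreaID = a.AreaID;
""")

rows = con.execute("SELECT PartnerName, AreaName FROM PartnerAgentView").fetchall()
print(rows)
```

In a real warehouse you would materialize the view (or schedule a refresh) rather than compute it per query, but the shape of the pattern is the same.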
3️⃣ Feed RAG / Toolformer Layer
Agents perform better when every retrieved chunk is self-contained and needs no joins to interpret.
🔥 Actual Rule of Thumb Now Used
Normalize for correctness
Denormalize for cognition
Vectorize for reasoning
✳️ The Mental Shift
Relational modeling optimizes consistency under mutation
Agentic modeling optimizes understanding under uncertainty
Which are fundamentally different system objectives.
Bottom Line and Example
In short, when creating relational data layers for AI, we can design the tables to make it easier for the AI to understand.
Table (schema) design makes a difference.
For example, when I went with the full 3NF design shown above, with the Partner and Area tables separate, GPT-4.1 was not able to intuit this properly…
Naturally, a more costly model tried harder and was able to produce results:
Claude Sonnet 4.6 is 3x the cost of GPT-4.1 on GitHub Copilot though, which perfectly exemplifies the case for optimizing for the AI’s understanding, i.e. producing more accurate results at lower cost. If we had Area as text within the Partner table, no doubt GPT-4.1 would have been able to find it in the first pass.
Summary (and what happens in practice)
This has been an interesting diversion, exploration and thinking session.
So, what happened to what I was trying to solve for, i.e. the modelling around the partner ecosystem?
Well, my eventual schema is 3NF 😊
Why?
Because, well, I had already cut the data into that shape and imported it into my database.
I reckon my time, and the amount of time my agent will be used with Claude Sonnet 4.6 or better models in tow, is still better optimized that way…
And all this comes around to my favorite principle of respecting the diversity of situations that are there in the world.
There’s always the ideal architecture or perfect whatever, but for what context?