MCPs and location data: Why reference data is the hardest context for AI
TL;DR
The Model Context Protocol (MCP) is an open standard that enables Large Language Models (LLMs) to access files, databases, and APIs.
I think of MCPs as a bridge between language models and the external world. MCP clients such as Claude, Cursor, and ChatGPT now integrate with MCP servers, enabling AI to interact with everything from calendars to APIs.
But early implementations have exposed challenges that I've been watching closely. As Kent C. Dodds observes, MCP has entered its natural critique phase, much like early web browsers, which were slow and inefficient but evolved into today's dominant platform. The protocol works; people use it extensively every day. In the classic "make it work, make it right, make it fast" progression, we're now in the phase of making it right.
In my experience working with location data, I've seen a specific tension emerge: while LLMs and MCP handle data reasonably well, they struggle to grasp the difference between reference data and other data. And within reference data, location information is the hardest test case.
Understanding MCP
The Model Context Protocol defines two key components: MCP clients and MCP servers. Clients are embedded in AI tools like Claude, Cursor, or custom applications. Servers are hosted by organizations providing data or tools for LLM consumption.
When an LLM needs external capabilities, it uses its client to connect to an MCP server, discover available tools and resources, and determine how to leverage them to complete its task. The protocol standardizes this discovery and invocation process, making it theoretically simple to connect AI to any service.
The elegance of this design explains MCP's rapid adoption. But the protocol itself is only infrastructure. What matters — and what I care most about — is what flows through it.
The challenge of using MCPs with location data
Reference data is structured, authoritative, and foundational. Location reference data — postal codes, administrative boundaries, time zones, and coordinates — is a clear example. Unlike conversational text, reference data must be correct.
Geographic information also involves complex hierarchies: cities contain neighborhoods, regions contain cities, and countries contain regions.
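A rough sketch of what such a hierarchy looks like as a data structure, assuming a simple containment tree (the place names and levels below are illustrative, not a real schema):

```python
from dataclasses import dataclass, field

@dataclass
class Place:
    """One node in a geographic containment hierarchy (illustrative shape)."""
    name: str
    level: str                      # e.g. "country", "region", "city", "neighborhood"
    children: list["Place"] = field(default_factory=list)

    def find(self, name: str) -> "Place | None":
        """Depth-first search through the hierarchy."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found:
                return found
        return None

# Countries contain regions, regions contain cities, cities contain neighborhoods.
belgium = Place("Belgium", "country", [
    Place("Flanders", "region", [
        Place("Antwerp", "city", [Place("Zurenborg", "neighborhood")]),
    ]),
])
```

Even this toy version hints at the difficulty: answering "which region is this neighborhood in?" requires walking a structure, not pattern-matching text.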
What makes this particularly difficult, in my view, is that location data is constantly evolving. Postal codes are introduced and retired. Municipalities merge or split. Administrative divisions are redrawn. Maintaining both accuracy and backward compatibility requires careful structuring.
These decisions require domain expertise. Without that expert-level context, LLMs are likely to provide incorrect answers with confidence, a failure mode the field calls hallucination.
The hallucination problem with location data in LLMs
LLMs generate responses through statistical analysis of their training data. They identify patterns and generate text that follows them. For most conversational tasks, this works well. For reference location data, it creates a fundamental problem.
The location data in LLM training sets is often outdated, inconsistent, or incomplete. Models have no mechanism to recognize this; they don't know which information is current and which is stale. They can't distinguish authoritative sources from casual mentions. They simply generate responses that match statistical patterns in what they've seen.
I'll give a concrete example we've encountered directly. Consider postal codes in China. The country maintains alternative postcodes for certain administrative divisions, but information about these is intentionally opaque. When we cross-referenced available sources and compared them to AI-generated responses, we found significant discrepancies. The AI confidently provided postal codes that didn't match any authoritative source. It wasn't lying; it was generating statistically plausible patterns based on incomplete training data.
The downstream consequences are real. A customer service system that routes packages based on AI-provided postal codes will send them to the wrong place. A logistics platform that validates addresses against hallucinated data will reject legitimate locations. An analytics system that aggregates data by administrative division will produce meaningless results when the boundaries are wrong.
The challenge of choosing a reliable data source
Even outside of MCP and AI systems, this is a problem I see organizations struggle with constantly. Maintaining reliable location data is difficult. Postal codes, administrative divisions, boundaries, and coordinates form one of the most critical reference data domains because they underpin logistics, taxation, reporting, and regulatory processes. Maintaining this data requires significant effort and expertise.
Postal systems evolve constantly: municipalities merge or split, administrative boundaries change, and postal codes are introduced or retired. Ensuring accuracy requires continuous validation against authoritative sources.
Many organizations underestimate this effort. When reference data becomes fragmented, outdated, or inconsistent across systems, downstream platforms inherit those inconsistencies. Because maintaining this layer requires continuous reconciliation of multiple authoritative sources, many organizations rely on specialized data providers that curate and maintain global location reference datasets.
Conclusion: Use-case-driven MCP design with quality location data
From where I sit, your MCP's maturity depends on parallel evolution in two areas: server design and data quality.
Well-designed MCP servers are tailored to specific use cases, not generic wrappers around existing APIs that were never designed for LLM consumption. They minimize token consumption, return actionable responses, and reduce errors. They understand that serving an LLM is fundamentally different from serving a programmer.
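The difference shows up in the response shape. A generic wrapper would forward a raw API payload, including metadata and geometry the model will never use; a use-case-driven tool returns only what the model needs to act. A sketch, with illustrative field names rather than any real API:

```python
# Illustrative raw payload from a hypothetical geocoding API.
raw_api_response = {
    "query_metadata": {"elapsed_ms": 12, "cache": "MISS", "shard": "eu-west-7"},
    "result": {
        "postal_code": "1000",
        "city": "Brussels",
        "admin_hierarchy": {"country": "Belgium", "region": "Brussels-Capital"},
        "geometry": {"type": "Polygon", "coordinates": "(thousands of points)"},
    },
}

def to_tool_response(payload: dict) -> dict:
    """Keep only what the LLM's task needs; drop geometry and server metadata."""
    r = payload["result"]
    return {
        "postal_code": r["postal_code"],
        "city": r["city"],
        "country": r["admin_hierarchy"]["country"],
    }
```

Trimming the payload is not just a token optimization: every irrelevant field is one more thing the model can latch onto and misinterpret.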
MCP amplifies the consequences of poor data. When LLMs interact with APIs, stale or inconsistent data produces wrong answers and propagates errors at scale through hallucinations and misinterpretation.
This is precisely why curated datasets with active governance are becoming increasingly critical in AI workflows. An expert provider needs to cross-reference sources, track changes over time, and maintain data freshness. AI can't do this work reliably because it requires knowing which sources are authoritative, knowledge that comes from domain expertise, not statistical patterns.
That distinction is one I've spent fifteen years thinking about. It doesn't get simpler as the tooling around it becomes more powerful.
Looking for self-hosted ZIP code or address data? At GeoPostcodes, we provide high-quality data to integrate as a global reference layer.
👉 Discover GeoPostcodes' self-hosted data offering.
Until next time,
Jérôme Urbain
Head of Products at GeoPostcodes