SQL + JSON: Expanding the Same System

Ron Batra

Published Apr 30, 2026

+ Follow

What changes when you add structure—and what stays the same

A simple starting point

We started with a straightforward question:

Why do certain wine regions consistently produce higher-rated wines?

From the wine reviews dataset, it’s easy to get part of the answer. You group by region, average the ratings, and sort the results.

That gives you a clear view of what is happening. But it doesn’t explain much beyond that.

There’s no real sense of:

what defines those regions
how they differ
or why the results look the way they do

So the next step was to add context to the same dataset.

Bringing in AVA data

That’s where the AVA (American Viticultural Area) data comes in.

These are officially defined wine regions designed to capture the differences that matter in wine—climate, soil, elevation, and geography.

Instead of treating “Napa Valley” as just a label, you start to see it as a region with structure behind it.

The dataset comes as GeoJSON, so we stored it as-is as JSON in the same database.

At that point, we’re effectively working with:

structured data in tables
semi-structured data in documents

in one system.

What changed

The system didn’t change. The database didn’t change.

The query model didn’t fundamentally change.

What changed was the set of questions we could ask.

The model

We brought together:

wine reviews (relational data)
AVA metadata (JSON)
a mapping layer connecting the two

Everything runs through the same query path.

What we looked at

We walked through a small set of questions to see how this behaves in practice.

Coverage

The first thing that showed up was coverage.

Not all wine reviews map cleanly to AVA regions.

That becomes the baseline. If coverage is low, any joined analysis becomes harder to interpret—not because the query is wrong, but because the data is incomplete.

Article content — Coverage sets the trust boundary.

Adding context

Once you have a reasonable mapping, the output starts to look different.

Instead of just ranking regions by score, you can see:

where they are
how long they’ve existed
whether there the underlying data volume is meaningful.

The output starts to move beyond a simple ranking and toward something you can use for decisions.

JSON in the query

From there, JSON becomes part of the query itself.

Attributes that don’t naturally fit into relational tables can now be used directly in analysis.

The point isn’t the attribute—it’s that semi-structured data is now part of the same analytical path.

Structure and behavior

Looking at AVA age adds another dimension.

You start to see whether older regions behave differently from newer ones across rating, price, and value.

This shifts the conversation from what performs best to how different regions behave.

The edge cases

Then there are the outliers.

High-rated wines that don’t map to AVAs at all.

They highlight where:

the data is incomplete
conclusions may change

In practice, they become the next set of problems to solve.

What this suggests

After running through a few of these, a pattern starts to emerge.

SQL gives you a clear view of what happened.

JSON adds the structure around it.

And how much you can trust the combined result depends on how well those two are connected.

The representation expanded. The system stayed the same.

Architectural takeaway

This isn’t a new system so much as an extension of the one you already have.

Structured tables and JSON documents are part of the same query path, each contributing a different layer of meaning.

As representations expand, the key is maintaining a single, governed path from data to decision.

Why this works

This only works when the system doesn’t force a choice between OLTP, OLAP, and JSON.

When those capabilities live together, the boundary shifts from infrastructure to query design.

It’s not SQL versus NoSQL.

It’s structured and semi-structured data working together in the same system.

Closing thought

As systems evolve, you keep adding layers—structure, meaning, context.

The system itself doesn’t get simpler.

What matters is how well those layers work together.

Question

In your environment:

Are structured and semi-structured data part of the same query path, or are they still handled separately?

This is part of an ongoing series on data systems and AI. Full version and future posts are available on substack.

To view or add a comment, sign in

SQL + JSON: Expanding the Same System

Ron Batra

A simple starting point

Bringing in AVA data

What changed

The model

What we looked at

Coverage

Adding context

JSON in the query

Structure and behavior

The edge cases

What this suggests

Architectural takeaway

Why this works

Closing thought

Question

More articles by Ron Batra

Explore content categories

A simple starting point

Bringing in AVA data

What changed

The model

What we looked at

Coverage

Adding context

JSON in the query

Structure and behavior

The edge cases

What this suggests

Architectural takeaway

Why this works

Closing thought

Question

More articles by Ron Batra

One Question, Three Data Models: Why Structure and Human Validation Still Matter in AI Analytics

Uncorking Data: From Dataset to Conversational Analytics in Hours

AI is not just a tooling problem, it is also a "skills" problem.

Looking for Leaders (passionate about Climate Reality in Austin Area)

Having more fun with OpenAI (DALL-E)

Having some fun with OpenAI to name Tesla Model Y

Reflecting on Career, Jobs, Purpose

Reflecting on an early stage AI app

What was your first real-life AI project ?

How to make IoT work for you in 2020

Explore content categories