SQL + JSON: Expanding the Same System
When Context and Structure come together

SQL + JSON: Expanding the Same System

What changes when you add structure—and what stays the same

A simple starting point

We started with a straightforward question:

Why do certain wine regions consistently produce higher-rated wines?

From the wine reviews dataset, it’s easy to get part of the answer. You group by region, average the ratings, and sort the results.

That gives you a clear view of what is happening. But it doesn’t explain much beyond that.

There’s no real sense of:

  • what defines those regions
  • how they differ
  • or why the results look the way they do

So the next step was to add context to the same dataset.


Bringing in AVA data

That’s where the AVA (American Viticultural Area) data comes in.

These are officially defined wine regions designed to capture the differences that matter in wine—climate, soil, elevation, and geography.

Instead of treating “Napa Valley” as just a label, you start to see it as a region with structure behind it.

The dataset comes as GeoJSON, so we stored it as-is as JSON in the same database.

At that point, we’re effectively working with:

  • structured data in tables
  • semi-structured data in documents

in one system.


What changed

The system didn’t change. The database didn’t change.

The query model didn’t fundamentally change.

What changed was the set of questions we could ask.


The model

We brought together:

  • wine reviews (relational data)
  • AVA metadata (JSON)
  • a mapping layer connecting the two

Everything runs through the same query path.


What we looked at

We walked through a small set of questions to see how this behaves in practice.


Coverage

The first thing that showed up was coverage.

Not all wine reviews map cleanly to AVA regions.

That becomes the baseline. If coverage is low, any joined analysis becomes harder to interpret—not because the query is wrong, but because the data is incomplete.

Article content
Coverage sets the trust boundary.

Adding context

Once you have a reasonable mapping, the output starts to look different.

Instead of just ranking regions by score, you can see:

  • where they are
  • how long they’ve existed
  • whether there the underlying data volume is meaningful.

The output starts to move beyond a simple ranking and toward something you can use for decisions.

Article content
Context turns ranking into something actionable

JSON in the query

From there, JSON becomes part of the query itself.

Attributes that don’t naturally fit into relational tables can now be used directly in analysis.

The point isn’t the attribute—it’s that semi-structured data is now part of the same analytical path.


Structure and behavior

Looking at AVA age adds another dimension.

You start to see whether older regions behave differently from newer ones across rating, price, and value.

This shifts the conversation from what performs best to how different regions behave.


The edge cases

Then there are the outliers.

High-rated wines that don’t map to AVAs at all.

They highlight where:

  • the data is incomplete
  • conclusions may change

In practice, they become the next set of problems to solve.

Article content
Unmapped high-value records represent high-impact data gaps

What this suggests

After running through a few of these, a pattern starts to emerge.

SQL gives you a clear view of what happened.

JSON adds the structure around it.

And how much you can trust the combined result depends on how well those two are connected.

The representation expanded. The system stayed the same.


Architectural takeaway

This isn’t a new system so much as an extension of the one you already have.

Structured tables and JSON documents are part of the same query path, each contributing a different layer of meaning.

As representations expand, the key is maintaining a single, governed path from data to decision.


Why this works

This only works when the system doesn’t force a choice between OLTP, OLAP, and JSON.

When those capabilities live together, the boundary shifts from infrastructure to query design.

It’s not SQL versus NoSQL.

It’s structured and semi-structured data working together in the same system.


Closing thought

As systems evolve, you keep adding layers—structure, meaning, context.

The system itself doesn’t get simpler.

What matters is how well those layers work together.


Question

In your environment:

Are structured and semi-structured data part of the same query path, or are they still handled separately?

This is part of an ongoing series on data systems and AI. Full version and future posts are available on substack.

To view or add a comment, sign in

More articles by Ron Batra

Explore content categories