Text-to-SQL will replace Data Analysts
Hi everyone 👋, welcome to a new edition of The Data Path!
For years, we’ve dreamed of a world where anyone, not just data analysts, could ask a question in plain English and get back a SQL query or even a full dashboard.
That’s the promise behind Text-to-SQL tools. No matter which tool you pick, the message is always the same:
Forget SQL! Just talk to your data.
It sounds revolutionary. But in practice? It’s not that simple. Please grab a coffee ☕ or your favorite drink and give this article a read!
The fundamentals behind text-to-SQL
Text-to-SQL is the process of turning human language into structured logic.
When a user types a question like “Show me revenue by country for the last quarter”, several things happen behind the scenes:
It’s a fusion of linguistics, metadata engineering, and database logic, which explains why it’s so powerful.
The Promise
The fundamentals of Text-to-SQL are very promising. Marketing managers may know nothing about databases, SQL, or coding in general, but they still want answers to questions like:
What were our top-selling products in the US last quarter?🤔
And the AI instantly generates a working SQL query, executes it, and visualizes the results.
No analysts. No dashboards. No waiting.
BOOM! The whole Data Department is no longer needed! Managers are happy: fewer paychecks and better results. More profit! 💰
In theory, Text-to-SQL could eliminate the bottleneck between business questions and data insights. But in reality, what these systems generate is often syntactically correct yet semantically wrong.
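To make "syntactically correct yet semantically wrong" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its status column, and the rule that refunded orders do not count as revenue are all assumptions invented for this illustration; your own data model will have different rules.

```python
import sqlite3

# Hypothetical data: one of the US orders was refunded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, country TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 'US', 100.0, 'completed'),
        (2, 'US', 40.0,  'refunded'),
        (3, 'ES', 60.0,  'completed');
""")

# What a model might plausibly generate for "revenue by country".
# It runs without errors, but it silently counts the refunded order.
naive_sql = """
    SELECT country, SUM(amount) FROM orders
    GROUP BY country ORDER BY country
"""

# What an analyst who knows the (assumed) business rule would write.
correct_sql = """
    SELECT country, SUM(amount) FROM orders
    WHERE status = 'completed'
    GROUP BY country ORDER BY country
"""

naive = conn.execute(naive_sql).fetchall()
correct = conn.execute(correct_sql).fetchall()

print("naive:  ", naive)
print("correct:", correct)
```

Both queries are valid SQL and both return a tidy table. Only one of them matches what the business means by revenue, and nothing in the schema says which.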
The Real Problem Isn’t SQL! It’s Context
SQL requires contextual understanding of the underlying data model.
That’s exactly where every Text-to-SQL system breaks.
Even the most advanced LLMs can’t know what “active customer” or “net revenue” means inside your company unless you explicitly tell them. You can feed the model every table name, column, and data type, but metadata isn’t meaning.
To generate a correct query, an AI needs to understand:
That’s not something you can simply “prompt in.” This is why the term Data Governance is everywhere nowadays.
The Context Problem
Recent advances like the Model Context Protocol (MCP) aim to close this gap.
They allow LLMs to connect to your metadata catalog and automatically retrieve table schemas, lineage, and column descriptions. It’s a huge step forward.
But even with MCP, the model doesn’t “understand” your data the way an analyst does. It just has more structured context. The meaning, the why, still has to come from humans.
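As a rough illustration of what "more structured context" can look like, the sketch below assembles catalog metadata and a human-written glossary into text that could be handed to a model. This is not the MCP SDK or its API. The CATALOG and GLOSSARY contents, the 90-day rule, and the build_context helper are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""


@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)


# Hypothetical catalog entry: the kind of metadata a catalog can serve automatically.
CATALOG = {
    "customers": Table("customers", [
        Column("id", "INTEGER", "Primary key"),
        Column("last_order_at", "DATE", "Date of the most recent order"),
    ]),
}

# Hypothetical glossary: business meaning that a human has to write down.
GLOSSARY = {
    "active customer": "a customer with at least one order in the last 90 days",
}


def build_context(question: str) -> str:
    """Render the schema plus any glossary terms mentioned in the question."""
    lines = []
    for table in CATALOG.values():
        lines.append(f"Table {table.name}:")
        for col in table.columns:
            lines.append(f"  {col.name} ({col.dtype}): {col.description}")
    for term, definition in GLOSSARY.items():
        if term in question.lower():
            lines.append(f"Definition: {term} = {definition}")
    return "\n".join(lines)


print(build_context("How many active customers do we have?"))
```

The schema part can be retrieved automatically. The glossary line only exists because someone in the business decided what "active customer" means and wrote it down.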
Guardrails Are the Real Game-Changer
A more pragmatic approach comes from using a guard-railed architecture:
This hybrid model keeps flexibility while protecting data integrity. It doesn’t replace the analyst; it extends their capabilities.
The AI handles repetitive querying, while the analyst interprets, validates, and communicates insights.
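One possible guardrail, sketched here under assumptions and not as a description of any particular product, is to let generated SQL run only as read-only queries against tables an analyst has approved. The example uses the authorizer hook in Python's built-in sqlite3 module. The table names, the ALLOWED_TABLES allowlist, and the run_guarded helper are hypothetical.

```python
import sqlite3

# Hypothetical allowlist curated by the data team.
ALLOWED_TABLES = {"orders"}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow SELECTs, SQL functions, and reads on approved tables; deny the rest."""
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def build_demo_connection():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE orders (id INTEGER, country TEXT, amount REAL);
        CREATE TABLE salaries (employee TEXT, amount REAL);
        INSERT INTO orders VALUES (1, 'US', 100.0), (2, 'ES', 50.0);
        INSERT INTO salaries VALUES ('alice', 5000.0);
    """)
    # Install the guardrail only after the demo data is loaded.
    conn.set_authorizer(read_only_authorizer)
    return conn


def run_guarded(conn, sql):
    """Run AI-generated SQL; in this sketch any database error counts as a rejection."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        raise PermissionError(f"query rejected by guardrail: {exc}") from exc


conn = build_demo_connection()
print(run_guarded(conn, "SELECT country, SUM(amount) FROM orders GROUP BY country"))

for risky in ("SELECT * FROM salaries", "DELETE FROM orders"):
    try:
        run_guarded(conn, risky)
    except PermissionError as err:
        print(err)
```

A check like this protects data integrity, but it does not tell you whether the answer is right. That part stays with the analyst.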
Will Text-to-SQL Replace Data Analysts?
Not yet... and maybe not ever!
LLMs can generate queries. They can’t generate understanding.
A Text-to-SQL model doesn’t know when data looks wrong, or when a trend doesn’t make sense. It doesn’t ask follow-up questions or challenge assumptions.
The real value of a data analyst isn’t typing SQL; it’s connecting business questions to reliable, contextual answers.
What will happen is a shift:
Text-to-SQL won’t replace data analysts, but analysts who use Text-to-SQL might replace those who don’t.
Long live SQL and Data Analysts! 👑
If you enjoyed this read, please give it a like so more people can discover it!
Don't forget to subscribe to The Data Path so you don't miss the latest trends in Data!
Best regards,
José Siles Data Engineer at Nestlé
Great article.
Not yet... and maybe not ever! 😀 👏
AI at the moment is more of a Autistic Intelligence, works fine in a narrow space. Did a PoC, training a model on strict rules, with complex statistical analysis that were coded as multi variable functions, which were then exposed to the model. Worked fine! Generic models that are fed on data, without directions, rules or any other means of control are just going to guess, models trained on rules and methodology don't guess! Ergo: One size does not fit all and in the current "shop", most sizes aren't available.
LOL because self-serve BI really replaced analysts years ago ...