From SQL Queries to Conversational Analytics: My Journey with LLMs, RAG, and Data Agents
As data professionals, we spend countless hours writing SQL queries, debugging joins, and preparing dashboards. But what if we could change the way we interact with data entirely?
Recently, I built an LLM-powered agent that connects directly to our SQL databases (SQL Server & ClickHouse) and uses Retrieval-Augmented Generation (RAG) on top of schema documentation. The goal was simple: allow anyone in the company to ask questions in plain language — and get answers, visualizations, and insights without writing SQL.
The Challenges We Face
My Approach
I built an LLM Agent connected to our SQL Server and ClickHouse databases, with RAG on top of markdown documentation of our schema.
Key features:
The Solution
I combined LLMs, RAG, and SQL execution into a single workflow:
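At a high level, that single workflow can be sketched as one function chaining retrieval, SQL generation, execution, and plotting. The stage names here (`retrieve_context`, `generate_sql`, and so on) are illustrative placeholders, not the exact functions in my codebase; passing them in as callables keeps the pipeline easy to test:

```python
def answer_question(question, retrieve_context, generate_sql, run_sql, visualize):
    """Chain the pipeline stages: RAG context -> SQL -> execution -> chart.

    Each stage is injected as a callable so the orchestration stays testable.
    """
    context = retrieve_context(question)      # relevant schema sections from the vector DB
    sql = generate_sql(question, context)     # the LLM writes the query
    rows, cols = run_sql(sql)                 # execute against SQL Server / ClickHouse
    visualize(rows, cols)                     # plot the result
    return sql, rows, cols
```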
How It Works (Code Snippets)
1. Schema + RAG: Building the Vector DB
I start by loading the schema docs and building a vector index:
import chromadb
from sentence_transformers import SentenceTransformer

def load_schema(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

def build_vector_db(schema_text):
    # Each markdown section describes one table
    sections = schema_text.split("## Table:")
    model = SentenceTransformer("all-MiniLM-L6-v2")
    client = chromadb.Client()
    # Drop any stale collection so the index is rebuilt from scratch
    try:
        client.delete_collection("schema")
    except Exception:
        pass  # collection did not exist yet
    collection = client.get_or_create_collection("schema")
    for i, sec in enumerate(sections):
        sec = sec.strip()
        if sec:
            emb = model.encode(sec).tolist()
            collection.add(documents=[sec], embeddings=[emb], ids=[f"sec_{i}"])
    return collection, model
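At question time, the agent needs the schema sections most similar to the user's question. Chroma handles this via `collection.query`, but the underlying idea is just similarity search over embeddings; a minimal pure-Python sketch of that ranking step (toy vectors, cosine similarity) looks like this:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_sections(question_emb, section_embs, sections, k=3):
    """Rank schema sections by similarity to the question embedding."""
    scored = sorted(
        zip(sections, section_embs),
        key=lambda pair: cosine(question_emb, pair[1]),
        reverse=True,
    )
    return [sec for sec, _ in scored[:k]]
```

The top-k sections are then pasted into the SQL-generation prompt so the model only sees the tables relevant to the question.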
2. SQL Generation + Auto-Fix
When a query fails, the system retries with error context:
# The agent attempts to fix the SQL query on failure
for attempt in range(1, max_retries + 1):
    try:
        rows, cols = run_sql(sql, db_config)
        return sql, rows, cols
    except Exception as e:
        last_error = str(e)
        log_sql_error(sql, last_error)
        corrected = fix_sql_with_error(
            ...
            error_text=last_error,
            ...
        )
        if not corrected:
            continue
        sql = corrected
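The `fix_sql_with_error` helper is elided above. One plausible shape for its core (the prompt wording here is illustrative, not my exact template) is to feed the original intent, the failed query, and the raw database error back to the model:

```python
def build_fix_prompt(question, bad_sql, error_text, schema_context):
    """Assemble a repair prompt: user intent, failed SQL, and the DB error."""
    return (
        "The following SQL query failed. Rewrite it so it runs correctly.\n"
        f"User question: {question}\n"
        f"Schema context:\n{schema_context}\n"
        f"Failed SQL:\n{bad_sql}\n"
        f"Database error:\n{error_text}\n"
        "Return only the corrected SQL, with no explanation."
    )
```

Including the verbatim error message is what makes the retry loop converge: the model sees exactly which column or syntax the database rejected.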
3. User Feedback
Users can guide future answers with feedback:
if q.startswith("\\Feedback"):
    fb_text = q.replace("\\Feedback", "").strip()
    log_feedback(fb_text)
    print("Feedback saved:", fb_text)
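`log_feedback` and the matching `get_feedback_context` (used later in the agent) can be as simple as an append-only text file. This is a minimal sketch under that assumption; the file path and the prompt framing are mine, not from the original code:

```python
FEEDBACK_FILE = "feedback.log"  # hypothetical storage path

def log_feedback(text, path=FEEDBACK_FILE):
    """Append one feedback entry per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")

def get_feedback_context(path=FEEDBACK_FILE):
    """Return saved feedback as a block the agent appends to every prompt."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except FileNotFoundError:
        return ""
    if not lines:
        return ""
    return "\n\nUser feedback to respect:\n- " + "\n- ".join(lines)
```

Because the context is rebuilt on every call, a correction like "always report revenue in USD" immediately shapes all subsequent answers.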
4. LLM Agent
Built with a local LLM via Ollama and self-correcting logic, this agent translates natural-language questions into valid SQL Server queries:
import requests

# URL for the local Ollama instance
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_llm(prompt, model="gpt-oss:20b"):
    """
    Sends a prompt to the specified LLM via Ollama and returns the response.
    Includes previous user feedback in the prompt to improve context.
    """
    # Add previous feedback to all prompts
    prompt += get_feedback_context()
    payload = {"model": model, "prompt": prompt, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload)
    resp.raise_for_status()
    return resp.json()["response"].strip()
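One practical detail worth a helper of its own: local models often wrap generated SQL in markdown code fences, even when asked not to. A small cleanup step (a hypothetical helper, not shown in my snippets above) keeps the execution layer from choking on the fences:

```python
def extract_sql(response_text):
    """Strip markdown code fences an LLM often wraps around generated SQL."""
    text = response_text.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the opening fence (with its optional "sql" language tag)
        if lines[-1].strip().startswith("```"):
            lines = lines[1:-1]  # ...and the closing fence
        else:
            lines = lines[1:]
        text = "\n".join(lines).strip()
    return text
```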
5. Visualization
Every result is automatically plotted:
import pandas as pd
import plotly.express as px

def visualize_data(rows, cols, chart_type):
    if not rows or not cols:
        print("No data to visualize")
        return
    df = pd.DataFrame.from_records(rows, columns=cols)
    fig = None
    if chart_type == "line":
        fig = px.line(df, x=cols[0], y=cols[1:], markers=True)
    elif chart_type == "bar":
        if len(cols) >= 2:
            fig = px.bar(df, x=cols[0], y=cols[1])
    elif chart_type == "scatter":
        if len(cols) >= 2:
            fig = px.scatter(df, x=cols[0], y=cols[1])
    elif chart_type == "pie":
        if len(cols) >= 2:
            fig = px.pie(df, names=cols[0], values=cols[1])
    else:
        # Fallback: infer a sensible chart from the column names and count
        if any(("date" in c.lower()) or ("time" in c.lower()) for c in cols):
            fig = px.line(df, x=cols[0], y=cols[1:], markers=True)
        elif len(cols) == 2:
            fig = px.bar(df, x=cols[0], y=cols[1])
        elif len(cols) >= 3:
            fig = px.scatter(df, x=cols[0], y=cols[1], size=cols[2], color=cols[1])
    if fig:
        fig.show()
    else:
        print("Could not generate visualization")
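The fallback branch is really a small chart-type heuristic. Pulled out as its own function (a refactor for illustration, not how my code is factored), it becomes trivial to unit-test and to extend with new rules:

```python
def infer_chart_type(cols):
    """Guess a chart type from column names and count (mirrors the fallback above)."""
    if any(("date" in c.lower()) or ("time" in c.lower()) for c in cols):
        return "line"     # time-like column: plot a trend
    if len(cols) == 2:
        return "bar"      # one dimension, one measure
    if len(cols) >= 3:
        return "scatter"  # third column can drive point size
    return None           # a single column has nothing to plot against
```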
Why This Matters
The Bigger Picture
This project is not just about speeding up reports. It’s about reshaping how companies interact with their data — making insights accessible, secure, and aligned across teams.
It also opens the door to:
Closing:
This journey taught me that AI is not here to replace analysts — it’s here to empower everyone to ask better questions and focus on higher-value work.
Have you experimented with LLMs for data analysis in your company? What opportunities or challenges do you see in combining AI with BI?
#Python #LLM #Ollama #SQL #DataScience #AI #GenerativeAI #AutoFix #ChromaDB