Data Agent

If I could learn only one new term this year, I would choose “Data Agent”. It is that important.

In the near future, all of us in data warehousing and BI will need to use "chat with your data" to query databases and documents. Most of us will be using someone else's software, most probably the one from our data platform or BI platform.

Some of us will be lucky enough to need to build our own "chat with your data". The core technology is "text to SQL", which is basically a SQL builder. Its inputs are the user prompt, a few databases and a semantic model. It is very difficult to get right, so you need to put your brightest minds on this task.
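To make the SQL builder's inputs concrete, here is a minimal Python sketch of how they might be packed into a single prompt for an LLM, with the reply cleaned into one SQL statement. The `SemanticModel` class, the prompt wording and the `llm` callable are all hypothetical stand-ins; a real text-to-SQL engine does far more (schema retrieval, disambiguation, dialect handling).

```python
from dataclasses import dataclass, field
from typing import Callable

FENCE = "`" * 3  # markdown code fence that LLMs often wrap SQL in


@dataclass
class SemanticModel:
    """Hypothetical semantic model: business terms mapped to SQL expressions."""
    measures: dict[str, str] = field(default_factory=dict)
    dimensions: dict[str, str] = field(default_factory=dict)


def build_prompt(question: str, schema_ddl: str, model: SemanticModel) -> str:
    """Combine the user prompt, the database schema and the semantic model into one prompt."""
    measures = "\n".join(f"- {name}: {expr}" for name, expr in model.measures.items())
    dimensions = "\n".join(f"- {name}: {expr}" for name, expr in model.dimensions.items())
    return (
        "Translate the business question into one SQL query.\n"
        f"Schema:\n{schema_ddl}\n"
        f"Measures:\n{measures}\n"
        f"Dimensions:\n{dimensions}\n"
        f"Question: {question}\n"
        "Return only the SQL."
    )


def build_sql(question: str, schema_ddl: str, model: SemanticModel,
              llm: Callable[[str], str]) -> str:
    """Ask the injected LLM callable for SQL and strip any code fence around the reply."""
    reply = llm(build_prompt(question, schema_ddl, model)).strip()
    if reply.startswith(FENCE):
        # Drop the opening fence line (with its language tag) and the closing fence.
        reply = reply.partition("\n")[2].rsplit(FENCE, 1)[0]
    return reply.strip().rstrip(";")
```

Injecting `llm` as a plain function keeps the sketch independent of any particular model provider and makes it easy to test with a fake reply.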

But there is more to it than just a SQL builder. First, you need an AI agent called a "planner" that understands the goal and then creates a plan to achieve it.
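A planner can be pictured as a function that turns the user's goal into an ordered list of steps, each assigned to one of the other agents and carrying a description of the expected result. The minimal Python sketch below assumes the LLM (an injected callable) replies with a JSON list; the `Step` fields, the agent names and the JSON format are illustrative assumptions, not a standard.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Agent names follow the list in this article; the JSON plan format is an assumption.
AGENTS = {"sql_builder", "sql_verifier", "chart_generator", "chart_interpreter"}


@dataclass
class Step:
    agent: str        # which agent should carry out this step
    instruction: str  # what that agent is asked to do
    expected: str     # what a correct result should look like (used later by the verifier)


def make_plan(goal: str, llm: Callable[[str], str]) -> list[Step]:
    """Ask the injected LLM to break the goal into steps, then validate the agent names."""
    prompt = (
        f"Goal: {goal}\n"
        "Reply with a JSON list of objects with keys agent, instruction, expected. "
        f"Allowed agents: {sorted(AGENTS)}"
    )
    steps = [Step(**item) for item in json.loads(llm(prompt))]
    unknown = [s.agent for s in steps if s.agent not in AGENTS]
    if unknown:
        raise ValueError(f"Plan uses unknown agents: {unknown}")
    return steps
```

Storing an `expected` description with each step gives the SQL verifier something concrete to check against.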

Second, you need a SQL verifier that runs the SQL generated by the SQL builder and checks whether the output is what the plan expects. If it is not, the verifier tells the SQL builder to generate another query, acting as a feedback loop.
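Here is a minimal Python sketch of that feedback loop, using the built-in `sqlite3` module as a stand-in for the data warehouse. "As expected by the plan" is reduced to an assumed check (the right column names and at least one row), and the `builder` callable stands in for the SQL builder agent: it receives the verifier's last complaint and returns a new query. The function names and the retry limit are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def verify(conn: sqlite3.Connection, sql: str, expected_columns: list[str]) -> Optional[str]:
    """Run the SQL; return a complaint for the builder, or None if the output looks as planned."""
    try:
        cursor = conn.execute(sql)
        rows = cursor.fetchall()
    except sqlite3.Error as exc:
        return f"SQL failed: {exc}"
    if cursor.description is None:
        return "Statement returned no result set"
    columns = [d[0] for d in cursor.description]
    if columns != expected_columns:
        return f"Expected columns {expected_columns}, got {columns}"
    if not rows:
        return "Query returned no rows"
    return None


def build_and_verify(builder: Callable[[str], str], conn: sqlite3.Connection,
                     expected_columns: list[str], max_attempts: int = 3) -> str:
    """Feedback loop: pass each verifier complaint back to the builder until the SQL passes."""
    feedback = ""
    for _ in range(max_attempts):
        sql = builder(feedback)
        feedback = verify(conn, sql, expected_columns) or ""
        if not feedback:
            return sql
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts: {feedback}")
```

The retry limit matters: without it, a builder that keeps producing bad SQL would loop forever.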

Third, you need a pair of agents for the visualisation: a "chart generator" and a "chart interpreter". The former generates charts from the data, and the latter interprets each chart and writes a narrative that tells the story in it.
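In a real system both agents would typically be LLM calls (the interpreter perhaps a multimodal model looking at the rendered image). As a simplified, rule-based stand-in, this Python sketch shows the division of labour: the generator decides how to show the data, and the interpreter turns what is shown into a sentence or two. The `Chart` structure and the rules are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Chart:
    kind: str       # "line" or "bar"
    x: list[str]    # category or time labels
    y: list[float]  # values to plot
    title: str


def generate_chart(rows: list[tuple[str, float]], title: str, is_time_series: bool) -> Chart:
    """Chart generator stand-in: a line chart for time series, otherwise a bar chart."""
    x = [label for label, _ in rows]
    y = [value for _, value in rows]
    return Chart("line" if is_time_series else "bar", x, y, title)


def interpret_chart(chart: Chart) -> str:
    """Chart interpreter stand-in: describe the trend (line) or the largest category (bar)."""
    top = max(range(len(chart.y)), key=chart.y.__getitem__)  # index of the highest value
    if chart.kind == "line":
        first, last = chart.y[0], chart.y[-1]
        trend = "rose" if last > first else "fell" if last < first else "stayed flat"
        return (f"{chart.title}: {trend} from {first:g} in {chart.x[0]} to {last:g} "
                f"in {chart.x[-1]}, peaking at {chart.y[top]:g} in {chart.x[top]}.")
    total = sum(chart.y)
    share = chart.y[top] / total * 100 if total else 0.0
    return f"{chart.title}: {chart.x[top]} is the largest at {chart.y[top]:g} ({share:.0f}% of the total)."
```

For example, monthly sales of 100, 150 and 120 for January to March would be drawn as a line chart and described as rising from 100 to 120 with a peak of 150 in February.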

To recap, the agents you need to build are:

  1. Planner
  2. SQL Builder
  3. SQL Verifier 
  4. Chart Generator 
  5. Chart Interpreter

And the flow diagram between those agents is something like this:

[Figure: flow diagram between the five agents]
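One plausible wiring of the five agents, not necessarily the exact flow in the diagram, can be written as a small Python orchestrator: the planner runs once, the SQL builder and SQL verifier loop until the output matches the plan (or a retry limit is hit), and only then are the chart generator and chart interpreter called. The `Agents` interface, its call signatures and the retry limit are assumptions; each callable would wrap an LLM or a tool in practice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agents:
    """Hypothetical interfaces for the five agents."""
    plan: Callable[[str], str]                                # question -> expected shape of the data
    build_sql: Callable[[str, str], str]                      # (plan, feedback) -> SQL
    verify: Callable[[str, str], tuple[list, Optional[str]]]  # (sql, plan) -> (rows, complaint or None)
    make_chart: Callable[[list], dict]                        # rows -> chart spec
    interpret: Callable[[dict], str]                          # chart spec -> narrative


def answer(question: str, agents: Agents, max_attempts: int = 3) -> tuple[dict, str]:
    """Planner -> (SQL builder <-> SQL verifier) -> chart generator -> chart interpreter."""
    plan = agents.plan(question)
    feedback = ""
    for _ in range(max_attempts):
        sql = agents.build_sql(plan, feedback)
        rows, complaint = agents.verify(sql, plan)
        if complaint is None:
            chart = agents.make_chart(rows)
            return chart, agents.interpret(chart)
        feedback = complaint  # the verifier's complaint drives the next attempt
    raise RuntimeError(f"Gave up after {max_attempts} attempts: {feedback}")
```

Keeping the verifier loop in the middle means the chart agents only ever see data that has already passed the plan's checks.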

But don’t quote me on the above, as I am no data chat expert. Josh Reini is.

If you are interested in “building data agents” for data chat like the one above, Andrew Ng posted this course yesterday: https://www.deeplearning.ai/short-courses/building-and-evaluating-data-agents/

In this short course (only 2 hours, and free at the moment), Josh Reini and Anupam Datta explain how to build data agents.

It covers much more than what I described above. You’ll also learn:

  1. How to design a data agent that connects to data sources (databases, files) and performs web searches to respond to users’ queries.
  2. How to add observability to the workflow and evaluate the quality of its output.
  3. How to assess whether the final answer is relevant to the user’s query and grounded in the collected data (using LLM as a judge).
  4. How to evaluate the agent’s performance at runtime.
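Items 2 and 3 can be pictured with a small Python sketch: a decorator records each agent call (a very basic form of observability), and an LLM judge is asked to score whether the answer is relevant to the question and grounded in the retrieved data. This is not the course's code or any specific library's API; the `traced` and `judge` names, the JSON reply format and the 0 to 1 scale are assumptions.

```python
import functools
import json
import time
from typing import Callable

# In-memory trace; a real system would send these records to an observability tool.
TRACE: list[dict] = []


def traced(step: str):
    """Decorator that records each call's step name, duration and output in TRACE."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({"step": step, "seconds": time.perf_counter() - start, "output": result})
            return result
        return inner
    return wrap


def judge(question: str, answer: str, context: str, llm: Callable[[str], str]) -> dict:
    """LLM as a judge: score relevance (answer vs question) and groundedness (answer vs data)."""
    prompt = (
        "Score from 0 to 1 and reply as JSON with keys relevance and groundedness.\n"
        f"Question: {question}\nRetrieved data: {context}\nAnswer: {answer}"
    )
    scores = json.loads(llm(prompt))
    return {key: float(scores[key]) for key in ("relevance", "groundedness")}
```

At runtime (item 4), the same judge could be called on every live answer, with low scores flagged for review alongside the trace of the calls that produced them.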

For those who follow me on LinkedIn, apologies if I bore you by repeating data chat, data chat, data chat like a broken record. But “chat with your data” will change the whole data warehousing and BI industry, and as someone who has been working and writing in that industry since 1996, I feel obliged to spread the word to the DWBI community.

Here is my article on Snowflake Intelligence: https://www.garudax.id/pulse/snowflake-intelligence-vincent-rainardi-0iqke/

In that article you can see how the agents I mentioned above think and work. It is truly amazing to see it in Snowflake. It takes only 15 minutes to set up Nick Akincilar’s demo that I mentioned in that article. Do that demo: it is the most amazing thing I’ve ever seen in data warehousing and BI. Truly. You won’t regret it.

My LinkedIn articles: https://www.garudax.id/pulse/list-all-my-articles-vincent-rainardi-eohge/
My blog: https://dwbi1.wordpress.com/

#DataChat #DataAgent #AI #AgenticAI #DataWarehouse #BusinessIntelligence Snowflake DeepLearning.AI

And this is possibly the most influential post I have read from you this year!!! Thank you for sharing, Vincent, this was totally golden!

Interesting read. It seems that in many use cases in the future, the value proposition won't be solely performance, cost effectiveness and the ability to scale, but more and more how easy it is to use the data and find answers with the advent of AI capabilities. But this also puts more focus on the availability of metadata, which is generally ignored.

Thanks for sharing, Vincent Rainardi. What is the difference between "Data Agents" and "MCP Servers" that provide similar capabilities? How is it different from what were previously called "AI Assistants"? I have come across the term "Data Agent" on Snowflake and MS Fabric, but I get the impression there is an inadvertent or deliberate narrowing of the scope in their offerings... Setting aside the definitional question and focusing on capability, to badge a component a data agent, I would expect it to provide the following capabilities:
- automating common data tasks, e.g. data preparation tasks such as cleaning messy data, removing duplicates, and handling missing values.
- data exploration and analysis, i.e. autonomously exploring datasets to identify trends, patterns, anomalies, and correlations.
- multi-source connectivity to retrieve data from a variety of sources, including databases, spreadsheets, files, APIs, etc.
- insight generation, which extends beyond providing raw data to generating summaries, creating visualisations, and commentary.
We can even extend the above to Agentic Data Management, which opens up a whole set of possibilities... What's your take?

💡 A data agent can become a powerful new interface for data engineers to interact with the data platform. But it will quickly hit its limits: every organization has unique data, systems, and stakeholders. What we need is an adaptive framework that allows us to work with AI effectively. I believe the future of data lies in finding the right balance between:
• Business domain knowledge
• Data expertise
• AI capabilities
To capture this intersection, I call this emerging role the AI Insights Orchestrator. Excited to share more on this idea in the future!

All AI agents rely on data to perform their tasks, but data agents are a specific type of agent that specializes in the data domain, primarily focusing on data retrieval, querying, and management tasks, such as extracting information from vast datasets using tools like SQL scripts.
