Using LLMs with Data Analysis Tools

Explore top LinkedIn content from expert professionals.

Summary

Large language models (LLMs) are powerful AI tools that can help analyze data by interpreting questions, generating code, and providing insights from structured and unstructured information. By combining LLMs with specialized data analysis tools, users can automate complex tasks, enrich context, and guide decision-making without needing deep technical expertise.

  • Integrate specialized tools: Pair LLMs with data analysis platforms or statistical models to handle numerical tasks while letting the LLM offer explanations or context.
  • Automate workflows: Use LLMs to create step-by-step plans, generate analysis scripts, and execute code for tasks ranging from portfolio analytics to anomaly detection.
  • Build memory systems: Store user queries and solutions so your AI agent can recall past interactions, making follow-up questions and repeated analyses smoother and more insightful.

Summarized by AI based on LinkedIn member posts
  • View profile for Ravi Evani

    GVP, Engineering Leader / CTO @ Publicis Sapient

    4,041 followers

    How to Build an AI Agent for Data Analysis: A Blueprint

    An "agent" is more than just a chatbot. It’s a system designed to understand a goal, create a plan, and use tools to actively accomplish that goal. You can build your own powerful agent for data analysis, transforming how users interact with their data. This blueprint outlines the core components required to turn simple questions into actionable insights. An agentic system is built on three foundational concepts: an LLM for reasoning, a set of tools for taking action, and a sophisticated memory for learning and context.

    1. The LLM: Your Agent's Reasoning Core
    At the heart of any data analysis agent is its reasoning core: a Large Language Model (LLM) like OpenAI's GPT or Google's Gemini. To build this, create a central orchestrator service (e.g., a Chat Service). This service shouldn't just pass the user's question to the LLM. Instead, it should enrich the prompt with context from the agent's memory. The LLM's role is not merely to respond, but to create a step-by-step plan and generate the precise Python code needed to perform the analysis.

    2. Tools: Give Your Agent Hands-on Capabilities
    An agent is only as good as the tools it can use. For a data analysis agent, the primary tool is the ability to execute code. After the LLM generates an analysis script, your orchestrator service must run it against the relevant dataset. This is the most critical agentic step: it moves the system from simply planning to actively doing. You can equip your agent with other tools, such as services for data loading, chart generation, or even calling external APIs, allowing it to handle a wide variety of analytical tasks.

    3. Memory: Enable Context and Learning
    To elevate your agent from a one-shot tool to an intelligent partner, you need to implement memory. A robust approach is to use a graph database like Neo4j to manage two distinct types:
    ➜ Short-Term Memory: Implement a mechanism to track the current conversation history for each user session. This allows your agent to understand follow-up questions ("now show me that by region") and maintain context, just like a human analyst would.
    ➜ Long-Term Memory: This is where your agent can learn. Every time it successfully executes an analysis, store the user's query and the generated code as a "solution." By creating a vector embedding of the query, you can enable semantic search. When a new question comes in, the agent can first search its long-term memory for a similar problem it has already solved, allowing it to deliver accurate results faster and more efficiently over time.

    By integrating these three components, your application will function as a true AI agent. Your central orchestrator service will drive the powerful loop of Memory -> Reasoning -> Action, creating a system that doesn't just answer questions, but actively solves them.
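
    A minimal sketch of that Memory -> Reasoning -> Action loop, assuming a hypothetical call_llm() wrapper for whatever chat-completion API you use and an in-process dict in place of Neo4j; a real build would add vector-based semantic search and a sandboxed code executor:

    import contextlib
    import io

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for any chat-completion API."""
        raise NotImplementedError("wire up your LLM provider here")

    class AgentMemory:
        def __init__(self):
            self.history = []    # short-term: turns in the current session
            self.solutions = {}  # long-term: query -> previously generated code

        def recall(self, query: str):
            # A production agent would embed the query and run semantic search
            # (e.g., in Neo4j); exact-match lookup keeps this sketch small.
            return self.solutions.get(query)

    def run_analysis(query: str, memory: AgentMemory, dataset_path: str) -> str:
        memory.history.append(query)                   # memory
        code = memory.recall(query)
        if code is None:
            prompt = (                                 # reasoning
                f"Conversation so far: {memory.history}\n"
                f"Dataset: {dataset_path}\n"
                f"Write Python (pandas) code that answers: {query}\n"
                "Print the result."
            )
            code = call_llm(prompt)
        # action: execute the generated script and capture its output.
        # WARNING: exec() on model output is unsafe outside a sandbox.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, {"DATASET_PATH": dataset_path})
        memory.solutions[query] = code                 # learn from success
        return buffer.getvalue()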

  • View profile for Arockia Liborious
    39,294 followers

    There’s a lot of excitement around using LLMs for forecasting. Fair. But here’s the practical answer: LLMs are not a drop-in replacement for time series models. If the problem is highly numerical, high-frequency, or tightly dependent on temporal structure, classical models still do the heavy lifting better. ARIMA, ETS, LightGBM, lag features, rolling statistics... these are still the workhorses.

    Where teams get disappointed is when they expect an LLM to do raw forecasting better just because it is powerful. That rarely works. LLMs are not great at strict numerical precision. And they do not naturally respect temporal dependencies the way forecasting models do.

    The better architecture is a hybrid workflow. Use traditional models for the math. Use LLMs for the context around the math. That’s where things start getting interesting. LLMs can help with:
    1. Feature engineering from text-heavy signals like news, commentary, or notes
    2. Better data representation when time series is paired with structured metadata
    3. Contextual reasoning around seasonality, holidays, payday effects, or business events
    4. Anomaly interpretation after statistical methods detect something unusual

    That is the real shift. Not LLMs instead of forecasting. LLMs around forecasting. In text-rich or data-scarce environments, that extra layer can matter. Because numbers tell you what changed. Context tells you why.
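
    A minimal sketch of this hybrid split, assuming synthetic daily sales data: the classical model (statsmodels' ARIMA here) owns the forecast, and the LLM is only asked to explain what statistics flagged. call_llm() is a hypothetical chat-API wrapper:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = pd.date_range("2024-01-01", periods=120, freq="D")
    sales = pd.Series(100 + np.random.randn(120).cumsum(), index=rng)

    # 1. The classical model handles temporal structure and the math.
    fit = ARIMA(sales, order=(1, 1, 1)).fit()
    forecast = fit.forecast(steps=7)

    # 2. Statistical anomaly detection: flag residuals beyond 3 sigma.
    residuals = fit.resid
    anomalies = sales[np.abs(residuals) > 3 * residuals.std()]

    # 3. The LLM adds context around the math, not instead of it.
    prompt = (
        f"Daily sales were forecast with ARIMA(1,1,1). Next 7 days: "
        f"{forecast.round(1).tolist()}. Anomalous dates: "
        f"{list(anomalies.index.strftime('%Y-%m-%d'))}. Given known holidays, "
        "paydays, and recent commentary, suggest plausible causes."
    )
    # interpretation = call_llm(prompt)  # hypothetical LLM call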

  • View profile for Kevin Hartman

    Associate Teaching Professor at the University of Notre Dame, Former Chief Analytics Strategist at Google, Author "Digital Marketing Analytics: In Theory And In Practice"

    24,647 followers

    When your deep insight requires a statistical model, using an LLM can be a smart first step. But a black-box solution is an expensive gamble. A model without diagnostics is a structure without a foundation. Your model will collapse without rigor.

    Your LLM is not a Data Scientist. It is a tool. Command it to build the framework, not the answer. This prompt sequence gives you control and will help you guide the LLM to produce complex models with the rigor you need and the speed you want.
    1. Command Governance: Force the LLM to verify statistical assumptions *before* training.
    2. Command Efficiency: Define intelligent, limited tuning to optimize resource allocation.
    3. Command Auditability: Demand documentation and model notes be generated *with* the final code.

    Here's a link to four LLM prompt frameworks that will help you run this sequence: https://bit.ly/3X9LEAh (an illustrative sketch of the sequence also follows below).

    Art+Science Analytics Institute | University of Notre Dame | University of Notre Dame - Mendoza College of Business | University of Illinois Urbana-Champaign | University of Chicago | D'Amore-McKim School of Business at Northeastern University | ELVTR | Grow with Google - Data Analytics #Analytics #DataStorytelling
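
    The linked frameworks are the author's own; purely as an illustration, the three commands might look like this as an ordered prompt sequence (the wording below is assumed, not taken from the link):

    # Illustrative prompt sequence: governance -> efficiency -> auditability.
    PROMPTS = [
        # 1. Governance: verify assumptions *before* training.
        "Before fitting any model, test the regression assumptions on this "
        "dataset (linearity, residual normality, homoscedasticity, "
        "multicollinearity) and report which ones fail.",
        # 2. Efficiency: bound the tuning budget explicitly.
        "Tune only max_depth and learning_rate with a 20-trial random search "
        "and 5-fold cross-validation; do not expand the search space.",
        # 3. Auditability: documentation ships *with* the final code.
        "Return the final training script together with model notes: "
        "assumptions checked, tuning budget used, metrics, and limitations.",
    ]
    # for p in PROMPTS: reply = call_llm(session, p)  # hypothetical chat API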

  • View profile for Navveen Balani

    Executive Director, Green Software Foundation (Linux Foundation) | Google Cloud Fellow | LinkedIn Top Voice | Sustainable AI & Green Software | Author | Let’s build a responsible future

    12,305 followers

    Google ADK + Zerodha MCP + LLMs: Autonomous portfolio analysis in action.

    Modern financial analysis is rapidly moving toward automation and agentic workflows. Integrating large language models (LLMs) with real-time financial data unlocks not just powerful insights but also new ways of interacting with portfolio information. This experiment brings together secure browser-based authentication, live data retrieval from Zerodha’s MCP, and LLM-driven risk and performance analytics, all orchestrated autonomously.

    This is a starter kit to get you going, but it can be extended to support sophisticated, fully automated quantitative models simply by crafting effective prompts. I've made the experiment available on my GitHub repo. Please feel free to explore or adapt it for your own agentic financial analysis workflows.

    Code and documentation: https://lnkd.in/gme977GG
    #agenticai #mcp
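
    The wiring of Google ADK to Zerodha's MCP server lives in the linked repo; rather than guess at those APIs, here is a small sketch of the kind of risk analytics the agent's LLM could generate once holdings arrive via an MCP tool call (the tickers, prices, weights, and 6% risk-free rate are all assumed):

    import numpy as np
    import pandas as pd

    # Daily closes per holding, e.g., returned by an MCP portfolio tool.
    prices = pd.DataFrame({
        "INFY": [1500, 1512, 1498, 1530, 1525],
        "TCS":  [3800, 3790, 3825, 3810, 3850],
    })
    weights = np.array([0.6, 0.4])  # assumed portfolio weights

    returns = prices.pct_change().dropna()
    portfolio_returns = returns.to_numpy() @ weights

    # Annualize assuming 252 trading days and a 6% risk-free rate.
    annualized_vol = portfolio_returns.std(ddof=1) * np.sqrt(252)
    sharpe = (portfolio_returns.mean() * 252 - 0.06) / annualized_vol

    print(f"Annualized volatility: {annualized_vol:.2%}, Sharpe: {sharpe:.2f}")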

  • View profile for Philipp Schmid

    AI Developer Experience at Google DeepMind 🔵 prev: Tech Lead at Hugging Face, AWS ML Hero 🤗 Sharing my own views and AI News

    165,281 followers

    Structured data like tables and graphs isn't just for spreadsheets anymore! 🚀 StructLM proposes a new way of using LLMs to process and interpret structured data sources, outperforming existing task-specific models on 14 out of 18 benchmarks. 📊🔍

    Implementation
    1️⃣ Created a large dataset focusing on structured-data tasks, e.g., question answering, summarization, and fact verification over different formats: tables, databases, knowledge graphs.
    2️⃣ Fine-tuned CodeLlama (7-34B) on the dataset with instruction tuning by pairing system prompts with instructions.
    3️⃣ Benchmarked the models against state-of-the-art task-specific models across a diverse set of tasks.

    Insights
    📊 Dataset includes over 1.1M samples from 25 tasks, including Table QA and fact verification.
    🏆 StructLM achieves new state-of-the-art on 7 out of 18 benchmarks.
    📈 Performance scales weakly with model size; 34B is only slightly better than 7B, suggesting the challenge of structured data.
    💻 Code pretraining is more important than math pretraining.
    🌍 Great example of domain adaptation: StructLM 7B achieves an average of 71.1%, while GPT-3.5 reaches only 39.5%.
    🤗 Models & Datasets available on Hugging Face.

    Paper: https://lnkd.in/enKNV5mm
    Github: https://lnkd.in/eg_KCjBR
    Models & Dataset: https://lnkd.in/eu7DF44v
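
    StructLM's exact linearization format is defined in the paper; the sketch below only illustrates the core idea it builds on: serialize a table into text so an instruction-tuned LLM can answer questions over it. The prompt wording and the table_to_prompt() helper are assumptions:

    def table_to_prompt(headers, rows, question):
        """Flatten a table into a pipe-delimited text block inside a QA prompt."""
        lines = [" | ".join(headers)]
        lines += [" | ".join(str(v) for v in row) for row in rows]
        return (
            "Answer the question using only the table.\n\n"
            "Table:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"
        )

    prompt = table_to_prompt(
        headers=["country", "gdp_growth_2023"],
        rows=[["India", 7.2], ["Germany", -0.3]],
        question="Which country had negative GDP growth in 2023?",
    )
    # answer = call_llm(prompt)  # hypothetical LLM call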

  • View profile for Andrés Jaime

    Senior Macro Quant/Systematic Researcher

    7,589 followers

    Large language models: a primer for economists (https://lnkd.in/eJschCjr) & Systematic Interpretation of Central Bank Communication

    Large Language Models (LLMs) have revolutionized economic research by enabling advanced analysis of unstructured textual data such as policy statements, financial reports, and news articles. These models transform text into structured numerical representations, facilitating tasks like sentiment analysis, forecasting, and topic modeling. Their contextual understanding, enabled by transformer-based architectures, makes them particularly effective in analyzing economic narratives. For instance, LLMs can evaluate market sentiment or interpret the tone of central bank communications, offering valuable insights into monetary policy impacts. A study of US equity markets demonstrated this by analyzing over 60,000 news articles to identify key drivers such as fundamentals, monetary policy, and market sentiment, linking these themes to stock market movements.

    Before the explosion of LLMs, I conducted research with my colleagues at Morgan Stanley to systematically analyze central bank communication using earlier machine-learning techniques. Specifically, we trained a Convolutional Neural Network (CNN) to assess the degree of hawkishness or dovishness in FOMC communications. This effort led to the development of the MNLPFEDS Index, which proved to be a powerful tool for anticipating monetary policy actions up to a year in advance. The index provided valuable insights into potential inflection points in the monetary cycle and their effects on rates, the yield curve, and the USD. This work highlighted the predictive power of communication analysis, even before the advent of the sophisticated transformer models now driving advancements in LLMs.

    LLMs and earlier machine-learning approaches, like CNN-based analysis, each bring unique strengths to understanding monetary policy and market dynamics. While LLMs excel in processing vast and complex datasets with contextual depth, their capabilities can be further enhanced through fine-tuning for domain-specific tasks. This adaptability allows LLMs to specialize in areas like central bank communication, where nuances in tone and context are crucial. Combined with the foundational contributions of earlier models like the MNLPFEDS Index, fine-tuned LLMs provide economists with a comprehensive toolkit to analyze qualitative insights and integrate them into robust quantitative frameworks, enriching the understanding of policy effects and broader economic trends.

    #EconomicResearch #MonetaryPolicy #CentralBankCommunication #MachineLearning #ArtificialIntelligence #NaturalLanguageProcessing #LLMs #DeepLearning #EconomicForecasting #SentimentAnalysis #TextAnalysis #DataScience #MacroEconomics #QuantitativeResearch
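
    The MNLPFEDS Index was built with a CNN; as a rough modern analogue (not the authors' method), one could score FOMC language on a hawkish-dovish scale with a zero-shot LLM prompt. The sample sentences, prompt wording, and the score_with_llm() wrapper are all assumptions:

    SENTENCES = [
        "Inflation remains elevated and further policy firming may be appropriate.",
        "The Committee is prepared to adjust policy if risks to employment emerge.",
    ]

    PROMPT = (
        "Rate this central bank sentence from -1 (clearly dovish) to +1 "
        "(clearly hawkish). Reply with a single number.\n\nSentence: {s}"
    )

    def score_with_llm(prompt: str) -> float:
        """Hypothetical stand-in for any chat-completion API."""
        raise NotImplementedError("wire up your LLM provider here")

    scores = [score_with_llm(PROMPT.format(s=s)) for s in SENTENCES]
    # A statement-level index could be the mean sentence score over the document.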

  • View profile for Scott Zakrajsek

    Chief Data Officer @ Power Digital | We use data to grow your business.

    11,564 followers

    Have you seen those AI chatbots that can run on your data? Here's how they work.

    The simplified process:

    1.) The LLM trains on your data
    The LLM (e.g., OpenAI, Anthropic, Cortex, etc.) needs to understand your data. More specifically, your data tables, column schema, and how they all work together (sometimes called "ontology"). For example:
    - an "orders" table with an order_id tied to a customer_id
    - a "customer" table with a billing_address
    - a "products" table joined back to the "orders" table

    2.) The user "prompts" the chatbot (asks the LLM a question)
    For example, "What state had the highest sales yesterday?" That question (prompt) gets sent via API to the LLM.

    3.) The LLM converts the prompt into SQL
    It uses the training data from your data set to structure the query:
    SELECT state, SUM(net_sales) AS revenue
    FROM orders
    GROUP BY state
    ORDER BY revenue DESC
    LIMIT 1

    4.) The database executes the query and returns the answer
    California: $235,000
    The answer is displayed to the user in the chat response.

    This output doesn't just need to be text format. You can also:
    - feed the output to a BI tool or front-end for pretty charts
    - trigger alerts and messages
    - have the LLM summarize and interpret the output in human-friendly language

    Of course, this is a very simple example (a runnable sketch of these four steps follows below). The LLM can get more intelligent by layering additional training on data input and output.

    Are you/your teams chatting w/ your data? Any successes or time savings? Fun use cases? #ai #chatbot #data #analysis
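
    A minimal end-to-end sketch of steps 1-4: the schema rides in the prompt, the LLM returns SQL, and the database executes it. generate_sql() is a stand-in that returns the post's query verbatim (a real system would send the prompt to an LLM API), and sqlite3 stands in for the production warehouse:

    import sqlite3

    SCHEMA = """
    orders(order_id, customer_id, state, net_sales, order_date)
    customers(customer_id, billing_address)
    """

    def generate_sql(question: str) -> str:
        # In production, SCHEMA + question go to an LLM; here we return the
        # query from the post so the sketch runs end to end.
        _prompt = f"Given this schema:\n{SCHEMA}\nWrite SQL for: {question}"
        return (
            "SELECT state, SUM(net_sales) AS revenue FROM orders "
            "GROUP BY state ORDER BY revenue DESC LIMIT 1"
        )

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id, customer_id, state, net_sales, order_date)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "CA", 235000, "2024-06-01"),
        (2, 2, "NY", 180000, "2024-06-01"),
    ])

    sql = generate_sql("What state had the highest sales yesterday?")
    print(conn.execute(sql).fetchone())  # -> ('CA', 235000)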

  • View profile for Richard Meng

    CEO @ Roe | Catching fin-crimes with vectors

    26,880 followers

    I’ll start sharing emerging patterns in LLM-driven unstructured data engineering and analysis to benefit the data community. They all come from the Roe AI journey as we discover and map customer problems to solutions this year.

    Pattern 1: LLM in ETL -> LLM in ELT

    ETL: LLM data engineering happens before loading to the data warehouse/lakehouse.
    ✅ More developer tools available: a wide variety of open-source / closed data engineering ETL tools exist.
    ❌ Inflexible for ad-hoc analysis and prompt engineering: since the output schema is generally pre-defined, ad-hoc changes to transformation logic take time.
    ℹ️ Examples: a custom-built Python script calling an LLM API.

    ELT: LLM data engineering happens in the data warehouse/lakehouse natively with SQL.
    ✅ On-demand unstructured data transformation, no need to overthink schema, and flexible prompt engineering.
    ❌ Data warehouse/lakehouse SQL is still in the early phase of supporting multimodal unstructured data.
    ℹ️ Examples: native SQL LLM functions such as Snowflake Cortex and GCP BigQuery can vectorize LLM.complete(<prompt>, <column>) on a free-form text column at scale.

    How do you use LLMs in your data pipeline today? Comment below.

    #dataengineering #dataanalysis
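
    A side-by-side sketch of the two patterns, with call_llm() as a hypothetical provider wrapper. The ELT branch shows Snowflake Cortex as one example of a native SQL LLM function; the table names, column names, and model identifier are assumed:

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire up your LLM provider here")

    # Pattern 1, LLM in ETL: transform in a Python pipeline before loading.
    def etl_enrich(rows: list) -> list:
        for row in rows:
            row["summary"] = call_llm(f"Summarize in one sentence: {row['ticket_text']}")
        return rows  # load the enriched rows into the warehouse afterwards

    # Pattern 2, LLM in ELT: transform inside the warehouse with native SQL.
    ELT_QUERY = """
    SELECT ticket_id,
           SNOWFLAKE.CORTEX.COMPLETE(
               'mistral-large',
               CONCAT('Summarize in one sentence: ', ticket_text)
           ) AS summary
    FROM raw.support_tickets;
    """
    # Schema stays flexible: change the prompt and rerun, no pipeline redeploy.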
