Impact Analysis for Dashboards with SQL Code in Initial SQL — What Tableau's Lineage Can't See

What Is Impact Analysis in Tableau?

If you've ever been asked: "We're renaming a column in the database — which dashboards will break?" — you've done impact analysis.

Impact analysis is the process of tracing upstream dependencies — identifying every workbook, data source, and dashboard that touches a specific database table, column, or stored procedure. It's essential for:

  • Database migrations — renaming tables, moving schemas, deprecating columns
  • Data model changes — knowing which reports break before you push changes
  • Compliance and auditing — documenting what data flows where
  • Decommissioning — safely retiring tables nobody uses anymore

How Impact Analysis Is Typically Done

1. Tableau Server / Tableau Cloud UI & Built-in Options

The most common approach: click through each workbook, open the data source tab, and manually check what tables or SQL it uses. For a project with 5 workbooks, this works. For 40+ workbooks across multiple projects — it doesn't scale.

Lineage View (available in Tableau Cloud and Server with Data Management Add-on) provides a visual dependency graph. You can navigate from a database → tables → datasources → workbooks. It's useful, but:

  • Requires the Data Management Add-on
  • Only shows dependencies that Tableau's internal cataloging has indexed
  • Does not show Initial SQL — these statements are completely invisible to lineage

Beyond the UI, there are several programmatic approaches depending on your environment:

Tableau Server — PostgreSQL Repository

Tableau Server stores all of its metadata in an internal PostgreSQL database (the Tableau Repository). If you have admin access, you can connect directly to this repository and write queries to perform impact analysis. Key tables include _workbooks, _datasources, _views, and _data_connections. For example, you can query _data_connections to find every workbook connected to a specific database or server. This is a powerful approach for on-premise Tableau Server environments, but it's not available in Tableau Cloud — Salesforce does not expose the underlying PostgreSQL repository to cloud customers.
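As a sketch, an impact-analysis query against the repository might look like the following. The `workgroup` database name and `readonly` user are standard for the Tableau repository, but the join columns on `_data_connections` (`owner_type`/`owner_id`) are assumptions to verify against your Server version:

```python
# Hedged sketch: a repository query for on-prem Tableau Server. The repository
# is the "workgroup" PostgreSQL database (readonly user, typically port 8060
# once repository access is enabled). The owner_type/owner_id join columns
# are assumptions -- verify against your Server version before relying on them.
IMPACT_QUERY = """
SELECT w.name  AS workbook,
       dc.server,
       dc.dbname,
       dc.dbclass
FROM _workbooks w
JOIN _data_connections dc
  ON dc.owner_type = 'Workbook'
 AND dc.owner_id   = w.id
WHERE dc.dbname = %(dbname)s
ORDER BY w.name;
"""

# Run it with any PostgreSQL client, e.g.:
#   psql -h tableau-server -p 8060 -U readonly workgroup
print(IMPACT_QUERY)
```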

Tableau Cloud — GraphQL Metadata API via External Assets

In Tableau Cloud, the equivalent approach is through External Assets in the web UI, where you can browse databases, tables, and their downstream lineage. For programmatic access, Tableau Cloud exposes the Metadata API — a GraphQL endpoint at /api/metadata/graphql. You can write GraphQL queries to traverse the lineage graph: start from a database or table and find every data source and workbook downstream. This is the closest equivalent to querying the PostgreSQL repository on Server, and it works well for relationship tables and Custom SQL — but as we'll see, it has a critical blind spot.



REST API

Both Tableau Server and Tableau Cloud support the REST API, which gives you straightforward access to the functionality behind Tableau data sources, projects, workbooks, site users, sites, flows, and more. You can use this access to build custom applications or to script interactions with Tableau resources.

2. Tableau Metadata API (GraphQL)

Tableau provides a GraphQL-based Metadata API at /api/metadata/graphql (with an interactive GraphiQL explorer) that programmatically exposes lineage. You can query for databases, tables, and their downstream data sources and workbooks:

{
  databases(filter: { name: "my_database" }) {
    name
    tables {
      name
      schema
      downstreamDatasources {
        name
        projectName
        downstreamWorkbooks {
          name
          projectName
          owner { name }
        }
      }
    }
  }
}
        

This is powerful and gets you 80% of the way there. But the Metadata API has a critical blind spot: it cannot see Initial SQL.
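To send that query programmatically rather than through the GraphiQL explorer, you POST it to the endpoint with the same X-Tableau-Auth token used for the REST API. A minimal sketch (the helper name here is ours, not part of any Tableau library):

```python
def build_metadata_request(tableau_server, auth_token, query):
    """Assemble the pieces of a Metadata API call (POST /api/metadata/graphql)."""
    return {
        "url": f"{tableau_server}/api/metadata/graphql",
        "json": {"query": query},
        "headers": {"X-Tableau-Auth": auth_token,
                    "Content-Type": "application/json"},
    }

# Usage (requires the requests package):
#   import requests
#   req = build_metadata_request(TABLEAU_SERVER, auth_token, LINEAGE_QUERY)
#   data = requests.post(**req).json()["data"]
```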

3. The Blind Spot: Initial SQL (Invisible to All Built-in Tools)

Tableau supports a feature called Initial SQL (also called one-time SQL). This is SQL code that runs once when the workbook opens — before any data query executes. It's commonly used for:

  • Creating temp tables (CREATE TABLE #staging ...)
  • Setting session variables (SET @fiscal_year = 2026)
  • Running INSERT INTO or MERGE statements to stage data



Here's the problem: Initial SQL references tables that Tableau never registers as dependencies. The Metadata API doesn't index them. The Lineage View doesn't show them. They're invisible.

Our Approach: A Fully Automated Python Pipeline

We built an end-to-end pipeline using the REST API and XML parsing to find every dependency — including the ones Tableau hides:

  1. REST API — authenticate and bulk-download all workbooks (.twbx files) from a Tableau project
  2. TWB XML parsing — extract .twb XML from the archives, then parse specific XML tags to find every SQL statement and table reference
  3. SQL parsing — use regex to extract actual table names from the SQL code
  4. Excel export — push the complete inventory to a formatted Excel workbook

The result: a complete inventory with every SQL statement and every table name, mapped to the workbook and dashboard that uses it.

Architecture Overview:

┌─────────────────────────────────────────────────────────────┐
│                    Tableau Cloud / Server                    │
│                                                             │
│  REST API ──► Download .twbx files (workbook archives)      │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    Python Pipeline                           │
│                                                             │
│  1. Authenticate via PAT (Personal Access Token)            │
│  2. Download all workbooks in target project (.twbx)        │
│  3. Extract .twb XML from .twbx ZIP archives                │
│  4. Parse XML tags for four types of dependencies:          │
│     • <relation one-time-sql="...">                         │
│     • <relation name="Custom SQL Query" type="text">        │
│     • <relation type="table" table="[dbo].[TableName]">     │
│     • <datasource name="sqlproxy..." caption="DS Name">     │
│  5. Extract table names from SQL using regex                │
│  6. Export to formatted Excel                               │
└─────────────────────────────────────────────────────────────┘        


Step-by-Step Walkthrough

Step 1: Authenticate and Download Workbooks via REST API

We authenticate using a Personal Access Token (PAT) and download every workbook in the target project as .twbx files.

import os
import zipfile

import requests
import xml.etree.ElementTree as ET

# os and zipfile are used by later snippets; requests and ET are used here.

def authenticate():
    """Sign in with a Personal Access Token; returns (auth_token, site_id)."""
    url = f"{TABLEAU_SERVER}/api/{API_VERSION}/auth/signin"
    payload = {
        "credentials": {
            "personalAccessTokenName": PAT_NAME,
            "personalAccessTokenSecret": PAT_SECRET,
            "site": {"contentUrl": SITE_NAME}
        }
    }
    response = requests.post(url, json=payload, headers={"Content-Type": "application/json"})
    response.raise_for_status()
    # The response body is namespaced XML, so use the {*} wildcard (Python 3.8+)
    root = ET.fromstring(response.text)
    auth_token = root.find(".//{*}credentials").attrib["token"]
    site_id = root.find(".//{*}site").attrib["id"]
    return auth_token, site_id
        

We then list all workbooks in the project with pagination and download each one:

def get_workbooks_in_project(auth_token, site_id, project_name):
    """Paginated fetch of all workbooks in a project."""
    all_workbooks = []
    page_number = 1
    while True:
        url = (f"{TABLEAU_SERVER}/api/{API_VERSION}/sites/{site_id}/workbooks"
               f"?filter=projectName:eq:{project_name}"
               f"&pageSize=100&pageNumber={page_number}")
        # The REST API returns XML by default; ask for JSON explicitly
        data = requests.get(url, headers={"X-Tableau-Auth": auth_token,
                                          "Accept": "application/json"}).json()
        workbooks = data.get("workbooks", {}).get("workbook", [])
        if not workbooks:
            break
        all_workbooks.extend(workbooks)
        if page_number * 100 >= int(data["pagination"]["totalAvailable"]):
            break
        page_number += 1
    return all_workbooks

def download_workbook(auth_token, site_id, workbook_id, name, folder):
    """Download a workbook as a .twbx file; returns the local path."""
    url = f"{TABLEAU_SERVER}/api/{API_VERSION}/sites/{site_id}/workbooks/{workbook_id}/content"
    response = requests.get(url, headers={"X-Tableau-Auth": auth_token}, stream=True)
    response.raise_for_status()
    filepath = os.path.join(folder, f"{name}.twbx")
    with open(filepath, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return filepath
        

Gotcha we discovered: Some .twbx files downloaded from the REST API are actually plain XML — not ZIP archives. The file has a .twbx extension but starts with <?xml. Our pipeline handles this gracefully:

twb_xml = None
try:
    with zipfile.ZipFile(path, "r") as z:
        # Normal case: extract the .twb XML from the ZIP archive
        for name in z.namelist():
            if name.endswith(".twb"):
                twb_xml = z.read(name)
                break
except zipfile.BadZipFile:
    # Fallback: the file is plain XML disguised as .twbx
    with open(path, "rb") as f:
        header = f.read(10)
    if header.startswith(b"<?xml") or header.startswith(b"<work"):
        with open(path, "rb") as f:
            twb_xml = f.read()  # the file itself is the .twb XML
        
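A cheaper pre-check, if you prefer to branch up front instead of catching the exception, is zipfile.is_zipfile(), which inspects the file's magic bytes:

```python
import os
import tempfile
import zipfile

# zipfile.is_zipfile() checks the ZIP magic bytes without raising, so it can
# route a downloaded file before any extraction is attempted.
with tempfile.TemporaryDirectory() as d:
    fake = os.path.join(d, "fake.twbx")
    with open(fake, "wb") as f:
        f.write(b"<?xml version='1.0'?><workbook/>")
    is_zip = zipfile.is_zipfile(fake)

print(is_zip)  # False -> treat the file as plain .twb XML
```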

Step 2: Parse TWB XML — The Core Engine

A .twb file is XML. Every datasource connection, every SQL statement, and every table reference is embedded in the XML structure. We parse it to extract the dependency types below:
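The parsing snippets below rely on a small strip_ns helper (not shown in the excerpts) that removes the XML namespace prefix ElementTree attaches to tag names; a minimal version:

```python
import xml.etree.ElementTree as ET

def strip_ns(tag):
    """'{http://www.tableausoftware.com/xml/user}relation' -> 'relation'."""
    return tag.split("}")[-1]

root = ET.fromstring("<workbook><relation one-time-sql='SELECT 1'/></workbook>")
print([strip_ns(el.tag) for el in root.iter()])  # ['workbook', 'relation']
```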

Type 1: One-Time SQL

Stored as a one-time-sql attribute on <relation> elements:

 <connection application-intent='false' authentication='sspi' class='sqlserver' dbname='db_name' minimum-driver-version='SQL Server Native Client 10.0' multi-subnet-failover='false' odbc-native-protocol='yes' one-time-sql='Declare @year int&#10;set @year=year(GETDATE())&#10;--select @year,@year-1&#10;&#10;select lID,&#10;&#9;&#9;SID,&#10;&#9;&#9;Level1_Name,&#10;&#9;&#9;Level3_Name,&#10;&#9;&#9;rhc.Level4,&#10;&#9;&#9;rhc.Level6_Name ,&#10;&#9;&#9;rhc.fileId &#10;&#10;&#9;&#9;into #Hierarchy&#10;from&#10;&#9;&#9;sales (NOLOCK)&#10;&#9;LEFT JOIN orders rhc(nolock) on&#10;&#9; 
        
for rel in root.iter():
    if strip_ns(rel.tag) == "relation":
        one_time = rel.attrib.get("one-time-sql", "")
        if one_time.strip():
            found.append(("One-time SQL", one_time))
        

Type 2: Custom SQL Queries

Text content inside <relation name="Custom SQL Query" type="text">:

<relation name="Custom SQL Query" type="text">
    SELECT e.StudentID, e.ProgramCode, d.AcademicYear
    FROM dbo.program e
    JOIN dbo.date_ay d ON e.DateKey = d.DateKey
</relation>
        
for rel in root.iter():
    if strip_ns(rel.tag) == "relation":
        name = rel.attrib.get("name", "")
        if name.lower() == "custom sql query":
            sql_text = (rel.text or "").strip()
            if sql_text:
                found.append(("Custom SQL", sql_text))
        

Type 3: Relationship Tables (Direct Table Bindings)

When a workbook uses a direct table connection (drag-and-drop in the data source pane), it appears as:

<relation type="table" table="[dbo].[orders]" connection="sqlserver.abc123">
        

We resolve the connection attribute back to the database name using a lookup map we build from <named-connection> elements:

# Build connection → database lookup
conn_map = {}
for nc in root.iter():
    if strip_ns(nc.tag) == "named-connection":
        nc_name = nc.attrib.get("name", "")
        for child in nc:
            if strip_ns(child.tag) == "connection":
                conn_map[nc_name] = {
                    "dbname": child.attrib.get("dbname", ""),
                    "server": child.attrib.get("server", ""),
                }

# Extract table references
for rel in root.iter():
    if strip_ns(rel.tag) == "relation" and rel.attrib.get("type") == "table":
        table = rel.attrib.get("table", "").replace("[", "").replace("]", "")
        conn_ref = rel.attrib.get("connection", "")
        db = conn_map.get(conn_ref, {}).get("dbname", "")
        full_name = f"{db}.{table}" if db else table
        found.append(("Relationship Table", full_name))
        
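A quick self-contained check of the lookup-and-resolve logic above, run on a toy .twb fragment:

```python
import xml.etree.ElementTree as ET

def strip_ns(tag):
    return tag.split("}")[-1]

twb = """
<workbook>
  <named-connection name='sqlserver.abc123'>
    <connection dbname='SalesDB' server='db01'/>
  </named-connection>
  <relation type='table' table='[dbo].[orders]' connection='sqlserver.abc123'/>
</workbook>
"""
root = ET.fromstring(twb)

# Build connection -> database lookup
conn_map = {}
for nc in root.iter():
    if strip_ns(nc.tag) == "named-connection":
        for child in nc:
            if strip_ns(child.tag) == "connection":
                conn_map[nc.attrib.get("name", "")] = child.attrib.get("dbname", "")

# Resolve each table relation to its fully qualified name
for rel in root.iter():
    if strip_ns(rel.tag) == "relation" and rel.attrib.get("type") == "table":
        table = rel.attrib.get("table", "").replace("[", "").replace("]", "")
        db = conn_map.get(rel.attrib.get("connection", ""), "")
        full_name = f"{db}.{table}" if db else table
        print(full_name)  # SalesDB.dbo.orders
```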

Step 3: Parse Table Names from Initial SQL

Initial SQL often contains complex T-SQL with CREATE TABLE, INSERT INTO, EXEC, FROM, and JOIN references. We use regex to extract every table name:

import re

# Words that can follow FROM/JOIN/etc. but are not table names
SQL_KEYWORDS = {"SELECT", "WHERE", "ON", "AS", "SET", "VALUES", "NOLOCK"}

def extract_tables_from_sql(sql_text):
    """Parse SQL and extract all referenced table names."""
    # Matches [db].[schema].[table], schema.table, #temp, @var, etc.
    table_pattern = r'[\[\"]?[A-Za-z0-9_#@]+[\]\"]?(?:\.[\[\"]?[A-Za-z0-9_#@]+[\]\"]?){0,2}'
    
    patterns = [
        rf'\bFROM\s+({table_pattern})',
        rf'\bJOIN\s+({table_pattern})',
        rf'\bINTO\s+({table_pattern})',
        rf'\bUPDATE\s+({table_pattern})',
        rf'\bCREATE\s+TABLE\s+({table_pattern})',
        rf'\bEXEC(?:UTE)?\s+({table_pattern})',
        rf'\bMERGE\s+(?:INTO\s+)?({table_pattern})',
    ]
    
    found = set()
    for pattern in patterns:
        for match in re.findall(pattern, sql_text, re.IGNORECASE):
            clean = match.replace("[", "").replace("]", "")
            if clean.upper() not in SQL_KEYWORDS:
                found.add(clean)
    return sorted(found)

def classify_table(name):
    if name.startswith("#"):
        return "Temp Table"
    elif name.startswith("@"):
        return "Table Variable"
    else:
        return "Permanent Table"
        

This catches dependencies like:

  • dbo.sales (permanent table — invisible to Metadata API)
  • #temp_staging (temp table — created and consumed within the session)
  • @report_params (table variable)
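To see the extraction in action, here is a self-contained run of the same regex logic on a small Initial-SQL-style snippet (with a trimmed-down keyword list and only three of the patterns):

```python
import re

# Minimal stand-ins for the full pattern and keyword lists above
SQL_KEYWORDS = {"SELECT", "WHERE", "ON", "AS", "NOLOCK"}
table_pattern = r'[\[\"]?[A-Za-z0-9_#@]+[\]\"]?(?:\.[\[\"]?[A-Za-z0-9_#@]+[\]\"]?){0,2}'
patterns = [rf'\bFROM\s+({table_pattern})',
            rf'\bJOIN\s+({table_pattern})',
            rf'\bINTO\s+({table_pattern})']

sql = """
SELECT s.id INTO #staging
FROM dbo.sales s
JOIN dbo.orders o ON s.id = o.sale_id
"""

found = set()
for p in patterns:
    for m in re.findall(p, sql, re.IGNORECASE):
        clean = m.replace("[", "").replace("]", "")
        if clean.upper() not in SQL_KEYWORDS:
            found.add(clean)

print(sorted(found))  # ['#staging', 'dbo.orders', 'dbo.sales']
```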


Step 4: Export to Formatted Excel

Everything is exported to an Excel workbook with two sheets:

Sheet 1 — SQL Inventory: Every SQL statement and table reference, mapped to its workbook and dashboard.

Sheet 2 — Table References: Every actual database table involved, with the type of reference and the source datasource.
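As a sketch of the export step with pandas and openpyxl (the row dictionaries and filename here are illustrative, not the pipeline's actual schema):

```python
import pandas as pd

# Illustrative rows only; the real pipeline accumulates these while parsing
sql_rows = [
    {"Workbook": "Sales Overview", "Type": "One-time SQL",
     "SQL": "select ... into #Hierarchy from sales ..."},
]
table_rows = [
    {"Workbook": "Sales Overview", "Table": "dbo.sales",
     "Kind": "Permanent Table", "Datasource": "Sales DS"},
]

# One workbook, two sheets -- mirrors the structure described above
with pd.ExcelWriter("tableau_sql_inventory.xlsx", engine="openpyxl") as writer:
    pd.DataFrame(sql_rows).to_excel(writer, sheet_name="SQL Inventory", index=False)
    pd.DataFrame(table_rows).to_excel(writer, sheet_name="Table References", index=False)
```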

Getting Started

The full pipeline requires:

  • Python 3.x with requests, pandas, openpyxl, and standard library modules
  • A Tableau Personal Access Token (PAT) with site-level access
  • The Tableau REST API (v3.15+) and Metadata API endpoints

The pipeline is project-agnostic — change the TARGET_PROJECT_NAME variable and re-run to scan any project on your Tableau site.
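The snippets above assume a handful of module-level constants; all values below are placeholders:

```python
# Assumed module-level configuration used throughout the snippets above.
TABLEAU_SERVER = "https://10ax.online.tableau.com"  # or your on-prem Server URL
API_VERSION = "3.15"
SITE_NAME = "mysite"             # the site contentUrl ("" for the default site)
PAT_NAME = "impact-analysis"
PAT_SECRET = "<personal-access-token-secret>"
TARGET_PROJECT_NAME = "Finance Dashboards"
```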

