Advanced SQL: Performance, Analytics, and Real-World Patterns

Advanced SQL: Performance, Analytics, and Real-World Patterns

This guide dives into advanced SQL concepts that matter in real systems—performance tuning, indexing strategy, execution behavior, analytical queries, and concurrency control.

SQL Execution Model (What Actually Happens)

SQL is declarative—you describe what you want, not how to get it. The database optimizer determines execution.

Logical Query Processing Order

SELECT ...
FROM ...
JOIN ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...        

Actual logical order:

  1. FROM
  2. JOIN
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. SELECT
  7. ORDER BY

Why This Matters

You cannot use aliases in the WHERE clause, aggregates are not allowed in WHERE, and applying filters as early as possible helps reduce the amount of data processed downstream.

CLI Example (PostgreSQL)

psql -U postgres -d testdb        
EXPLAIN ANALYZE
SELECT department, COUNT(*)
FROM employees
WHERE salary > 50000
GROUP BY department
HAVING COUNT(*) > 5;        

Indexing Strategies for Performance

Indexes are critical for query speed but come with tradeoffs.

Types of Indexes

Clustered Index: Defines the physical order of data within a table, allows only one per table, and is ideal for primary keys.

Non-Clustered Index: A separate structure that stores pointers to the data, allows multiple per table, and is best suited for lookups and joins.

Composite Index

CREATE INDEX idx_user_email_status
ON users (email, status);        

Works for queries that filter on email, but does not provide any benefit when filtering only on status.

Covering Index

CREATE INDEX idx_orders_cover
ON orders (customer_id)
INCLUDE (order_date, total_amount);        

It avoids table lookups and allows the query to be fully satisfied using only the index.

Indexing Pitfalls

Too many indexes slow down writes.  Functions break index usage:

WHERE LOWER(email) = 'test@example.com'        

Implicit conversions:

WHERE user_id = '123'  -- string vs int        

Example: Index Usage

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101;        

Look for whether the query uses an Index Scan, which is generally good, or a Sequential Scan, which can be inefficient for large tables.

Query Performance Tuning

Core Principles

1. Avoid SELECT *

SELECT id, name FROM users;        

2. Filter Early

SELECT *
FROM orders
WHERE order_date > '2025-01-01';        

3. Optimize Joins

Ensure that joins are performed on indexed columns and that the data types of the joined fields match.

SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;        

Common Performance Killers

  • Correlated subqueries
  • Large nested loops
  • Unnecessary DISTINCT
  • Joining before filtering

CLI Benchmark Example

time psql -U postgres -d testdb -c "SELECT * FROM large_table WHERE status = 'active';"        

Window Functions (Advanced Analytics)

Window functions operate across rows without collapsing results.

Common Functions

ROW_NUMBER

The ROW_NUMBER() function assigns a unique, sequential number to each row in a result set based on the specified ordering, optionally restarting the numbering within partitions.

SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM employees;        

RANK vs DENSE_RANK

The SQL functions RANK() and DENSE_RANK() both assign rankings based on ordered values, but RANK() leaves gaps after ties while DENSE_RANK() assigns consecutive ranks without gaps.

RANK() OVER (...)         -- gaps
DENSE_RANK() OVER (...)   -- no gaps        

LAG / LEAD

The SQL functions LAG() and LEAD() allow you to access values from preceding or following rows within the same result set without using a self-join.

SELECT salary,
LAG(salary) OVER (ORDER BY id) as prev_salary
FROM employees;        

Running Total

This query returns each employee’s salary along with the previous employee’s salary based on the ordering of the id column by using the LAG() window function.

SELECT order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;        

Performance Tip

Window functions require sorting → can be expensive.

CREATE INDEX idx_orders_date ON orders(order_date);        

Subqueries vs CTEs (Common Table Expressions)

Subquery

SELECT name
FROM users
WHERE id IN (SELECT user_id FROM orders);        

CTE (WITH clause)

WITH recent_orders AS (
  SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
)
SELECT * FROM recent_orders;        

Key Differences

A common table expression (CTE) improves query readability, but it is not always faster, as the query optimizer may choose to inline it rather than treat it as a separate execution step.

NULL Handling (Critical for Correctness)

Rules

NULL ≠ anything (even NULL)

WHERE column = NULL        -- WRONG
WHERE column IS NULL       -- CORRECT        

Three-Valued Logic

  • TRUE
  • FALSE
  • UNKNOWN

NULL in Joins

SELECT *
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.user_id IS NULL;        

Use COALESCE

The SQL COALESCE() function returns the first non-NULL value from a list of expressions, allowing you to provide fallback values when encountering missing or NULL data.

SELECT COALESCE(discount, 0) FROM orders;        

Advanced Join Patterns

Self Join

A self join is used to join a table to itself in order to compare rows within the same dataset, such as modeling hierarchical relationships (e.g., employees and managers) or finding related records within the same table.

SELECT e.name, m.name as manager
FROM employees e
JOIN employees m ON e.manager_id = m.id;        

Cross Join (Cartesian Product)

A cross join is used to combine every row from one table with every row from another table, producing a Cartesian product that is useful for generating all possible combinations of data.

SELECT *
FROM a
CROSS JOIN b;        

Warning: Can be dangerous if unintended.

JOIN vs WHERE Filtering

-- Affects join behavior
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active'

-- Filters final result
WHERE o.status = 'active'        

Real Interview Scenarios

Second Highest Salary

This query retrieves the second highest unique salary from the employees table by selecting distinct salary values, ordering them in descending order, and then skipping the highest value with OFFSET 1 while returning the next one with LIMIT 1.

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;        

Latest Record Per Group

This query returns the most recent order for each user by assigning a row number to each order within each user_id partition based on descending created_at, and then filtering to keep only the first row (rn = 1) for each user.

SELECT *
FROM (
  SELECT *,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
  FROM orders
) t
WHERE rn = 1;        

Top N per Category

This query returns the top three products in each category by assigning a row number to products within each category based on descending sales and then filtering to keep only rows where the rank is less than or equal to three.

SELECT *
FROM (
  SELECT *,
  ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) as rn
  FROM products
) t
WHERE rn <= 3;        

Running Totals

This query returns each date along with a running total of sales by calculating the cumulative sum of the amount column ordered by date using a window function.

SELECT date,
SUM(amount) OVER (ORDER BY date) as cumulative
FROM sales;        

Detect Duplicates

This query identifies duplicate email addresses by grouping rows by email, counting the number of occurrences for each, and returning only those with a count greater than one.

SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;        

Transactions, Locks, and Isolation

This transaction starts with BEGIN, performs two related updates that transfer 100 from one account to another, and then finalizes the changes with COMMIT, ensuring that both operations are applied together as a single atomic unit so either both succeed or neither does.

Transaction Basics

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;        

ACID Properties

Atomicity ensures transactions are all-or-nothing, consistency guarantees the database remains in a valid state, isolation controls how transactions interact with each other, and durability ensures committed changes are permanently stored.

Isolation Levels

Article content

Deadlocks and Concurrency

Deadlock Example

Transaction A locks row1 → wants row2

Transaction B locks row2 → wants row1

Result → deadlock

Prevention Strategies

Access tables in a consistent order, keep transactions short to minimize locking and contention, and ensure foreign keys are properly indexed to improve query performance and maintain efficient relational operations.

Monitoring Example (Postgres)

This query returns all current lock information in PostgreSQL by selecting every row from the system catalog view that tracks locks held or awaited by active transactions.

SELECT * FROM pg_locks;        

Execution Plans (Your Most Important Tool)

Always validate performance assumptions.

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';        

Look for whether the query uses an Index Scan or Sequential Scan, identify the join type being used (Hash, Nested Loop, or Merge), and compare the estimated cost against the actual execution time to evaluate performance accuracy.

Practical Optimization Workflow

  1. Start with correct query
  2. Add filters early
  3. Check execution plan
  4. Add indexes
  5. Remove unnecessary operations
  6. Benchmark

CLI Workflow Example

# Run query
psql -U postgres -d testdb -c "SELECT * FROM orders WHERE customer_id=1;"

# Analyze plan
psql -U postgres -d testdb -c "
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id=1;
"

# Add index
psql -U postgres -d testdb -c "
CREATE INDEX idx_orders_customer ON orders(customer_id);
"        

Final Thoughts

Performance comes from efficient data access patterns rather than clever SQL tricks, indexes are useful but only when aligned with query patterns, execution plans serve as the source of truth for understanding behavior, window functions enable advanced analytical capabilities, NULL handling impacts correctness more than performance, and well-designed transactions and concurrency control are critical for overall system stability.

 

To view or add a comment, sign in

More articles by Christopher Adamson

  • SQL from Zero to Confident

    SQL becomes much easier when you stop thinking of it as a language to memorize and start seeing it as a data pipeline:…

  • Accessible Loading States in HTMX with aria-busy

    One important note up front: current htmx documentation does not list an official extension literally named aria-busy…

  • The htmx response-targets Extension

    The response-targets extension solves a very practical problem in htmx: routing different HTTP responses to different…

  • Getting Started with the htmx preload Extension

    The preload extension in htmx is designed to make navigation and fragment loading feel faster by fetching content…

  • Debouncing Requests in htmx

    Modern interactive interfaces often rely on continuous user input, whether it’s searching, filtering, validating, or…

  • Caching Strategies in htmx

    When people talk about “the htmx caching extension,” they usually mean one of three different things: preload, which…

  • HTMX formdata extension: make formdata-aware components work with htmx

    Many modern UI components, including Web Components, custom inputs, and rating widgets, integrate with form submission…

    1 Comment
  • Htmx CSRF Extension (a.k.a. csrf-token)

    In plain htmx setups, CSRF protection is typically handled by adding a header globally with hx-headers or by relying on…

    1 Comment
  • Htmx ajax-header Extension

    The ajax-header extension is small but addresses a common integration issue: some servers, middleware, or legacy…

  • Auto-Submitting Forms with the htmx Auto-Submit Extension

    Web applications increasingly favor responsive, event-driven interfaces where user interactions immediately trigger…

Explore content categories