Advanced SQL: Performance, Analytics, and Real-World Patterns

This guide dives into advanced SQL concepts that matter in real systems—performance tuning, indexing strategy, execution behavior, analytical queries, and concurrency control.

SQL Execution Model (What Actually Happens)

SQL is declarative—you describe what you want, not how to get it. The database optimizer determines execution.

Logical Query Processing Order

SELECT ...
FROM ...
JOIN ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...

Actual logical order:

FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

Why This Matters

You cannot use aliases in the WHERE clause, aggregates are not allowed in WHERE, and applying filters as early as possible helps reduce the amount of data processed downstream.

CLI Example (PostgreSQL)

psql -U postgres -d testdb

EXPLAIN ANALYZE
SELECT department, COUNT(*)
FROM employees
WHERE salary > 50000
GROUP BY department
HAVING COUNT(*) > 5;

Indexing Strategies for Performance

Indexes are critical for query speed but come with tradeoffs.

Types of Indexes

Clustered Index: Defines the physical order of data within a table, allows only one per table, and is ideal for primary keys.

Non-Clustered Index: A separate structure that stores pointers to the data, allows multiple per table, and is best suited for lookups and joins.

Composite Index

CREATE INDEX idx_user_email_status
ON users (email, status);

Works for queries that filter on email, but does not provide any benefit when filtering only on status.

Covering Index

CREATE INDEX idx_orders_cover
ON orders (customer_id)
INCLUDE (order_date, total_amount);

It avoids table lookups and allows the query to be fully satisfied using only the index.

Indexing Pitfalls

Too many indexes slow down writes. Functions break index usage:

WHERE LOWER(email) = 'test@example.com'

Implicit conversions:

WHERE user_id = '123'  -- string vs int

Example: Index Usage

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101;

Look for whether the query uses an Index Scan, which is generally good, or a Sequential Scan, which can be inefficient for large tables.

Query Performance Tuning

Core Principles

1. Avoid SELECT *

SELECT id, name FROM users;

2. Filter Early

SELECT *
FROM orders
WHERE order_date > '2025-01-01';

3. Optimize Joins

Ensure that joins are performed on indexed columns and that the data types of the joined fields match.

SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

Common Performance Killers

Correlated subqueries
Large nested loops
Unnecessary DISTINCT
Joining before filtering

CLI Benchmark Example

time psql -U postgres -d testdb -c "SELECT * FROM large_table WHERE status = 'active';"

Window Functions (Advanced Analytics)

Window functions operate across rows without collapsing results.

Common Functions

ROW_NUMBER

The ROW_NUMBER() function assigns a unique, sequential number to each row in a result set based on the specified ordering, optionally restarting the numbering within partitions.

SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM employees;

RANK vs DENSE_RANK

The SQL functions RANK() and DENSE_RANK() both assign rankings based on ordered values, but RANK() leaves gaps after ties while DENSE_RANK() assigns consecutive ranks without gaps.

RANK() OVER (...)         -- gaps
DENSE_RANK() OVER (...)   -- no gaps

LAG / LEAD

The SQL functions LAG() and LEAD() allow you to access values from preceding or following rows within the same result set without using a self-join.

SELECT salary,
LAG(salary) OVER (ORDER BY id) as prev_salary
FROM employees;

Running Total

This query returns each employee’s salary along with the previous employee’s salary based on the ordering of the id column by using the LAG() window function.

SELECT order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;

Performance Tip

Window functions require sorting → can be expensive.

CREATE INDEX idx_orders_date ON orders(order_date);

Subqueries vs CTEs (Common Table Expressions)

Subquery

SELECT name
FROM users
WHERE id IN (SELECT user_id FROM orders);

CTE (WITH clause)

WITH recent_orders AS (
  SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
)
SELECT * FROM recent_orders;

Key Differences

A common table expression (CTE) improves query readability, but it is not always faster, as the query optimizer may choose to inline it rather than treat it as a separate execution step.

NULL Handling (Critical for Correctness)

Rules

NULL ≠ anything (even NULL)

WHERE column = NULL        -- WRONG
WHERE column IS NULL       -- CORRECT

Three-Valued Logic

TRUE
FALSE
UNKNOWN

NULL in Joins

SELECT *
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.user_id IS NULL;

Use COALESCE

The SQL COALESCE() function returns the first non-NULL value from a list of expressions, allowing you to provide fallback values when encountering missing or NULL data.

SELECT COALESCE(discount, 0) FROM orders;

Advanced Join Patterns

Self Join

A self join is used to join a table to itself in order to compare rows within the same dataset, such as modeling hierarchical relationships (e.g., employees and managers) or finding related records within the same table.

SELECT e.name, m.name as manager
FROM employees e
JOIN employees m ON e.manager_id = m.id;

Cross Join (Cartesian Product)

A cross join is used to combine every row from one table with every row from another table, producing a Cartesian product that is useful for generating all possible combinations of data.

SELECT *
FROM a
CROSS JOIN b;

Warning: Can be dangerous if unintended.

JOIN vs WHERE Filtering

-- Affects join behavior
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active'

-- Filters final result
WHERE o.status = 'active'

Real Interview Scenarios

Second Highest Salary

This query retrieves the second highest unique salary from the employees table by selecting distinct salary values, ordering them in descending order, and then skipping the highest value with OFFSET 1 while returning the next one with LIMIT 1.

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

Latest Record Per Group

This query returns the most recent order for each user by assigning a row number to each order within each user_id partition based on descending created_at, and then filtering to keep only the first row (rn = 1) for each user.

SELECT *
FROM (
  SELECT *,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
  FROM orders
) t
WHERE rn = 1;

Top N per Category

This query returns the top three products in each category by assigning a row number to products within each category based on descending sales and then filtering to keep only rows where the rank is less than or equal to three.

SELECT *
FROM (
  SELECT *,
  ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) as rn
  FROM products
) t
WHERE rn <= 3;

Running Totals

This query returns each date along with a running total of sales by calculating the cumulative sum of the amount column ordered by date using a window function.

SELECT date,
SUM(amount) OVER (ORDER BY date) as cumulative
FROM sales;

Detect Duplicates

This query identifies duplicate email addresses by grouping rows by email, counting the number of occurrences for each, and returning only those with a count greater than one.

SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

Transactions, Locks, and Isolation

This transaction starts with BEGIN, performs two related updates that transfer 100 from one account to another, and then finalizes the changes with COMMIT, ensuring that both operations are applied together as a single atomic unit so either both succeed or neither does.

Transaction Basics

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

ACID Properties

Atomicity ensures transactions are all-or-nothing, consistency guarantees the database remains in a valid state, isolation controls how transactions interact with each other, and durability ensures committed changes are permanently stored.

Isolation Levels

Deadlocks and Concurrency

Deadlock Example

Transaction A locks row1 → wants row2

Transaction B locks row2 → wants row1

Result → deadlock

Prevention Strategies

Access tables in a consistent order, keep transactions short to minimize locking and contention, and ensure foreign keys are properly indexed to improve query performance and maintain efficient relational operations.

Monitoring Example (Postgres)

This query returns all current lock information in PostgreSQL by selecting every row from the system catalog view that tracks locks held or awaited by active transactions.

SELECT * FROM pg_locks;

Execution Plans (Your Most Important Tool)

Always validate performance assumptions.

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

Look for whether the query uses an Index Scan or Sequential Scan, identify the join type being used (Hash, Nested Loop, or Merge), and compare the estimated cost against the actual execution time to evaluate performance accuracy.

Practical Optimization Workflow

Start with correct query
Add filters early
Check execution plan
Add indexes
Remove unnecessary operations
Benchmark

CLI Workflow Example

# Run query
psql -U postgres -d testdb -c "SELECT * FROM orders WHERE customer_id=1;"

# Analyze plan
psql -U postgres -d testdb -c "
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id=1;
"

# Add index
psql -U postgres -d testdb -c "
CREATE INDEX idx_orders_customer ON orders(customer_id);
"

Final Thoughts

Performance comes from efficient data access patterns rather than clever SQL tricks, indexes are useful but only when aligned with query patterns, execution plans serve as the source of truth for understanding behavior, window functions enable advanced analytical capabilities, NULL handling impacts correctness more than performance, and well-designed transactions and concurrency control are critical for overall system stability.

Advanced SQL: Performance, Analytics, and Real-World Patterns

Christopher Adamson

SQL Execution Model (What Actually Happens)

Indexing Strategies for Performance

Query Performance Tuning

Window Functions (Advanced Analytics)

Subqueries vs CTEs (Common Table Expressions)

NULL Handling (Critical for Correctness)

Advanced Join Patterns

Real Interview Scenarios

Transactions, Locks, and Isolation

Deadlocks and Concurrency

Execution Plans (Your Most Important Tool)

Practical Optimization Workflow

Final Thoughts

More articles by Christopher Adamson

Explore content categories

SQL Execution Model (What Actually Happens)

Indexing Strategies for Performance

Query Performance Tuning

Window Functions (Advanced Analytics)

Subqueries vs CTEs (Common Table Expressions)

NULL Handling (Critical for Correctness)

Advanced Join Patterns

Real Interview Scenarios

Transactions, Locks, and Isolation

Deadlocks and Concurrency

Execution Plans (Your Most Important Tool)

Practical Optimization Workflow

Final Thoughts

More articles by Christopher Adamson

SQL from Zero to Confident

Accessible Loading States in HTMX with aria-busy

The htmx response-targets Extension

Getting Started with the htmx preload Extension

Debouncing Requests in htmx

Caching Strategies in htmx

HTMX formdata extension: make formdata-aware components work with htmx

Htmx CSRF Extension (a.k.a. csrf-token)

Htmx ajax-header Extension

Auto-Submitting Forms with the htmx Auto-Submit Extension

Explore content categories