Advanced SQL: Performance, Analytics, and Real-World Patterns
This guide dives into advanced SQL concepts that matter in real systems—performance tuning, indexing strategy, execution behavior, analytical queries, and concurrency control.
SQL Execution Model (What Actually Happens)
SQL is declarative—you describe what you want, not how to get it. The database optimizer determines execution.
Logical Query Processing Order
SELECT ...
FROM ...
JOIN ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
Actual logical order:
Why This Matters
You cannot use aliases in the WHERE clause, aggregates are not allowed in WHERE, and applying filters as early as possible helps reduce the amount of data processed downstream.
CLI Example (PostgreSQL)
psql -U postgres -d testdb
EXPLAIN ANALYZE
SELECT department, COUNT(*)
FROM employees
WHERE salary > 50000
GROUP BY department
HAVING COUNT(*) > 5;
Indexing Strategies for Performance
Indexes are critical for query speed but come with tradeoffs.
Types of Indexes
Clustered Index: Defines the physical order of data within a table, allows only one per table, and is ideal for primary keys.
Non-Clustered Index: A separate structure that stores pointers to the data, allows multiple per table, and is best suited for lookups and joins.
Composite Index
CREATE INDEX idx_user_email_status
ON users (email, status);
Works for queries that filter on email, but does not provide any benefit when filtering only on status.
Covering Index
CREATE INDEX idx_orders_cover
ON orders (customer_id)
INCLUDE (order_date, total_amount);
It avoids table lookups and allows the query to be fully satisfied using only the index.
Indexing Pitfalls
Too many indexes slow down writes. Functions break index usage:
WHERE LOWER(email) = 'test@example.com'
Implicit conversions:
WHERE user_id = '123' -- string vs int
Example: Index Usage
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101;
Look for whether the query uses an Index Scan, which is generally good, or a Sequential Scan, which can be inefficient for large tables.
Query Performance Tuning
Core Principles
1. Avoid SELECT *
SELECT id, name FROM users;
2. Filter Early
SELECT *
FROM orders
WHERE order_date > '2025-01-01';
3. Optimize Joins
Ensure that joins are performed on indexed columns and that the data types of the joined fields match.
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
Common Performance Killers
CLI Benchmark Example
time psql -U postgres -d testdb -c "SELECT * FROM large_table WHERE status = 'active';"
Window Functions (Advanced Analytics)
Window functions operate across rows without collapsing results.
Common Functions
ROW_NUMBER
The ROW_NUMBER() function assigns a unique, sequential number to each row in a result set based on the specified ordering, optionally restarting the numbering within partitions.
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM employees;
RANK vs DENSE_RANK
The SQL functions RANK() and DENSE_RANK() both assign rankings based on ordered values, but RANK() leaves gaps after ties while DENSE_RANK() assigns consecutive ranks without gaps.
RANK() OVER (...) -- gaps
DENSE_RANK() OVER (...) -- no gaps
LAG / LEAD
The SQL functions LAG() and LEAD() allow you to access values from preceding or following rows within the same result set without using a self-join.
SELECT salary,
LAG(salary) OVER (ORDER BY id) as prev_salary
FROM employees;
Running Total
This query returns each employee’s salary along with the previous employee’s salary based on the ordering of the id column by using the LAG() window function.
SELECT order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;
Performance Tip
Window functions require sorting → can be expensive.
CREATE INDEX idx_orders_date ON orders(order_date);
Subqueries vs CTEs (Common Table Expressions)
Subquery
SELECT name
FROM users
WHERE id IN (SELECT user_id FROM orders);
CTE (WITH clause)
WITH recent_orders AS (
SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
)
SELECT * FROM recent_orders;
Key Differences
A common table expression (CTE) improves query readability, but it is not always faster, as the query optimizer may choose to inline it rather than treat it as a separate execution step.
NULL Handling (Critical for Correctness)
Rules
NULL ≠ anything (even NULL)
WHERE column = NULL -- WRONG
WHERE column IS NULL -- CORRECT
Three-Valued Logic
NULL in Joins
SELECT *
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.user_id IS NULL;
Use COALESCE
The SQL COALESCE() function returns the first non-NULL value from a list of expressions, allowing you to provide fallback values when encountering missing or NULL data.
SELECT COALESCE(discount, 0) FROM orders;
Advanced Join Patterns
Self Join
A self join is used to join a table to itself in order to compare rows within the same dataset, such as modeling hierarchical relationships (e.g., employees and managers) or finding related records within the same table.
SELECT e.name, m.name as manager
FROM employees e
JOIN employees m ON e.manager_id = m.id;
Cross Join (Cartesian Product)
A cross join is used to combine every row from one table with every row from another table, producing a Cartesian product that is useful for generating all possible combinations of data.
SELECT *
FROM a
CROSS JOIN b;
Warning: Can be dangerous if unintended.
JOIN vs WHERE Filtering
-- Affects join behavior
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active'
-- Filters final result
WHERE o.status = 'active'
Real Interview Scenarios
Second Highest Salary
This query retrieves the second highest unique salary from the employees table by selecting distinct salary values, ordering them in descending order, and then skipping the highest value with OFFSET 1 while returning the next one with LIMIT 1.
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
Latest Record Per Group
This query returns the most recent order for each user by assigning a row number to each order within each user_id partition based on descending created_at, and then filtering to keep only the first row (rn = 1) for each user.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
) t
WHERE rn = 1;
Top N per Category
This query returns the top three products in each category by assigning a row number to products within each category based on descending sales and then filtering to keep only rows where the rank is less than or equal to three.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) as rn
FROM products
) t
WHERE rn <= 3;
Running Totals
This query returns each date along with a running total of sales by calculating the cumulative sum of the amount column ordered by date using a window function.
SELECT date,
SUM(amount) OVER (ORDER BY date) as cumulative
FROM sales;
Detect Duplicates
This query identifies duplicate email addresses by grouping rows by email, counting the number of occurrences for each, and returning only those with a count greater than one.
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
Transactions, Locks, and Isolation
This transaction starts with BEGIN, performs two related updates that transfer 100 from one account to another, and then finalizes the changes with COMMIT, ensuring that both operations are applied together as a single atomic unit so either both succeed or neither does.
Transaction Basics
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
ACID Properties
Atomicity ensures transactions are all-or-nothing, consistency guarantees the database remains in a valid state, isolation controls how transactions interact with each other, and durability ensures committed changes are permanently stored.
Isolation Levels
Deadlocks and Concurrency
Deadlock Example
Transaction A locks row1 → wants row2
Transaction B locks row2 → wants row1
Result → deadlock
Prevention Strategies
Access tables in a consistent order, keep transactions short to minimize locking and contention, and ensure foreign keys are properly indexed to improve query performance and maintain efficient relational operations.
Monitoring Example (Postgres)
This query returns all current lock information in PostgreSQL by selecting every row from the system catalog view that tracks locks held or awaited by active transactions.
SELECT * FROM pg_locks;
Execution Plans (Your Most Important Tool)
Always validate performance assumptions.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
Look for whether the query uses an Index Scan or Sequential Scan, identify the join type being used (Hash, Nested Loop, or Merge), and compare the estimated cost against the actual execution time to evaluate performance accuracy.
Practical Optimization Workflow
CLI Workflow Example
# Run query
psql -U postgres -d testdb -c "SELECT * FROM orders WHERE customer_id=1;"
# Analyze plan
psql -U postgres -d testdb -c "
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id=1;
"
# Add index
psql -U postgres -d testdb -c "
CREATE INDEX idx_orders_customer ON orders(customer_id);
"
Final Thoughts
Performance comes from efficient data access patterns rather than clever SQL tricks, indexes are useful but only when aligned with query patterns, execution plans serve as the source of truth for understanding behavior, window functions enable advanced analytical capabilities, NULL handling impacts correctness more than performance, and well-designed transactions and concurrency control are critical for overall system stability.