Optimize PostgreSQL Query Execution with Lateral Join

Our PostgreSQL query executed 1,000 subqueries. It only needed 75.

Imagine this:
• 1,000 orders
• Each referencing one of 75 products

The original query fetched the product name per order using a correlated subquery:

SELECT o.id,
       (SELECT name FROM product WHERE id = o.product_id)
FROM orders o;

PostgreSQL executed the product lookup 1,000 times.

The fix? Rewrite it using a LATERAL join:

SELECT o.id, p.name
FROM orders o
LEFT JOIN LATERAL (
    SELECT name FROM product WHERE id = o.product_id
) p ON true;

Now PostgreSQL introduces a Memoize node. It caches results by product_id.

From the query plan:
• 1,000 rows processed
• 75 cache misses (real product lookups)
• 925 cache hits (reused results)

That means PostgreSQL only queried the product table 75 times instead of 1,000.

Small query rewrite. Big difference in execution strategy.

This works best when many rows reference a small set of keys. If every row has a unique key, Memoize has nothing to reuse.

Sometimes performance improvements aren't about adding indexes — they're about changing the shape of the query plan.

Note: Hash and Merge joins were disabled for demonstration. Normally PostgreSQL chooses the join strategy based on cost.

#PostgreSQL #SQLPerformance #BackendEngineering
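If you want to reproduce this, here is a minimal sketch. The table and column names come from the post; the data generation and the session settings that force a nested-loop plan are assumptions for the demo:

-- Demo schema: 75 products, 1,000 orders referencing them.
CREATE TABLE product (id int PRIMARY KEY, name text);
CREATE TABLE orders (id int PRIMARY KEY, product_id int REFERENCES product(id));

INSERT INTO product SELECT g, 'product ' || g FROM generate_series(1, 75) g;
INSERT INTO orders SELECT g, 1 + (g % 75) FROM generate_series(1, 1000) g;
ANALYZE product;
ANALYZE orders;

-- Disable hash/merge joins so the nested-loop plan with Memoize is chosen.
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN (ANALYZE)
SELECT o.id, p.name
FROM orders o
LEFT JOIN LATERAL (
    SELECT name FROM product WHERE id = o.product_id
) p ON true;

In the output, look for a Memoize node with Hits and Misses counters: with 75 distinct keys you should see roughly 75 misses and 925 hits.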
More Relevant Posts
🚨 What Actually Happens When You Run VACUUM in PostgreSQL?

Most developers think VACUUM is just “cleanup”. That’s not even half the story.

PostgreSQL doesn’t overwrite data when you UPDATE or DELETE. Because of MVCC (Multi-Version Concurrency Control), it creates new row versions and leaves the old ones behind. These old rows are called dead tuples.

🧠 So what does VACUUM really do?

When you run VACUUM, PostgreSQL:
✔️ Scans the table page by page
✔️ Identifies dead tuples (no longer visible to any transaction)
✔️ Marks that space as reusable (not returned to the OS)
✔️ Updates the visibility map for faster future scans

Important:
👉🏻 VACUUM does NOT shrink your table size.

⚠️ Why this matters in production

If you don’t vacuum regularly:
• Dead tuples keep accumulating
• Table bloat increases
• Indexes become inefficient
• Queries slow down
• Storage usage grows

Eventually… performance degrades badly.

🔥 VACUUM vs VACUUM FULL

👉🏻 VACUUM
• Non-blocking
• Reuses space internally
• Safe for production

👉🏻 VACUUM FULL
• Rewrites the entire table
• Returns space to the OS
• Requires an exclusive lock (blocks reads/writes)

Use VACUUM FULL only when absolutely necessary.

⚙️ What about AUTOVACUUM?

PostgreSQL already runs autovacuum in the background. But here’s the catch:
👉🏻 Default settings are often NOT enough for high-write workloads.

As a DBA, you should monitor pg_stat_user_tables for:
• Dead tuple counts
• Autovacuum frequency
(see the sketch after this post)

💡 Real takeaway

VACUUM is not optional maintenance. It’s part of how PostgreSQL stays alive under load. Ignore it, and your database won’t crash… it will just get slower and slower until users start complaining.

#PostgreSQL #Database #DBA #PerformanceTuning #Backend #DataEngineering #SQL #DevOps
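A minimal sketch of that monitoring check, using the real pg_stat_user_tables view (the 100-dead-tuple threshold is an arbitrary example value):

-- Tables with the most dead tuples, plus when autovacuum last ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 100
ORDER BY n_dead_tup DESC;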
I wrote a hands-on guide to PostgreSQL transactions and locks.

If you've ever had queries that suddenly hang, or two concurrent UPDATEs that produce unexpected results — I tried to explain what's actually happening inside PostgreSQL.

What's covered:
→ Why Serializable fails even without a logical conflict
→ A race condition puzzle where Alice and Bob both leave on-call duty — and nobody's left (sketched below)
→ How VACUUM reclaims space (and when it doesn't)
→ All demonstrated with SQL you can copy-paste and run

https://lnkd.in/dijcP-8t

#PostgreSQL #Database #SQL #Backend #SoftwareEngineering
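The on-call puzzle is the classic write-skew scenario. A minimal sketch of how it can play out (the oncall table and the interleaving are assumptions for illustration, not necessarily the guide's exact example); run the numbered steps in two concurrent psql sessions:

-- Assumed demo table: two people currently on call.
CREATE TABLE oncall (name text PRIMARY KEY);
INSERT INTO oncall VALUES ('alice'), ('bob');

-- [1] Session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM oncall;   -- returns 2: "someone else is still on call"

-- [2] Session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM oncall;   -- also returns 2

-- [3] Session A
DELETE FROM oncall WHERE name = 'alice';
COMMIT;

-- [4] Session B
DELETE FROM oncall WHERE name = 'bob';
COMMIT;   -- succeeds under REPEATABLE READ

-- Result: oncall is empty, even though both sessions "checked" first.
-- Under SERIALIZABLE, PostgreSQL detects the conflict and aborts one
-- transaction with a serialization error instead.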
How well do you really know PostgreSQL?

We put together a list of 9 advanced (yet incredibly practical) PostgreSQL features that are too often overlooked. Mastering these shifts the heavy lifting from the code to the database, leading to:
✅ Simplified backend code
✅ Improved performance
✅ Reduced technical debt

If PostgreSQL is part of your stack, chances are at least one of these features will change how you think about where your business logic truly belongs.

👉 Read the full article here: https://lnkd.in/exm4WTz9

#PostgreSQL #Database #Backend #SoftwareEngineering #SQL #WebDevelopment #OpenSource
PostgreSQL has had a long-standing policy: the planner should do the right thing without hints. No escape hatches; trust the optimizer.

And that mostly works... until it doesn't. Until you're staring at a query that was fine yesterday and is now doing a sequential scan on a 200M-row table because the statistics shifted.

Robert Haas just committed pg_plan_advice to the PostgreSQL 19 tree. It lets you capture a working query plan and replay it later, or override specific planner decisions. Force a hash join. Insist on an index scan. All without touching application code. https://lnkd.in/gEgzEz8s

There's a real philosophical point at play here. Postgres has always said "we don't do hints." This isn't hints. It's a pluggable framework that separates mechanism from policy: plan stability without compromising the principle that the planner should get it right by default.

At Tiger Data, we deal with this constantly. Time-series workloads can shift query patterns dramatically as data ages and compression kicks in. Having plan stability built into core Postgres (rather than relying on third-party extensions) makes the whole ecosystem more reliable for the kinds of workloads we see every day.

Haas calls it "version 1.0 technology." The door is open. What gets built on top of this is going to matter.

#PostgreSQL #Postgres #TimescaleDB #TigerData #QueryOptimization
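For context, the closest thing core Postgres has offered until now is session-level planner toggles, a much blunter instrument; a sketch (the GUCs are real, the orders query is an assumption):

-- These settings discourage entire plan-node types for the whole session,
-- rather than pinning one decision in one query; that gap is exactly
-- what per-query plan advice aims to fill.
SET enable_seqscan = off;    -- push the planner toward index scans
SET enable_nestloop = off;   -- push it away from nested loops

EXPLAIN SELECT * FROM orders WHERE created_at > now() - interval '1 day';

RESET enable_seqscan;
RESET enable_nestloop;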
Recently, I re-visited a foundational concept in PostgreSQL performance tuning, and it truly reinforced the power of deeply understanding `EXPLAIN ANALYZE`. It's not enough to just run the command; interpreting its output effectively is where the real optimization magic happens.

My recent "aha!" moment came while debugging a query suffering from significant latency. Initially, I suspected a missing index. However, a closer look at the `EXPLAIN ANALYZE` output revealed that while indexes were present, the planner was opting for a costly sequential scan on a large intermediate result set, followed by an inefficient sort operation. The plan clearly showed a high `cost` associated with a `Sort` node, hinting at excessive data being processed before sorting.

This led me to rethink the query logic and the conditions. By carefully restructuring the query to filter earlier in the execution path and adding a more specific composite index on the frequently filtered and joined columns, the execution time plummeted from several seconds to mere milliseconds. The key wasn't just *having* an index, but guiding the planner to *use* it optimally for the specific query's needs.

**Key takeaway for you:** When analyzing `EXPLAIN ANALYZE` output, don't just focus on missing indexes. Pay close attention to nodes with high `cost` values, especially `Sort`, `Hash Join`, `Merge Join`, or `Seq Scan` on large tables. These often indicate opportunities to refine your query, optimize your indexing strategy, or even restructure your data to better align with the planner's capabilities.

What's your most valuable tip for interpreting `EXPLAIN ANALYZE` and boosting PostgreSQL query performance? Share your insights below!

#PostgreSQL #Database #PerformanceTuning #SQL #DataEngineering #SoftwareDevelopment

References:
PostgreSQL Documentation - EXPLAIN - https://lnkd.in/gKZnNc6W
Deep Dive into EXPLAIN ANALYZE - https://lnkd.in/gjGvJbtq
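A sketch of the pattern described above. The table, columns, and query are assumptions; the right composite index always depends on your actual schema and predicates:

-- Before: the planner filters late and sorts a large intermediate result.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, created_at, total
FROM orders
WHERE status = 'shipped' AND created_at >= '2024-01-01'
ORDER BY created_at;

-- A composite index on the equality-filtered column plus the sort column
-- lets the planner filter early and read rows already in sorted order,
-- often replacing a Seq Scan + Sort with a single Index Scan.
CREATE INDEX idx_orders_status_created_at
    ON orders (status, created_at);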
How many of your PostgreSQL queries have degraded plans right now?

If you are like most teams, the honest answer is: you have no idea.

Most DBAs only run EXPLAIN when something is already on fire. A query creeps from 50ms to 500ms over weeks as a table grows, the planner shifts strategies, and nobody notices until it crosses a threshold and response times spike. By the time someone runs EXPLAIN during an incident, the pressure is on and there is no baseline to compare against.

Here is why automated EXPLAIN plan collection changes the game:

1. Coverage you cannot get manually. With hundreds of query templates running against dozens of tables, running EXPLAIN on each one every few hours is not realistic. Automated collection covers the top 50 slowest queries every 5 minutes without human intervention.

2. Safe collection on PostgreSQL 16+. Using EXPLAIN (GENERIC_PLAN), plans are generated without executing the query (see the sketch after this post). No risk of accidentally running a destructive statement or waiting for a slow one to complete.

3. Performance grading from A to F. Each collected plan is analyzed for sequential scans on large tables, hash operations spilling to disk, nested loops with high row counts, and large gaps between estimated and actual rows. A grade of D or F comes with a specific recommendation — not just "something is wrong" but "sequential scan on large table, consider adding an index."

The key insight: when a query degrades, you can compare the current plan to last week's and see exactly what changed. That is the difference between a proactive fix during a routine review and a scramble during a 2am incident.

Stop guessing why queries are slow. Start collecting EXPLAIN plans automatically.

Full article with SQL examples: https://lnkd.in/ecfr3YTW

#PostgreSQL #DatabasePerformance #DatabaseMonitoring #DevOps
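A minimal sketch of GENERIC_PLAN, a real EXPLAIN option on PostgreSQL 16+ (the query and table are assumptions; note the statement must use parameter placeholders like $1):

-- EXPLAIN (GENERIC_PLAN) plans a parameterized statement without
-- executing it, so it is safe even for UPDATE/DELETE templates
-- pulled from pg_stat_statements. It cannot be combined with ANALYZE.
EXPLAIN (GENERIC_PLAN)
SELECT id, total
FROM orders
WHERE customer_id = $1
  AND created_at > $2;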
VACUUM FULL has been the "break glass in case of emergency" option for Postgres bloat for as long as anyone can remember. It works. It also takes an ACCESS EXCLUSIVE lock on your table while it does its thing. Hope nobody needed that data.

PostgreSQL 19 is getting a REPACK command that rolls VACUUM FULL and CLUSTER into a single operation with a cleaner interface. https://lnkd.in/giMbxdHM

The initial version still locks the table (same as VACUUM FULL today). That's not the exciting part. The cool part is what it enables next: concurrent repacking built into core Postgres. No extensions, no third-party tools, just a command that handles bloat without making your application wait.

pg_repack has been doing the concurrent version of this for years as an extension, and it's great. But there's a real difference between "install this extension and hope your version is compatible" and "it's a SQL command that ships with Postgres."

If you've ever had to schedule a maintenance window just to reclaim disk space, you know why this matters.

What's the worst VACUUM FULL experience you've had in production?

#PostgreSQL #DatabaseOps #DataEngineering #DevOps
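Until REPACK lands, the two existing options look roughly like this; a sketch (the table name is an assumption, and pg_repack requires both the extension in the database and its client binary on the host):

-- Option 1: built in, returns space to the OS, but holds an
-- ACCESS EXCLUSIVE lock for the duration of the table rewrite.
VACUUM FULL orders;

-- Option 2: the pg_repack extension does the same rebuild online,
-- holding only brief locks. Install it in the database first...
CREATE EXTENSION IF NOT EXISTS pg_repack;
-- ...then run its client from the shell, e.g.:
--   pg_repack --dbname=mydb --table=public.orders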
Find tables with the worst bloat ratio in your PostgreSQL database.

SELECT schemaname,
       relname AS table_name,
       pg_size_pretty(pg_table_size(relid)) AS table_size,
       n_live_tup,
       n_dead_tup,
       CASE WHEN n_live_tup > 0
            THEN round(100.0 * n_dead_tup / n_live_tup, 1)
            ELSE 0
       END AS bloat_pct,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE n_live_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 15;

Dead tuples are rows that have been updated or deleted but not yet cleaned up by VACUUM. They make your table physically larger. Every sequential scan reads through them. Every index scan might hit them.

What to look for:
• 𝗯𝗹𝗼𝗮𝘁_𝗽𝗰𝘁 > 𝟮𝟬% — autovacuum is falling behind. Check if it's being blocked by long transactions.
• 𝗹𝗮𝘀𝘁_𝗮𝘂𝘁𝗼𝘃𝗮𝗰𝘂𝘂𝗺 𝗶𝘀 𝗡𝗨𝗟𝗟 — the table has never been vacuumed automatically. Check if autovacuum is enabled.
• 𝗹𝗮𝘀𝘁_𝗮𝘂𝘁𝗼𝘃𝗮𝗰𝘂𝘂𝗺 𝘄𝗮𝘀 𝗱𝗮𝘆𝘀 𝗮𝗴𝗼 — on a busy table, this is a problem. Autovacuum should be running frequently.

If your biggest table hasn't been vacuumed in days and it's getting regular writes, something is wrong. Don't wait for it to become a performance complaint.

Try it on your database. Sort by n_dead_tup and see what surfaces.

#PostgreSQL #Database #VACUUM #Performance #DevOps
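If a hot table surfaces with a high bloat_pct, a minimal sketch of the usual follow-up (the table name and the scale-factor value are assumptions; both settings are real per-table storage parameters):

-- Reclaim the dead tuples now and refresh planner statistics.
VACUUM (VERBOSE, ANALYZE) big_busy_table;

-- Make autovacuum trigger earlier on this table: vacuum after roughly
-- 2% of rows are dead instead of the default 20%.
ALTER TABLE big_busy_table
    SET (autovacuum_vacuum_scale_factor = 0.02);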