How MVCC actually works in databases like PostgreSQL (Multi-Version Concurrency Control)

When multiple users read and update the same data at the same time, databases need to maintain consistency without slowing everything down. Earlier systems used locking: a write made others wait, and a read could block writes. This works, but doesn't scale well under load.

What's actually happening under MVCC
Every transaction is assigned a transaction ID (TXID). When a query starts, it sees a snapshot based on visible TXIDs. Each row internally stores metadata like:
- created by TXID
- deleted/updated by TXID
So visibility is not "current value" - it's "is this version valid for my transaction?"

How reads work
The database scans rows. For each row, it checks: was this version committed before my transaction started?
- If yes → visible
- If not → ignored
So reads are basically filtering versions, not locking data.

How writes work
An update does NOT modify the row in place. It creates a new row version with a new TXID. The old version is marked as expired (but not deleted yet). This is why:
- readers continue unaffected
- writers don't wait for readers

Isolation levels matter here
MVCC behavior changes slightly depending on isolation level:
- Read Committed → sees the latest committed version per query
- Repeatable Read → sees the same snapshot for the entire transaction
- Serializable → adds extra checks to avoid anomalies
So "what we see" depends on isolation, not just MVCC itself.

The hidden cost
Because updates create new versions:
- Table bloat can happen if cleanup is slow
- Indexes also grow with multiple versions
- Long-running transactions delay cleanup
Cleanup (e.g., VACUUM in Postgres) is critical - without it, performance degrades over time.
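The created-by/deleted-by metadata described above is exposed in Postgres as the hidden system columns xmin and xmax, so you can watch versioning happen. A minimal sketch, assuming a throwaway table named `accounts`:

```sql
-- Illustrative table to observe MVCC metadata.
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
INSERT INTO accounts VALUES (42, 100);

-- xmin = TXID that created this row version;
-- xmax = TXID that expired it (0 while the version is live).
SELECT xmin, xmax, id, balance FROM accounts;

BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 42;
-- Inside this transaction, the row shows a fresh xmin:
-- a brand new version was written, not an in-place change.
SELECT xmin, xmax, id, balance FROM accounts;
COMMIT;
```

Running the same SELECT from a second session before the COMMIT would still return the old version - the reader filters versions instead of waiting on the writer.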
Pros: no read/write blocking, predictable reads, high concurrency.
Cons: more storage usage, VACUUM tuning required, complex internals, stale reads depending on isolation.

Reality: MVCC is a trade-off - we spend more storage and background work to get less waiting and better throughput. That trade-off is why most modern relational databases use it.

Article: https://lnkd.in/gVArBVgf #BackendEngineering #Database #PostgreSQL #MVCC
How MVCC Works in PostgreSQL Databases
More Relevant Posts
-
Moving to a new database sounds easy, right? I thought the same - until I dove deep into the challenges and hidden pitfalls. It's way harder than just copying files. You need to extract millions of rows without slowing down production. You need to make sure nothing gets lost or duplicated. And you need the whole thing to work seamlessly with a completely different database engine. Researching the problem space and exploring YugabyteDB's solution has been both insightful and inspiring, showcasing a powerful approach to solving complex distributed database challenges. Here's how Yugabyte 𝗩𝗼𝘆𝗮𝗴𝗲𝗿 solves this - and the clever part is that instead of reinventing the wheel, it leverages PostgreSQL's existing tools. The architecture is simple: Voyager runs on its own machine (not on your database servers), connects over the network, and orchestrates the whole migration without risking your production system. This "𝘂𝘀𝗲 𝘁𝗵𝗲 𝗿𝗶𝗴𝗵𝘁 𝘁𝗼𝗼𝗹 𝗳𝗼𝗿 𝗲𝗮𝗰𝗵 𝗷𝗼𝗯" philosophy is worth studying, whether you're moving databases or building distributed systems. The result? A migration engine that's both 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝗮𝗻𝗱 𝗽𝗿𝗮𝗴𝗺𝗮𝘁𝗶𝗰. For a deeper understanding and more insights, check out my full article: https://lnkd.in/gEpbVA6B
-
A PostgreSQL table with 500 million rows doesn't just slow down queries. It slows down everything: VACUUM takes hours, index builds lock the table, and deleting old data generates massive WAL and leaves bloat behind.

Table partitioning splits a large table into smaller physical pieces while keeping it as a single logical table. The query planner uses partition pruning to scan only the relevant partitions. Here's what matters in practice:

1. The partition key must appear in your most common WHERE clauses. If 90% of your queries filter on event_timestamp, partition by timestamp. A partition key that queries don't filter on provides zero benefit and only adds planning overhead. Always verify with EXPLAIN before committing to a scheme -- if pruning doesn't activate, the partitioning isn't helping.

2. Partition count matters more than people think. Monthly partitions for 5 years = 60 partitions (reasonable). Daily partitions for 5 years = 1,825 partitions (the planner slows down noticeably). The planner evaluates each partition during query planning. Keep it manageable, or use TimescaleDB, which is specifically optimized for high partition counts.

3. The biggest operational win is instant data removal. DROP TABLE on a partition takes milliseconds and generates almost no WAL. Compare that to DELETE FROM events WHERE event_timestamp < '2024-02-01' on 100 million rows -- that takes an hour, generates massive WAL, and leaves dead tuples for VACUUM to clean up.

The biggest gotcha: you can't convert an existing table to a partitioned table in place. You need to create a new partitioned table, migrate data in batches, and swap with a rename in a single transaction. For zero-downtime migrations, add a trigger or logical replication to capture writes during the migration.

Automate partition lifecycle with pg_partman or cron. A missing future partition causes INSERT failures. A forgotten old partition wastes storage.
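The range-by-timestamp scheme described above can be sketched with Postgres declarative partitioning (table and column names are illustrative):

```sql
-- Parent: one logical table, partitioned on the column that
-- dominates WHERE clauses.
CREATE TABLE events (
    id              bigint      NOT NULL,
    event_timestamp timestamptz NOT NULL,
    payload         jsonb
) PARTITION BY RANGE (event_timestamp);

-- One physical piece per month.
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Verify pruning actually fires: the plan should touch only
-- events_2024_02, not every partition.
EXPLAIN SELECT count(*) FROM events
WHERE event_timestamp >= '2024-02-01';

-- Instant data removal: milliseconds, no dead tuples left behind.
DROP TABLE events_2024_01;
```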
Full guide with range, list, hash strategies, migration patterns, and pg_partman setup: https://lnkd.in/eGUj8zXC #PostgreSQL #TablePartitioning #DatabasePerformance #DataEngineering #DevOps #SRE
-
I recently wrote about how PostgreSQL actually works under the hood — covering MVCC, WAL, buffer cache, and checkpoints. If you've ever wondered how databases handle concurrency without locking everything, this might help. Would love feedback from folks who've worked on database systems 👇 https://lnkd.in/gRfS3cJZ #postgresql #systemdesign #databases #backend #engineering
-
PostgreSQL doesn't update rows. It creates new ones.

When we run this:

UPDATE users SET balance = balance + 100 WHERE id = 42;

We probably imagine (at least I did initially) that the database finds row 42 and updates it in place. That's not what happens. Postgres uses MVCC (Multiversion Concurrency Control). Instead of modifying the existing row, it writes a brand new version and marks the old one as dead. This is what allows concurrent reads and writes without locking rows. The table now looks something like:

id | balance
------------
42 | 100 -> dead
42 | 200 -> dead
42 | 300 -> live

These dead rows don't disappear on their own. They sit on disk, taking up space, and more importantly, every sequential scan has to step over them. At 1 update/sec, that's 86,400 dead tuples every day from a single row. At scale, you end up with something like 10M live rows buried under 40M dead ones. Every query pays the cost.

Enter VACUUM (Postgres's garbage collector). It scans tables, marks dead tuples as reclaimable space, and keeps bloat from getting out of control. autovacuum does this automatically; it gets triggered when enough rows have changed, based on the formula:

rows_modified > threshold + scale_factor × total_rows

Default values: threshold = 50, scale_factor = 0.2. So for a 1M-row table, autovacuum kicks in after ~200,050 changes, which is reasonable.

autovacuum also runs ANALYZE (with its own, lower defaults: threshold = 50, scale_factor = 0.1), which updates the statistics the query planner uses to pick execution plans: column distributions, row counts, value frequencies. Without fresh stats, the planner guesses. Bad guesses mean bad plans. Bad plans mean bad query performance.

The trigger formula has a subtle implication: on a brand new table with 0 rows, the scale-factor term vanishes, so autovacuum fires after just ~50 changes - almost immediately. This became part of a pretty interesting incident investigation. More on that soon.
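The trigger condition can be checked against a live database. The settings and the pg_stat_user_tables view below are standard Postgres; the 'users' table name is illustrative:

```sql
-- Current defaults (Postgres ships threshold = 50,
-- vacuum scale_factor = 0.2, analyze scale_factor = 0.1).
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_vacuum_scale_factor;

-- Dead-tuple counts and last autovacuum time per table.
-- autovacuum fires roughly when
--   n_dead_tup > threshold + scale_factor * n_live_tup
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';
```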
-
🚀 How Data is Actually Stored in PostgreSQL

Let's break it down 👇

🔍 1. Data is Stored in Pages (Blocks)
PostgreSQL doesn't store rows randomly.
👉 Everything is stored in fixed-size pages (8KB by default)
• Each table = collection of pages
• Pages are the smallest unit of I/O

📦 2. Inside a Page (What's Really There)
Each page contains:
• Page Header → metadata
• Item Pointers (Line Pointers) → offsets to rows
• Actual Rows (Tuples)
👉 Rows are not stored sequentially

🧠 3. What is a Tuple?
In PostgreSQL, a row = tuple. Each tuple contains:
• Actual column data
• Transaction IDs (xmin, xmax)
• Visibility info (for MVCC)
👉 This is how PostgreSQL handles concurrency

🔄 4. MVCC (Why Multiple Versions Exist)
Instead of updating rows in place:
• PostgreSQL creates a new version of the row
• Old version remains until cleaned
👉 Enables no read/write blocking and high concurrency

🧹 5. Dead Tuples & VACUUM
When rows are updated/deleted:
• Old versions become dead tuples
👉 VACUUM cleans them, frees space, and prevents table bloat

📂 6. Table Storage (Heap Structure)
👉 PostgreSQL uses heap storage
• No guaranteed order of rows
• New rows go into available space
💡 That's why indexing is critical for fast lookup

🌳 7. Index Storage (Separate Structure)
Indexes are stored separately:
• Usually B-Trees
• Store: value → pointer to tuple
👉 Query uses index → then fetches actual row

Have you ever debugged a slow query using this knowledge?

#PostgreSQL #Databases #SystemDesign #BackendEngineering #Performance #Scalability
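The page/tuple layout above can be inspected directly: the ctid system column exposes each tuple's (page number, line pointer) address, and the pageinspect extension (bundled with Postgres) dumps raw page contents. A sketch with an illustrative table:

```sql
CREATE TABLE demo (id int, val text);
INSERT INTO demo SELECT g, 'row ' || g FROM generate_series(1, 3) g;

-- ctid = (page, line pointer): where each tuple lives on disk.
SELECT ctid, xmin, xmax, id FROM demo;

-- Raw line pointers and tuple headers of page 0
-- (requires superuser to install the extension).
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp, lp_off, lp_len, t_xmin, t_xmax
FROM heap_page_items(get_raw_page('demo', 0));
```

After an UPDATE, re-running the first SELECT shows the row at a new ctid while VACUUM has yet to reclaim the old slot.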
-
Choosing a database involves more than just the SQL vs NoSQL debate. It's essential to consider: - How your system fails - How much data you can afford to lose - How long you can remain down - Who will manage the recovery Many recommendations overlook these critical factors. I have created a practical breakdown based on real constraints rather than theoretical discussions. Read more here: https://lnkd.in/gHVKPijc
-
🚀 Optimizing PostgreSQL for Large-Scale Systems: Indexing Strategies That Actually Matter

If your PostgreSQL database is handling millions (or billions) of rows, indexing decisions are no longer a "nice-to-have" - they directly impact query latency and user experience. Here's a quick breakdown of what I've learned about indexing at scale:

🔹 B-Tree vs GIN vs BRIN
▶️ B-Tree: Great for equality and range queries on well-distributed data. The default choice for most workloads.
▶️ GIN: Optimized for full-text search or array/JSONB containment queries. Perfect when you need fast lookups on complex structures.
▶️ BRIN: Lightweight, space-efficient, and ideal for huge tables with naturally ordered data (e.g., timestamps).

🔹 Composite Indexes & Query Patterns
Building a composite index isn't just "add more columns." The column order should reflect your query's filtering and sorting patterns. Misaligned indexes can easily be ignored by the planner.

🔹 Indexing JSONB Fields
JSONB is flexible but can be slow if queried naively. Use GIN indexes for @> containment queries. Use expression indexes if you frequently filter on a nested property.

🔹 Query Planner Insights with EXPLAIN ANALYZE
Always validate your assumptions. EXPLAIN ANALYZE doesn't lie. It shows exactly how the planner executes a query and which indexes it chooses. A slow query often tells a story about a missing or misused index.

💡 The takeaway: At scale, indexing decisions aren't just a tweak - they can mean the difference between sub-second responses and multi-second waits. Understand your data, your queries, and let the planner guide your indexing strategy.

Have you ever seen a 100ms query drop to 10ms just by rethinking indexes? Postgres magic. ✨

#PostgreSQL #DatabaseOptimization #Indexing #PerformanceTuning #DataEngineering
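The index types above, sketched as DDL (table and column names are assumptions for illustration):

```sql
-- B-Tree (the default): equality and range predicates.
CREATE INDEX idx_orders_created ON orders (created_at);

-- GIN: containment on JSONB, e.g. WHERE attrs @> '{"tier": "gold"}'.
CREATE INDEX idx_orders_attrs ON orders USING gin (attrs);

-- Expression index: one frequently-filtered nested property,
-- matched by WHERE attrs ->> 'tier' = 'gold'.
CREATE INDEX idx_orders_tier ON orders ((attrs ->> 'tier'));

-- BRIN: huge, naturally ordered data such as append-only timestamps.
CREATE INDEX idx_events_ts ON events USING brin (event_timestamp);

-- Composite: column order mirrors filter-then-sort patterns,
-- e.g. WHERE customer_id = ? ORDER BY created_at DESC.
CREATE INDEX idx_orders_cust_ts ON orders (customer_id, created_at DESC);

-- Always confirm the planner actually picks the index:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE attrs @> '{"tier": "gold"}';
```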
-
Every standard B-tree index lookup in PostgreSQL is a two-step process: scan the index to find row pointers, then fetch the actual row data from the heap table. That second step -- the heap fetch -- is the bottleneck. For queries returning many rows, each heap fetch is a random I/O operation scattered across the table.

A covering index eliminates the heap entirely. If all the columns your query needs exist in the index, PostgreSQL reads everything from the compact, ordered index structure. No random heap access. No wasted I/O.

Three things most teams miss about covering indexes:

**1. INCLUDE is different from a composite index.** Before PostgreSQL 11, you had to create a composite index on all columns: `(customer_id, customer_name, customer_email)`. This sorts on all three columns even though nobody searches by customer_name. The `INCLUDE` clause adds columns to leaf pages without including them in the sort key: `CREATE INDEX ON customers (customer_id) INCLUDE (customer_name, customer_email)`. Smaller index, cleaner semantics.

**2. Index-only scans depend on vacuum.** PostgreSQL can skip the heap only for pages marked "all-visible" in the visibility map. If vacuum falls behind, pages are not marked all-visible, and the planner falls back to regular index scans with heap fetches -- even with a covering index. A covering index without healthy vacuum is a wasted investment.

**3. The performance gap widens dramatically with scale.** On a 100-million-row table, eliminating heap fetches can reduce query time from hundreds of milliseconds to single-digit milliseconds -- a 10-100x improvement. The larger the table and the more rows your query returns, the bigger the win.

Look for `Index Scan` with high `Heap Fetches` in your EXPLAIN output. If you see `Heap Fetches: 1000` where an `Index Only Scan` with `Heap Fetches: 0` is possible, there is a covering index opportunity waiting.
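Putting the three points together in one sketch (the schema is illustrative):

```sql
CREATE TABLE customers (
    customer_id    bigint PRIMARY KEY,
    customer_name  text,
    customer_email text,
    notes          text
);

-- INCLUDE keeps name/email in the leaf pages without
-- widening the sort key beyond customer_id.
CREATE INDEX idx_customers_covering
    ON customers (customer_id)
    INCLUDE (customer_name, customer_email);

-- Keep the visibility map fresh so index-only scans stay index-only.
VACUUM (ANALYZE) customers;

-- Look for "Index Only Scan ... Heap Fetches: 0" in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_name, customer_email
FROM customers
WHERE customer_id BETWEEN 1000 AND 2000;
```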
Practical guide with INCLUDE vs composite examples, dashboard query patterns, and vacuum considerations: https://lnkd.in/eFhhvyU6 #PostgreSQL #DatabasePerformance #Indexing #CoveringIndex #SoftwareEngineering #DevOps
-
Postgres now has the ability to export statistics about a database. Here's a great post about how to use this ability to debug and optimize production database queries in test databases: https://lnkd.in/eMptU8J7
-
Boost PostgreSQL performance with 12 expert tips for large datasets, from partitioning to partial indexes and avoiding bloat. Stay fast and scale. https://lnkd.in/dvibkzSy #PostgreSQL #SoftwareDevelopment #Performance