How MVCC Works in PostgreSQL Databases

How MVCC actually works in databases like PostgreSQL (Multi-Version Concurrency Control) When multiple users read and update the same data at the same time, databases need to maintain consistency without slowing everything down. Earlier systems used locking: Write → others wait Read → may block writes This works, but doesn’t scale well under load. What’s actually happening under MVCC Every transaction is assigned a transaction ID (TXID). When a query starts, it sees a snapshot based on visible TXIDs. Each row internally stores metadata like: created by TXID deleted/updated by TXID So visibility is not “current value” - it’s: “is this version valid for my transaction?” How reads work The database scans rows. For each row, it checks: → Was this version committed before my transaction started? If yes → visible If not → ignored So reads are basically filtering versions, not locking data. How writes work- An update does NOT modify the row in place. It creates a new row version with a new TXID. The old version is marked as expired (but not deleted yet). This is why: readers continue unaffected writers don’t wait for readers Isolation levels matter here MVCC behavior changes slightly depending on isolation level: Read Committed → sees latest committed version per query Repeatable Read → sees same snapshot for entire transaction Serializable → adds extra checks to avoid anomalies So “what we see” depends on isolation, not just MVCC itself. The hidden cost Because updates create new versions: Table bloat can happen if cleanup is slow Indexes also grow with multiple versions Long-running transactions delay cleanup Cleanup (e.g., VACUUM in Postgres) is critical - without it, performance degrades over time. Pros: No read/write blocking, Predictable reads, High concurrency Cons: More storage usage, Vacuum tuning required, Complex internals, Stale reads depending on isolation Reality: MVCC is a trade-off: we spend more storage + background work to get less waiting + better throughput That trade-off is why most modern relational databases use it. Article: https://lnkd.in/gVArBVgf #BackendEngineering #Database #PostgreSQL #MVCC

  • text, whiteboard

To view or add a comment, sign in

Explore content categories