Postgres DELETE doesn't remove from disk, VACUUM does

Deleting a row in Postgres doesn't remove it from disk. It marks it invisible and leaves the rest to VACUUM. Postgres just sets a single field on the tuple header and moves on. Here's what DELETE actually does: 1. Postgres locates the target tuple on its heap page. 2. It then sets xmax on that tuple to the current transaction ID. 3. That's it, the tuple stays on the page, unchanged, in place. 4. Future transactions check xmax during visibility evaluation. If it's set to a committed transaction, the row is invisible to them. 5. The space the tuple occupies is not freed, it remains on the page. 6. VACUUM is the only process that can reclaim that space and only after no active transaction can still see the deleted version. A table with heavy deletes doesn't shrink. It accumulates dead tuples which are invisible rows that still get read during sequential scans. A 10 million row table where 8 million rows were deleted is not a 2 million row table in terms of storage or scan cost. A DELETE tells Postgres a row is gone. VACUUM is what makes that true on disk. Most of us treat DELETE as an immediate removal, how long did it take you to unlearn that assumption in Postgres? #Postgres #PostgreSQL #BackendEngineering

  • diagram

Postgres can also reclaim dead tuples on a page through HOT (Heap-Only Tuples) pruning — so even a SELECT can update pages without waiting for vacuum.

To view or add a comment, sign in

Explore content categories