PostgreSQL Database Bloat: Understanding MVCC and Auto-Vacuum

In PostgreSQL, running a massive DELETE command does not immediately free up disk space. In fact, an UPDATE is just an INSERT in disguise. Understanding MVCC is the key to solving database bloat. Here is how it works.

Postgres uses Multi-Version Concurrency Control (MVCC). The goal is to allow readers to access data without being blocked by writers modifying the same rows. To achieve this, Postgres never modifies data in place. When you run an UPDATE, the engine actually performs two distinct actions:

1. It performs an INSERT of the new version of the row.
2. It marks the old version as a "dead tuple" (essentially a soft delete).

The old data is still sitting physically on the disk; it is just invisible to new transactions.

This leads to a specific problem. In a write-heavy application, dead tuples pile up fast. This is table bloat: your queries get slower because the database engine has to scan past the dead tuples to reach the live data.

This is why the autovacuum process is essential. It is the janitor. It runs in the background to clean up these dead tuples and mark that storage space as reusable for future inserts.

If your storage usage seems high after a cleanup, don't panic. Check whether your vacuum process is keeping up.

#Database #SoftwareEngineering #DataEngineering #PostgreSQL #SQL #DatabasePerformance #MVCC #SystemDesign
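You can watch this happen with the statistics collector. A minimal sketch using the standard pg_stat_user_tables view (the LIMIT is arbitrary):

-- Tables with the most dead tuples, plus the last time autovacuum visited them
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;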
More Relevant Posts
You just ran pg_upgrade from PostgreSQL 15 to 16. Everything looks clean: the databases are up, applications reconnect, queries execute. Then someone runs a spatial query and gets: "could not access file: No such file or directory."

pg_upgrade copies catalog metadata faithfully. Too faithfully. It migrates the extension entries that say "PostGIS 3.4.0 is installed" without checking whether PostGIS 3.4.0 actually exists in the new cluster's library path. The old .so files were compiled against PG 15; they are gone or incompatible. Every C-implemented PostGIS function (ST_Contains, ST_Intersects, ST_Buffer, ST_DWithin) breaks on the first call.

The treacherous part: not everything breaks. Pure SQL wrapper functions and metadata functions like PostGIS_Full_Version() still work. A quick smoke test passes. The application loads. Only when a user triggers a real spatial query does the error surface.

Even when you install the correct PostGIS packages for PG 16, there is a second problem. The catalog still says "3.4.0 installed" while the library on disk is 3.5.0. PostGIS functions work, but you are running with outdated SQL definitions, missing new functions, and potentially incompatible behavior between the library and catalog versions.

The fix after every pg_upgrade:
1. Install PostGIS packages matching the new PG major version BEFORE upgrading
2. Run ALTER EXTENSION postgis UPDATE in EVERY database
3. Verify with PostGIS_Full_Version(): SQL and library versions must match
4. Test a C function: SELECT ST_AsText(ST_MakePoint(0, 0))

The full detection queries, fix procedures, and a scripted post-upgrade checklist: https://lnkd.in/e8Zzrt4p

#PostgreSQL #PostGIS #DatabaseUpgrade #DBA
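A sketch of that verification step, using the standard extension catalog and PostGIS's own version function (run it in each database; output formats vary by version):

-- The catalog's idea of the installed version
SELECT extname, extversion FROM pg_extension WHERE extname LIKE 'postgis%';

-- The library's idea: the two must agree
SELECT postgis_full_version();

-- If the library is newer than the catalog entry, refresh the SQL definitions
ALTER EXTENSION postgis UPDATE;

-- Exercise a C-implemented function, not just a metadata call
SELECT ST_AsText(ST_MakePoint(0, 0));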
-
PostgreSQL micro blog: Don't ignore table bloat; it silently kills performance.

In PostgreSQL, bloat happens when dead row versions from UPDATE/DELETE operations accumulate on disk. Over time this increases table size, bloats indexes, and slows queries and backups.

A few quick signals to watch for:
• Rising disk usage without corresponding data growth
• Slow scans on tables that used to be fast
• Autovacuum not keeping up with churn

How to tackle it (see the sketch below):
• Run regular VACUUM and tune autovacuum thresholds per table
• Use tools like pgstattuple to measure actual dead tuples
• Reclaim space with heavier tools (pg_repack, VACUUM FULL, or pg_squeeze) when needed
• Consider partitioning large, high-churn tables as part of a long-term strategy

Bloat isn't a bug; it's a natural side effect of MVCC. But proactive maintenance keeps Postgres fast and storage efficient.

#PostgreSQL #DatabasePerformance #DBA #PerformanceTuning #DatabaseSpa
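A minimal sketch of the measurement and tuning steps (the table name and thresholds are illustrative, not recommendations):

-- pgstattuple ships with contrib; it reports exact dead-tuple counts and free space
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('orders');

-- Make autovacuum trigger earlier on a high-churn table
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold    = 1000
);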
-
PostgreSQL DB - List all Databases

To list all existing databases in PostgreSQL:

1. Query the system catalog:
SELECT datname FROM pg_database;

2. Use the psql program's \l meta-command:
* \l or \list lists all databases, their owners, and character encodings in the psql interface.
* \l+ or \list+ provides more detailed information, including database size and default tablespace.
* The -l command-line option is also useful for listing the existing databases (e.g. $ psql -l).

3. \c or \conninfo displays the currently connected database.

Watch Video Format: https://lnkd.in/eYSdHTkS
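A slightly richer variant of the catalog query, combining owner and size in one result (a sketch; pg_database_size assumes you have sufficient privileges on each database):

-- Databases with owner and on-disk size, excluding templates
SELECT d.datname,
       pg_get_userbyid(d.datdba) AS owner,
       pg_size_pretty(pg_database_size(d.datname)) AS size
FROM pg_database d
WHERE NOT d.datistemplate
ORDER BY d.datname;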
-
You don't need to guess which index to add. PostgreSQL is already tracking everything you need. pg_stat_user_indexes tells you exactly how your indexes are being used.

Here are the columns that matter:

idx_scan: how many times this index has been used for a scan. Zero means it has never been used since the last stats reset; you're paying write overhead and disk space for nothing.

idx_tup_read: how many index entries have been returned by scans. High reads with low scans means each scan reads a large share of the index (possibly a sign of poor selectivity).

idx_tup_fetch: how many live table rows were fetched using this index. Compare with idx_tup_read; a big gap means many dead or invisible tuples.

Quick query to find unused indexes wasting space:

SELECT indexrelname AS index_name,
       relname AS table_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

I've seen databases carrying hundreds of GB of indexes with zero scans. Run this query; you'll probably find at least one.

#PostgreSQL #Database #SQL #DevOps #IndexOptimization
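Two caveats worth checking before dropping anything, sketched below: the counters are cumulative since the last stats reset, and an "unused" index may be backing a constraint (the index name here is hypothetical):

-- When were the statistics last reset?
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();

-- Is the index enforcing a UNIQUE or PRIMARY KEY constraint?
SELECT indexrelid::regclass AS index_name, indisunique, indisprimary
FROM pg_index
WHERE indexrelid = 'my_unused_index'::regclass;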
-
🚀 One of the most underrated SQL features: LEFT JOIN LATERAL

Many developers know JOIN. Far fewer know LATERAL. But when working with high-performance queries, LATERAL can be a game changer, especially in databases like PostgreSQL.

Here's the idea: a normal JOIN combines tables using a fixed condition. LATERAL allows a subquery to use values from the current row of the left table. In other words, the database can run a query per row.

Example: get the latest order per user. Instead of joining all orders and sorting a massive dataset:

SELECT u.id, o.id
FROM users u
LEFT JOIN LATERAL (
    SELECT id
    FROM orders
    WHERE orders.user_id = u.id
    ORDER BY created_at DESC
    LIMIT 1
) o ON true;

With the right index (see the sketch below), the database performs an efficient lookup per user instead of scanning millions of rows.

This pattern is extremely useful for:
• Eliminating N+1 queries
• Fetching top-N results per group
• Building efficient API responses
• Aggregating JSON per parent record

The key lesson: scaling systems isn't only about infrastructure. Sometimes the biggest performance gains come from using the database more intelligently. Great engineers don't just write queries; they understand how the database executes them.

#SoftwareEngineering #PostgreSQL #BackendDevelopment #DatabaseOptimization #SystemDesign
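The "right index" would look something like this, a sketch assuming the users/orders schema from the example:

-- Lets each LATERAL lookup resolve to a single index descent per user
CREATE INDEX IF NOT EXISTS orders_user_id_created_at_idx
    ON orders (user_id, created_at DESC);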
-
Constraints are not just validation rules. They directly affect indexing, storage behavior, locking, and performance. Here's what the image highlights 👇

🟥 PRIMARY KEY
SQL Server: creates a unique index, clustered by default; the leaf level stores the actual table data, so rows are physically ordered.
PostgreSQL: creates a unique B-tree index, but the table remains a heap; index and table are separate structures, with no physical reordering of rows.

🟥 UNIQUE constraint
SQL Server: allows only one NULL, enforced via a unique index.
PostgreSQL: allows multiple NULLs by default, because NULL ≠ NULL (unknown comparison).
Understanding this difference is critical in schema design.

🟥 FOREIGN KEY enforcement
Neither engine automatically creates an index on the foreign key column. During DELETE or UPDATE on the parent, the engine must check the child table; without an index, that means a full table scan, which can cause locking and performance issues.
Best practice: ✔ always index foreign key columns (see the sketch below).

🟥 Deferrable constraints
SQL Server validates constraints immediately during INSERT/UPDATE. PostgreSQL additionally supports DEFERRABLE INITIALLY DEFERRED on UNIQUE, PRIMARY KEY, EXCLUDE, and FOREIGN KEY constraints (CHECK constraints are always validated immediately), so their validation can happen at COMMIT time. This gives PostgreSQL more flexibility for complex transactions.

🎯 Key takeaways
✔ Primary key creates an index internally
✔ Clustered vs heap storage differs
✔ UNIQUE NULL behavior is different
✔ Foreign keys must be indexed manually
✔ Constraints directly impact locking and performance

Constraints are not just about data integrity. They influence how the optimizer works and how your system scales.

#SQLServer #PostgreSQL #DatabaseInternals #Constraints #BackendEngineering #PerformanceTuning #DatabaseDesign
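A minimal sketch of the foreign-key best practice in PostgreSQL syntax (hypothetical parent/child schema for illustration):

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (
    id        int PRIMARY KEY,
    parent_id int NOT NULL REFERENCES parent (id)
);
-- Neither engine creates this automatically; without it, every DELETE on
-- parent has to scan the whole child table to verify the constraint
CREATE INDEX child_parent_id_idx ON child (parent_id);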
-
You think your data is relational because you already visualize it in a normalized form. But most data starts as a document — a form, a message, an event. 🌱 If you store the document as-is, consistency is natural. You don't need foreign keys, and you rarely need multi-statement transactions. 🧩 If you normalize it, you must recreate that consistency with foreign keys, joins, and transactions. 💡 The real question for your OLTP database isn't #NoSQL vs #SQL, or #PostgreSQL vs MongoDB. It's whether you store the application's data aligned with its domain model — or normalize it to make it application-agnostic.
-
🚀 PostgreSQL Extensions Every DBA Should Know (But Many Don't Use)

PostgreSQL is powerful out of the box. But the real magic? ✅ Extensions.

They can turn PostgreSQL into a:
• Monitoring system
• Time-series database
• Query analysis tool
• Scalable data engine

And most teams barely use them properly.

🔧 Here are PostgreSQL extensions every DBA should know:
1️⃣ pg_stat_statements: tracks real query performance across your database.
2️⃣ PostGIS: adds geospatial capabilities for location-based queries.
3️⃣ TimescaleDB: optimized for time-series workloads like logs and metrics.
4️⃣ pg_partman: automates table partitioning for large datasets.
5️⃣ pg_trgm: improves fuzzy search and text similarity queries.

🧠 Real-world scenario (what most DBAs actually face):
An application table grows steadily over time (10M → 20M → 50M rows). Read queries start slowing down, autovacuum struggles to keep up, index bloat increases, and CPU and I/O usage spike during peak hours. No sudden failure, just gradual performance degradation.

What typically works in this situation:
✔️ Use pg_stat_statements to identify high-frequency and slow queries (see the sketch below)
✔️ Introduce partitioning with pg_partman to split large tables into manageable chunks
✔️ Rebuild or optimize indexes based on actual query patterns
✔️ (Optional) Move time-based data to TimescaleDB if the workload is heavily time-series

💡 Outcome (what DBAs usually observe): more predictable query performance, reduced index and table scan overhead, better autovacuum efficiency, and improved stability under load. Not magic, just good engineering.

⚡ Takeaway: most PostgreSQL performance issues aren't because PostgreSQL is weak. They happen because ⚠️ we don't use the tools PostgreSQL already gives us. Extensions are part of the system, not an add-on.

If you're working with PostgreSQL: which extension has saved you the most in production? Comment below 👇🏻

#PostgreSQL #DBA #DatabasePerformance #DataEngineering #Backend #OpenSource #Tech
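A sketch of the pg_stat_statements step described above (column names are for PostgreSQL 13+, where total_time was split into total_exec_time):

-- Requires shared_preload_libraries = 'pg_stat_statements' and a restart
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by cumulative execution time
SELECT calls, total_exec_time, mean_exec_time, query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;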
-
SQL – Day 4

Yesterday I was reminding myself what people mean when they say SQL is a standardized language. Essentially, it means that despite the existence of different relational database management systems, there is a common set of rules and syntax that governs how SQL is written and used.

Over the past few days, I've been revisiting SQL concepts using PostgreSQL, brushing up on fundamentals I learned last year. Yesterday I decided to try the same queries in MySQL. The experience was interesting: the flow was almost the same. Even though there are small differences between the systems, they were minimal for the concepts I was practicing.

📌 Takeaway: once you understand the core logic of SQL, moving between database systems becomes much easier. The fundamentals remain largely the same; it's mostly the environment and a few syntax details that change.

Still sharpening the fundamentals. One query at a time. 🚀

#SQL #PostgreSQL #MySQL #DataScienceJourney #LearningInPublic