The database ran fine until the checkpoint hit.

A team reached out because their PostgreSQL queries would slow down at predictable intervals. Not random spikes. A pattern. Fast, then slow, then fast again. Like a heartbeat they could not explain.

The culprit was something most teams never touch: checkpoints.

PostgreSQL uses checkpoints to ensure data consistency. After a checkpoint, the first change to each data page forces PostgreSQL to write the entire page to the write-ahead log, not just the change. These are called full-page image (FPI) writes, and they create massive I/O spikes immediately after every checkpoint cycle.

Under a steady workload, you get a saw-tooth performance pattern. Queries are fast coming out of a checkpoint, then progressively degrade as the next one builds, then spike again when it fires.

Here is what makes this tricky. Default checkpoint settings are designed to be safe and generic, not for production workloads. Most teams deploy PostgreSQL, confirm it works, and never revisit those settings.

The fix is not complicated. Tuning checkpoint timing and spacing distributes the I/O load evenly, eliminates the saw-tooth pattern, and significantly reduces WAL overhead. The performance gains are immediate and measurable.

Think of it like a water heater that cycles on and off. Every time it kicks on, it draws a surge of energy. A steady, modulated system uses less energy and delivers consistent output.

Here is what our customers tell us: the performance problems they thought were hardware limitations were actually configuration defaults nobody questioned.

Have you ever traced a recurring performance issue back to a setting you assumed was already optimized?

#PostgreSQL #DatabasePerformance #QueryOptimization #DatabaseTuning #FortifiedData
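The post stops short of naming the knobs, so here is a minimal sketch of the kind of checkpoint tuning it describes. The values are illustrative starting points only, not recommendations for any specific workload:

```sql
-- Illustrative values only; the right settings depend on write volume, RAM, and disk.
ALTER SYSTEM SET checkpoint_timeout = '15min';        -- checkpoint less often (default is 5min)
ALTER SYSTEM SET max_wal_size = '4GB';                 -- avoid extra checkpoints forced by WAL growth
ALTER SYSTEM SET checkpoint_completion_target = 0.9;   -- spread checkpoint writes across the interval
SELECT pg_reload_conf();                               -- these three take effect without a restart
```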
PostgreSQL Checkpoints Cause Performance Spikes
More Relevant Posts
-
I was recently revisiting how we store 𝐭𝐢𝐦𝐞 in PostgreSQL, and one small decision made a bigger difference than expected: TIMESTAMP vs TIMESTAMPTZ.

At first glance, they look similar. But in reality, they behave very differently.

𝐊𝐞𝐲 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞:
- TIMESTAMP → stores date and time 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐭𝐢𝐦𝐞𝐳𝐨𝐧𝐞
- TIMESTAMPTZ → stores time in 𝐔𝐓𝐂 and 𝐜𝐨𝐧𝐯𝐞𝐫𝐭𝐬 𝐛𝐚𝐬𝐞𝐝 𝐨𝐧 𝐭𝐢𝐦𝐞𝐳𝐨𝐧𝐞

The issue starts when your system is not local anymore. As soon as you have:
- Users in different time zones
- Distributed systems
- Services running across regions

TIMESTAMP becomes risky. You lose context of “when” something actually happened.

That is why I now default to:
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐚𝐥 𝐥𝐞𝐬𝐬𝐨𝐧𝐬:
- Always store time in UTC at the database level. Conversions should happen at the application/UI layer.
- Timezone bugs are subtle and expensive. They usually appear late, especially in reporting and analytics.
- Consistency matters more than convenience. Mixing TIMESTAMP and TIMESTAMPTZ leads to confusion over time.

This is a small decision, but in distributed systems, small decisions compound.

Curious how others handle time in their systems. Do you standardise on UTC everywhere, or handle it differently?
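As a minimal sketch of that default in context (the events table and the target zone are just examples, not from the original post):

```sql
-- Hypothetical table: store UTC in the database, convert only when presenting.
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- Conversion happens at read time (or, better, in the application/UI layer):
SELECT name, created_at AT TIME ZONE 'Asia/Kolkata' AS created_local
FROM events;
```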
-
Ever used 𝙋𝙤𝙨𝙩𝙜𝙧𝙚𝙎𝙌𝙇 and just assumed it's... one thing? Install it, connect, write queries — done. That's what I thought too!!

This week, while working on the database migration service at Yugabyte, I pulled on a thread that unraveled something I'd taken for granted for years. PostgreSQL isn't a single program — it's a whole client-server system hiding in plain sight.

• There's a server process that holds your data.
• There are standalone client tools (psql, pg_dump, pg_restore) that are entirely separate binaries — they don't even need the server installed on the same machine.
• And here's the part that really got me: PostgreSQL uses its own tables to keep track of all your tables. It's turtles all the way down.

Once you see this clearly, questions you never thought to ask suddenly have fascinating answers:
➡️ Why can pg_dump export a database running on a server across the internet?
➡️ Why do migration tools need client tools at specific versions but never need the PostgreSQL server?
➡️ pg_dump and psql speak the exact same protocol to the same server — so what actually makes them different?
and many more...

I wrote a detailed blog walking through all of this — https://lnkd.in/gca4jwDY

If you've only ever touched PostgreSQL through an ORM or a connection string, this might change how you think about what's actually running behind your queries.
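That "tables about tables" point is easy to verify yourself, since the catalog is ordinary tables you can query. A small sketch (pg_class is the real catalog; the schema filter is just an example):

```sql
-- Every table, index, view, and sequence the server knows about lives in pg_class.
SELECT relname, relkind      -- relkind: r = table, i = index, v = view, S = sequence
FROM pg_catalog.pg_class
WHERE relnamespace = 'public'::regnamespace
ORDER BY relname;
```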
-
5 performance settings that actually move the needle.

Most database tuning conversations get lost in dozens of knobs. The reality is that a small handful of settings drive the majority of real-world performance outcomes. Wayne Leutwyler at Percona made this point clearly in February, and what we see in production matches it.

1. Buffer pool size. If your working set does not fit in memory, no amount of query tuning will save you. For MySQL, that is innodb_buffer_pool_size. For PostgreSQL, shared_buffers. Get this wrong, and everything else is noise.

2. Redo log size. Too small, and the database checkpoints constantly under write pressure. Too large, and recovery times grow. Sized properly, write-heavy workloads get smoother and steadier.

3. Connection limits and pooling. Raising max_connections to 5,000 does not make the database run faster. It gives you a slower one. EDB benchmarks on enterprise hardware showed peak performance between 300 and 500 concurrent connections, with sharp degradation past 700. A pooler like PgBouncer can serve thousands of clients from 30 to 50 real backend connections.

4. Flush method. How the database writes to disk and whether it bypasses the OS cache changes I/O behavior more than most teams realize. This is one to test, not assume.

5. Thread and cache sizing. Small numbers that quietly tax every connection. Wrong values turn into latency spikes nobody can explain.

At Fortified Data, this is the work that turns a database from a slow tax line into a stable foundation. The wins are rarely glamorous. They are quietly responsible for keeping the business running.

What is one tuning change that delivered a bigger performance gain than your team expected?

#DatabasePerformance #MySQL #PostgreSQL #DBA #FortifiedData
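For the PostgreSQL side of points 1 and 3, a hedged sketch of what checking and adjusting those settings looks like; the numbers are placeholders, not recommendations:

```sql
-- Current values
SHOW shared_buffers;
SHOW max_connections;

-- Placeholder values: size shared_buffers to your RAM and working set, keep max_connections
-- modest and let a pooler such as PgBouncer fan clients in. Both require a server restart.
ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM SET max_connections = 300;
```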
-
What if thousands of UPDATEs and DELETEs are hitting your table every second? Your data looks correct. But your table size keeps growing. 📈

This is called table bloat. And it's a direct cost of MVCC.

Here's why it happens:
→ PostgreSQL never overwrites a row on UPDATE or DELETE
→ It always creates a new version of the row instead
→ The old version stays behind as a dead tuple
→ At thousands of writes per second, dead tuples pile up fast

The result?
→ Tables grow even when actual data hasn't changed
→ Queries slow down scanning through dead tuples
→ Indexes keep pointing to rows that no longer exist

So what's the solution? VACUUM. 🧹

What VACUUM does:
→ Scans the table for dead tuples
→ Removes them and marks the space as reusable
→ Updates the visibility map so queries stay fast
→ Prevents transaction ID wraparound — ignore this and PostgreSQL will shut itself down 🚨

One thing VACUUM does NOT do:
→ It does not shrink the file size on disk
→ For that you need VACUUM FULL — but it locks the table, so use it carefully

And the best part? You don't have to run it manually. PostgreSQL's autovacuum does this in the background automatically.

But autovacuum isn't magic. On high-write tables it can fall behind — tuning it for your workload is where the real DBA work begins.

MVCC gave PostgreSQL speed and clean isolation. VACUUM is what keeps that trade-off from breaking you. 💡

#PostgreSQL #Database #DBA #VACUUM #TableBloat #DataEngineering
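To see whether autovacuum is keeping up, the statistics views already have the answer. A small sketch (the orders table and the 0.02 scale factor are hypothetical examples):

```sql
-- Dead tuples and last autovacuum run per table
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Per-table tuning for a hot table: vacuum after ~2% dead rows instead of the default 20%
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```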
-
🚨 A subtle PostgreSQL timezone bug that can break your data

While working with PostgreSQL, I ran into a common but tricky issue with timezone handling — and it’s easy to miss.

🔍 Key insight
➡️ If your column is TIMESTAMP WITH TIME ZONE (timestamptz) → it’s already stored in UTC
➡️ If it’s TIMESTAMP WITHOUT TIME ZONE → you must explicitly treat it as UTC

⚠️ The common mistake
Applying the timezone conversion twice can shift your data incorrectly, leading to wrong results without obvious errors.

✅ Best practice
➡️ Use a single timezone conversion in most cases
➡️ Only apply a double conversion if your data is stored as TIMESTAMP WITHOUT TIME ZONE in UTC
➡️ Prefer range-based filtering over date casting for better accuracy and performance

🎯 Takeaway
If you’re unsure about your column type, it’s most likely timestamptz — so keep it simple and avoid over-conversion.

Small detail, big impact.

#PostgreSQL #SQL #BackendDevelopment #Azure #SoftwareEngineering #Debugging
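A sketch of the single vs. double conversion the post describes, assuming hypothetical orders / legacy_orders tables with a created_at column:

```sql
-- timestamptz column: one conversion is enough (result is local wall-clock time).
SELECT created_at AT TIME ZONE 'Europe/Berlin' FROM orders;

-- timestamp WITHOUT time zone stored as UTC: declare it UTC first, then convert.
SELECT (created_at AT TIME ZONE 'UTC') AT TIME ZONE 'Europe/Berlin' FROM legacy_orders;

-- Range-based filtering instead of casting to date keeps indexes usable:
SELECT * FROM orders
WHERE created_at >= timestamptz '2024-06-01 00:00+00'
  AND created_at <  timestamptz '2024-06-02 00:00+00';
```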
-
Postgres turned 30 last year, and it seems Postgres is having a moment. I don't think it's because Postgres got better. I believe the biggest tailwind is that everything else got more complicated.

At some point, teams started counting the real cost of specialized databases:
- sync jobs
- operational overhead
- seven things that can break

Then they realized Postgres was already doing most of it just fine. Plus, with the ecosystem built around it, it's now turning into the obvious answer again.

https://lnkd.in/gHbakyPh
-
A few days back, I ran into an interesting issue in PostgreSQL.

The query planner chose a less specific index, even though a better-suited index was clearly available.

Why? Because PostgreSQL estimated that fewer rows would match — so it assumed that plan would be faster. In reality, it turned out to be slower.

This is something you rarely notice in local or staging environments. But in production:
- Data distribution is different.
- Statistics can be misleading.
- And the query planner doesn’t always behave the way you expect.

Key takeaway: Having the right index is not enough. Understanding how the query planner thinks is what actually matters.

Production has a way of humbling assumptions.

#softwareengineering #database
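When this happens, the usual first steps are to compare the planner's estimate with reality and refresh the statistics. A sketch with a hypothetical orders table and columns:

```sql
-- Compare estimated vs. actual rows and see which index was chosen
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';

-- If estimates stay wrong for a skewed column, sample it more heavily and re-analyze
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;
```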
-
𝗢𝗻𝗲 𝗶𝗻𝗱𝗲𝘅 𝗰𝗵𝗮𝗻𝗴𝗲 𝗰𝗮𝗻 𝗰𝗼𝘀𝘁 𝟭.𝟱 𝗵𝗼𝘂𝗿𝘀 𝗼𝗳 𝗱𝗼𝘄𝗻𝘁𝗶𝗺𝗲

It wasn't due to a poor-performing database. It wasn't high traffic.
👉 𝗜𝘁 𝘄𝗮𝘀 𝗮𝗻 𝗲𝘅𝗶𝘀𝘁𝗶𝗻𝗴 𝗶𝗻𝗱𝗲𝘅 𝘁𝗵𝗮𝘁 𝘄𝗮𝘀 𝗻𝗼𝘁 𝗯𝗲𝗶𝗻𝗴 𝘂𝘀𝗲𝗱

Details:
• Soft deletes were added
• A partial index was added
• An old index was deleted

Everything seemed fine. But:
👉 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗾𝘂𝗲𝗿𝘆 𝗽𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀 𝗱𝗶𝗱 𝗻𝗼𝘁 𝗺𝗮𝘁𝗰𝗵 𝘁𝗵𝗲 𝗶𝗻𝗱𝗲𝘅

As a result:
• The existing indexes could no longer serve the queries
• Full table scans began
• Connections skyrocketed
• The system ground to a halt

👉 𝗦𝗮𝗺𝗲 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲. 𝗦𝗮𝗺𝗲 𝗱𝗮𝘁𝗮. 𝗦𝗮𝗺𝗲 𝗾𝘂𝗲𝗿𝗶𝗲𝘀. But performance suffered immensely.

This is one of the biggest underappreciated dangers in production:
👉 𝗗𝗶𝘀𝗰𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻 𝗯𝗲𝘁𝘄𝗲𝗲𝗻 𝗗𝗕 𝗰𝗵𝗮𝗻𝗴𝗲𝘀 𝗮𝗻𝗱 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗹𝗼𝗴𝗶𝗰

The terrifying thing about this is:
👉 𝗬𝗼𝘂 𝘄𝗼𝗻'𝘁 𝗿𝗲𝗮𝗹𝗶𝘇𝗲 𝗶𝘁 𝘂𝗻𝘁𝗶𝗹 𝗶𝘁'𝘀 𝘁𝗼𝗼 𝗹𝗮𝘁𝗲

At scale:
• seconds of delay turn into minutes
• minutes turn into downtime

This is something we encounter on a regular basis:
👉 𝗜𝗻𝗱𝗲𝘅𝗲𝘀 𝗲𝘅𝗶𝘀𝘁 𝗯𝘂𝘁 𝗱𝗼𝗻'𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺 𝘄𝗲𝗹𝗹
👉 𝗤𝘂𝗲𝗿𝗶𝗲𝘀 𝘄𝗼𝗿𝗸 𝗯𝘂𝘁 𝗳𝗮𝗶𝗹 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲

Monitoring alone won't catch this. Simply scaling your system will not solve it.
👉 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗼𝘄𝗻𝗲𝗿𝘀𝗵𝗶𝗽 𝗶𝘀 𝗰𝗿𝗶𝘁𝗶𝗰𝗮𝗹 𝗵𝗲𝗿𝗲

At AG Data, we ensure that:
👉 database changes, code changes and queries are synchronized
👉 before production rollout

#databases #postgresql #mysql #databaseperformance #systemdesign #highload #scalability #devops
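One way to catch this class of problem before a rollout is to check index usage and the real application predicate. A sketch with hypothetical table and column names:

```sql
-- Has each index on the table actually been used?
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan;

-- Does the application's real predicate still hit the new partial index?
EXPLAIN SELECT * FROM orders WHERE deleted_at IS NULL AND customer_id = 42;
```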
-
PostgreSQL wait events: the performance metric you're probably ignoring.

Most people look at query execution time and call it done. But execution time tells you a query is slow — wait events tell you what it's actually waiting on.

Every active session in `pg_stat_activity` has a `wait_event` and `wait_event_type` column. Here's what they mean:

𝗟𝗪𝗟𝗼𝗰𝗸 (𝗟𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁 𝗟𝗼𝗰𝗸) — internal PostgreSQL contention. Buffer mapping, WAL insert, lock manager. If you see a lot of LWLock waits, you're hitting internal bottlenecks — often fixable with configuration changes.

𝗟𝗼𝗰𝗸 — row-level or table-level locks. Transactions blocking each other. This is where you find your contention problems. One long-running UPDATE holding a lock while 50 other sessions pile up behind it.

𝗜𝗢 — disk operations. Buffer reads, WAL writes, data file extends. High IO waits usually mean your working set doesn't fit in shared_buffers, or your storage is too slow for the workload.

𝗖𝗹𝗶𝗲𝗻𝘁 — waiting for the application to send data or consume results. If you see sessions stuck on ClientRead, the bottleneck isn't PostgreSQL — it's your application or network.

Here's a quick query to see what your database is waiting on right now:

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY 1, 2
ORDER BY 3 DESC;

I've found production bottlenecks in minutes with this approach that would have taken hours of EXPLAIN analysis.

I built wait event analysis into [mydba.dev](https://mydba.dev) — it continuously samples active sessions and breaks down wait events by type, so you can see exactly where your database is spending its time.

What's the most surprising wait event you've found in production?
-
🔥 PostgreSQL Performance Optimization 🚀

Database performance isn’t achieved by throwing more hardware at the problem — it’s about making smarter tuning decisions. In real-world PostgreSQL environments, most performance bottlenecks stem from inefficient queries, poor indexing choices, or suboptimal configuration — not the database engine itself.

⚡ Core Areas to Focus On

1️⃣ Query Optimization
* Minimize full table scans whenever possible
* Use EXPLAIN ANALYZE to understand execution plans
* Retrieve only necessary columns (avoid SELECT *)

2️⃣ Indexing Strategy
* Leverage B-tree indexes for general use cases
* Use GIN or GiST indexes for JSON and advanced search scenarios
* Avoid excessive indexing, as it can negatively impact write performance

3️⃣ Memory & Configuration Tuning
* Configure shared_buffers effectively for caching
* Adjust work_mem for sorting and complex operations
* Fine-tune WAL and checkpoint settings for better throughput

4️⃣ Vacuum & Routine Maintenance
* Run VACUUM ANALYZE regularly to prevent table bloat
* Ensure autovacuum is properly configured and active

5️⃣ Connection Management
* Excessive connections can hurt performance
* Use connection pooling solutions like PgBouncer or Pgpool-II

6️⃣ Continuous Monitoring
* Identify and track slow-running queries
* Monitor locks and blocking sessions
* Regularly review execution plans for optimization opportunities

🎯 Final Takeaway
Performance tuning isn’t a one-off activity — it’s an ongoing process of monitoring, analyzing, optimizing, and repeating.

#postgresql #postgresdba #optimization #dba
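For point 6, a small sketch of finding the slowest queries. It assumes the pg_stat_statements extension is installed and enabled; on versions before PostgreSQL 13 the column is mean_time rather than mean_exec_time:

```sql
-- Top queries by average execution time (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```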
-
Beware the Untuned Checkpoint! "After a checkpoint, the first change to each data page forces PostgreSQL to write the entire page to the write-ahead log, not just the change. These are called full-page image (FPI) writes, and they create massive I/O spikes immediately after every checkpoint cycle." 👀