pgwatch is one of the best open-source PostgreSQL monitoring tools out there. It is mature, flexible, and genuinely excellent at what it does. Any SQL query becomes a metric. It scales to hundreds of instances. It plugs straight into your existing Grafana stack.

So why did we build myDBA.dev? Because we kept seeing the same pattern: teams collect metrics beautifully but struggle with the "now what?" part.

A Grafana dashboard shows n_dead_tup at 2.4 million. But is autovacuum disabled? Is a long-running transaction blocking it? What is the exact ALTER TABLE command to fix it? The gap between seeing a number and knowing what to do about it is where hours of debugging live.

Here is what we built to close that gap:

1. 75+ health checks that generate copy-pasteable SQL fixes calculated from your actual server configuration -- not generic recommendations
2. An index advisor that analyzes your query workload and produces CREATE INDEX or DROP INDEX CONCURRENTLY statements with estimated impact
3. Automatic EXPLAIN plan collection so you can compare old and new plans when a query regresses
4. Extension monitoring for TimescaleDB, pgvector, and PostGIS -- domains pgwatch does not cover
5. XID wraparound detection that identifies blockers and generates recovery scripts specific to your situation

pgwatch answers "what is happening in my database?" We answer "what is happening, what does it mean, and what should I do about it?"

Both are valid approaches. The right choice depends on whether your team needs data or direction.

Full comparison: https://lnkd.in/eDkhbnVh

#PostgreSQL #OpenSource #DatabaseMonitoring #pgwatch #DevOps
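To make the n_dead_tup example concrete: the "now what?" questions in that paragraph map to a few catalog queries. A minimal diagnostic sketch (the `orders` table name is hypothetical, not part of the post):

```sql
-- Is autovacuum disabled for this table? Per-table overrides show up in reloptions
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';

-- Is a long-running transaction holding back vacuum's cleanup horizon?
SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;

-- Which tables are accumulating dead tuples, and when were they last autovacuumed?
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```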
Choosing between PostgreSQL monitoring tools is one of those decisions that seems straightforward until you're debugging a production issue at 3am and realize your tool shows you the problem but not the fix.

I spent time comparing MyDBA.dev and pganalyze -- two PostgreSQL-focused monitoring platforms that take different approaches to what "monitoring" means.

**Where pganalyze is strong:**

pganalyze has been in the space since 2013 and it shows. Their index advisor uses hypothetical "What If?" analysis to predict index impact before creation. Their VACUUM advisor gives per-table freezing analysis. Their log insights are well-built. It's a mature, polished product.

**Where MyDBA.dev goes further:**

1. **Health checks with fix scripts.** 75+ automated checks, each with a ready-to-run SQL fix. When your health score drops at 3am, you get the diagnosis AND the remediation, not just a red chart.
2. **Extension monitoring.** TimescaleDB, pgvector, PostGIS -- dedicated dashboards with extension-specific health checks. pganalyze has zero extension support. If your workload depends on these extensions, that's a significant blind spot.
3. **XID wraparound protection.** A dedicated dashboard with blocker detection and recovery scripts. Not just "your XID age is high" -- but "here's the abandoned replication slot blocking progress, and here's the command to drop it."
4. **Cluster-aware index advisor.** Aggregates index usage across primary + all replicas. An index that looks unused on primary might be critical for read-replica analytics queries.

**The pricing gap is significant:** pganalyze starts at $149/mo for one server. No free tier. MyDBA.dev has a free tier (1 server + 1 replica, all features) and Pro starts at $19/mo.

I wrote a full, balanced comparison covering both tools' strengths: https://lnkd.in/ee2FSzKV

#PostgreSQL #DatabaseMonitoring #pganalyze #DevOps #DatabasePerformance
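Point 3 is worth illustrating, because the underlying check is plain SQL. A minimal sketch of spotting an abandoned replication slot that pins the cleanup horizon, assuming a hypothetical slot name `stale_replica` (this is not MyDBA.dev's actual implementation):

```sql
-- Inactive slots still holding an xmin block vacuum and freezing progress
SELECT slot_name, slot_type, active, xmin, catalog_xmin, restart_lsn
FROM pg_replication_slots
WHERE NOT active;

-- If the slot is truly abandoned, dropping it releases the horizon
SELECT pg_drop_replication_slot('stale_replica');
```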
By default, PostgreSQL does not log slow queries. The `log_min_duration_statement` parameter ships at `-1` (disabled), which means every performance regression happens in the dark until a user complains or an application timeout fires.

The worst part? By the time you notice, the query has been degrading for months. A query that took 50ms six months ago now takes 500ms, and the root cause is buried under months of schema changes and data growth. That is what silent performance drift looks like.

Three things I see teams get wrong with slow query monitoring:

**1. The threshold is either too high or too low.** Setting `log_min_duration_statement` to 10 seconds catches only catastrophic queries. Meanwhile, a stream of 1-2 second queries collectively dominates your database load. Start at 250ms for transactional workloads -- it captures meaningful slowness without flooding logs.

**2. They optimize outliers instead of total load.** A single 800ms query is less important than a 50ms query running 10,000 times per hour (500 seconds of total load). Use `pg_stat_statements` and sort by `total_exec_time`, not `max_exec_time`. The standard deviation column (`stddev_exec_time`) reveals plan instability -- queries that are sometimes fast and sometimes slow.

**3. They never enable auto_explain.** `log_min_duration_statement` tells you which queries are slow. `auto_explain` tells you why. Set `auto_explain.log_min_duration = 500ms` to automatically capture execution plans for slow queries. Keep `auto_explain.log_analyze = off` in production to avoid doubling execution cost -- the estimated plan is enough for diagnosis.

Slow query logging, pg_stat_statements, and auto_explain form a three-layer observability stack that catches regressions before users notice.

I wrote a practical guide with the exact configuration, detection queries, and a prevention strategy: https://lnkd.in/exV6FcAq

#PostgreSQL #DatabasePerformance #SlowQueries #DBA #DevOps #SoftwareEngineering
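A minimal sketch of point 2 -- ranking by total load rather than worst case. This assumes pg_stat_statements is installed and uses the PostgreSQL 13+ column names:

```sql
-- Top queries by cumulative execution time, with a plan-instability signal
SELECT left(query, 60)                     AS query_snippet,
       calls,
       round(total_exec_time::numeric, 1)  AS total_ms,
       round(mean_exec_time::numeric, 1)   AS mean_ms,
       round(stddev_exec_time::numeric, 1) AS stddev_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

A high stddev_ms relative to mean_ms is the instability signal the post mentions: the same statement is sometimes fast and sometimes slow.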
🔐 TLS visibility in ProxySQL — finally queryable.

One thing that always bothered me operationally: if you wanted to answer simple questions like:

- when does this certificate expire?
- did the last TLS reload succeed?
- which certificate is currently loaded?

…the answer was:
👉 go to the filesystem
👉 run openssl
👉 parse manually

With ProxySQL 3.0.7, this changes. We introduced two new tables:

- stats_proxysql_global
- stats_tls_certificates

Now you can just run SQL:

SELECT cert_type, subject_cn, days_until_expiry
FROM stats.stats_tls_certificates;

What I like about this approach:
👉 no external tooling
👉 no background collectors
👉 no runtime overhead (computed at query time)
👉 works anywhere you already use ProxySQL

This also makes alerting much easier:

SELECT ... WHERE days_until_expiry < 30;

It’s a small feature, but one that makes day-to-day operations simpler and safer.

📖 https://lnkd.in/g_nwYaiY

Curious how others are currently tracking certificate lifecycle in database infrastructure.

#ProxySQL #TLS #Security #Observability #DevOps #SRE #Database #MySQL #PostgreSQL
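A concrete version of that alert, using only the columns and the 30-day threshold mentioned in the post (the rest of the table's schema is not shown here):

```sql
-- Certificates expiring within 30 days, straight from ProxySQL's stats schema
SELECT cert_type, subject_cn, days_until_expiry
FROM stats.stats_tls_certificates
WHERE days_until_expiry < 30
ORDER BY days_until_expiry;
```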
🚀 SequelPG v0.11.1 is live

If you work with PostgreSQL every day, this will feel familiar:

You run a query.
You tweak it.
You come back to something from yesterday.
You try to remember what actually worked.

Most tools treat query history as just a log. I don’t think that’s enough.

In this release, I rebuilt Query History from scratch. Now it’s something you actually use:

- Quickly find past queries
- Reuse them without rewriting
- Debug faster with less context switching

I also refactored the Database Tools layer. You won’t “see” most of it — but you’ll feel it:

- More consistency
- Better performance
- Stronger foundation for what’s coming next

I’m not trying to add more features. I’m trying to reduce friction when working with data.

Full release notes: https://lnkd.in/dFmaV_xH

If you use PostgreSQL, I’d really value your feedback.

#PostgreSQL #DeveloperTools #IndieHacker #BuildInPublic #SwiftUI #DX
Percona PMM is one of the best free database monitoring tools available. It supports MySQL, MongoDB, and PostgreSQL, ships with Grafana dashboards, includes Query Analytics, and the Advisors framework catches security and configuration issues automatically. For polyglot database environments, it is genuinely hard to beat.

But if PostgreSQL is your primary database, there is a gap between what a multi-database monitoring tool can offer and what PostgreSQL-specific tooling provides.

**Three areas where the difference is most visible:**

**1. Health checks with remediation.** PMM's Advisors cover a solid set of configuration and security checks. MyDBA.dev runs 75+ health checks across 10 scored domains and includes the fix SQL for every finding. Not just "this index is missing" -- the exact CREATE INDEX statement.

**2. Extension monitoring.** TimescaleDB chunk health, pgvector index selection, PostGIS spatial query anti-patterns -- these are failure modes that generic monitoring has no awareness of. If you run PostgreSQL extensions in production, monitoring that understands them matters.

**3. XID wraparound protection.** PMM shows basic XID age metrics. MyDBA.dev provides dedicated wraparound monitoring with blocker detection (long-running transactions, prepared transactions, replication slots), trend analysis, and recovery scripts. This is the kind of PostgreSQL-specific problem that requires PostgreSQL-specific tooling.

The infrastructure model is different too. PMM requires a self-hosted server plus agents on every monitored host. MyDBA.dev is SaaS with a lightweight collector -- no monitoring infrastructure to maintain.

I wrote a detailed comparison covering query analytics, index recommendations, lock chain visualization, schema diff, pricing, and decision criteria: https://lnkd.in/etP2-D9d

#PostgreSQL #Percona #PMM #DatabaseMonitoring #DevOps
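For context on point 3: the raw XID-age metric every tool starts from is a short query; blocker detection and recovery scripts are what sit on top of it. A minimal sketch:

```sql
-- Database-level transaction ID age; risk grows as this approaches
-- autovacuum_freeze_max_age and, ultimately, the ~2 billion wraparound limit
SELECT datname,
       age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Two common blockers: forgotten prepared transactions and long-running sessions
SELECT gid, prepared, owner FROM pg_prepared_xacts;
SELECT pid, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```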
Replication lag isn't one number. It's at least three, and knowing which one you're dealing with changes the fix entirely.

PostgreSQL tracks three types of lag for each replica:

SELECT client_addr, application_name, state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

**Write lag** — WAL received by the replica but not yet written to disk. High write lag = network problem or disk I/O bottleneck on the replica.

**Flush lag** — WAL written but not fsynced. High flush lag = the replica's disk can't keep up with fsync calls. Usually means slow storage.

**Replay lag** — WAL flushed to disk but not yet applied. This is the most common problem. High replay lag with low write lag means the replica is receiving data fine but can't apply it fast enough.

The usual cause of high replay lag: long-running queries on the replica that conflict with WAL replay. PostgreSQL pauses replay to avoid killing active queries (controlled by `max_standby_streaming_delay`).

The fix depends on which lag you're seeing:
• Write lag → check network, check replica disk throughput
• Flush lag → faster storage or fewer replicas
• Replay lag → kill long queries on replica, reduce standby delay, or add more replicas

Most monitoring tools show you one "replication lag" number. That's like diagnosing a car problem by saying "something is wrong with the engine."

Have you hit replication lag issues? What was the root cause?

#PostgreSQL #Replication #Database #DevOps #HighAvailability
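A small follow-up for the replay-lag case -- finding the long-running standby queries (and the conflict counters) behind a paused WAL apply. Run these on the replica:

```sql
-- Long-running queries on the standby that can hold up WAL replay
SELECT pid, usename, now() - query_start AS runtime, left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start
LIMIT 10;

-- How often queries on this standby were cancelled by recovery conflicts
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts;
```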
If you're running PostgreSQL without EXPLAIN plan tracking, you're flying blind.

I know, I know. You've got query duration metrics. You've got pg_stat_statements. You know which queries are slow. But do you know WHY they're slow?

Execution time tells you there's a problem. The query plan tells you what the problem actually is. A sequential scan on a 50 million row table. A nested loop where a hash join would be 100x faster. An index that exists but the planner decided not to use it.

Here's the thing that catches people: plans change silently. You deploy new code that inserts a million rows. Autovacuum runs and updates statistics. Your table crosses a size threshold where the planner switches strategies. The exact same SQL query -- character for character -- can have a completely different plan on Monday than it did on Friday. And you won't notice until users start complaining.

What makes this hard is that plans are invisible by default. PostgreSQL doesn't log them. It doesn't track them over time. It doesn't tell you when a plan changes. You have to ask explicitly, and by the time you think to ask, the damage is usually done.

I've watched teams spend hours debugging a "slow database" by staring at pg_stat_statements numbers, tweaking shared_buffers, even adding hardware -- when the root cause was a single query that switched from an index scan to a sequential scan after a statistics update. Five minutes with EXPLAIN would have found it.

Most teams only look at EXPLAIN when something is already on fire. By then you're doing forensics, not prevention.

#PostgreSQL #EXPLAIN #QueryOptimization #Database #DevOps
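The "five minutes with EXPLAIN" step looks like this -- a minimal sketch with a hypothetical table and query, not a specific incident from the post:

```sql
-- ANALYZE executes the query and reports actual row counts and timings;
-- BUFFERS shows how many pages were read (table and filter are hypothetical)
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- If the plan shows a Seq Scan where you expected an index scan,
-- refreshing statistics often changes the planner's choice
ANALYZE orders;
```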
🐘 Say Goodbye to Migration Headaches with pgloader

If you’ve ever had to migrate a database to PostgreSQL, you know the "schema vs. data" struggle. Most tools make you export the schema first, fix it, then import data separately.

Enter pgloader—a powerful, open-source tool that automates the entire "Continuous Migration" process in a single command.

🛠️ What can pgloader transform?

It doesn't just copy data; it intelligently transforms different source structures into a clean PostgreSQL structure. Supported sources include:

✅ Databases: MySQL, MS SQL Server, SQLite, and Redshift.
✅ Files: CSV, Fixed-format files, dBase (DBF), and IBM IXF.
✅ On-the-fly Transformation: It automatically handles type casting (like converting MySQL's 0000-00-00 dates to NULL) and re-indexes your tables.

⚙️ Flexibility: Schema vs. Data

One of the best features is how it handles different migration needs:

- SCHEMA ONLY: Use the CREATE NO DATA clause if you only want to replicate the structure.
- DATA ONLY: Use the CREATE NO SCHEMA clause if you’ve already prepared your target tables (popular for ORM-heavy projects).
- FULL MIGRATION: By default, it creates the schema, loads the data, and resets sequences—all in one go.

💡 Why I like it

It’s built for speed. By using the PostgreSQL COPY protocol and parallel workers, it’s significantly faster than standard INSERT scripts. Plus, it generates a "Summary Report" at the end so you know exactly how many rows were moved and if any errors occurred.

Are you a GUI person (like AWS SCT) or a CLI person (like pgloader)? Let’s discuss in the comments!

#PostgreSQL #DatabaseMigration #OpenSource #SQL #DataEngineering #pgloader #BackendDevelopment
Monitor your WAL generation rate before it becomes a problem.

Every write in PostgreSQL goes through the Write-Ahead Log. Every byte of WAL must be written to disk, sent to replicas, and archived for backups. High WAL generation means high I/O, replication lag, and backup bloat.

On PostgreSQL 14+:

SELECT wal_records, wal_fpi,
       pg_size_pretty(wal_bytes) AS total_wal_generated,
       stats_reset
FROM pg_stat_wal;

`wal_fpi` (full page images) is the interesting one. After every checkpoint, the first modification to each page writes the entire 8 KB page to WAL — not just the change. This is a safety mechanism, but it means frequent checkpoints generate much more WAL.

Check your checkpoint frequency:

SELECT checkpoints_timed, checkpoints_req,
       pg_size_pretty(buffers_checkpoint * 8192::bigint) AS checkpoint_data
FROM pg_stat_bgwriter;

If `checkpoints_req` (forced checkpoints) is high relative to `checkpoints_timed` (scheduled checkpoints), your `max_wal_size` is too low. (On PostgreSQL 17+, these checkpoint counters moved to the `pg_stat_checkpointer` view.) The default of 1 GB is often insufficient for production workloads. A good starting point:

max_wal_size = 4GB
min_wal_size = 1GB

Check this weekly. Sudden spikes in WAL generation usually mean something changed — a bulk operation, a new feature with heavy writes, or a configuration regression.

#PostgreSQL #Database #WAL #Performance #DevOps
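Both settings are reloadable, so applying the suggested starting point does not require a restart; a minimal sketch:

```sql
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET min_wal_size = '1GB';
SELECT pg_reload_conf();

-- Confirm the running values
SHOW max_wal_size;
SHOW min_wal_size;
```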
🐘 PgBouncer is great — but it’s not the whole story.

If you run PostgreSQL, chances are you’re using PgBouncer for connection pooling. It’s simple, efficient, and does one thing very well.

But at some point, you start hitting limitations:
- no query routing
- no read/write split
- no visibility into traffic
- limited control beyond pooling

That’s exactly why we wrote this post:
👉 moving from PgBouncer to ProxySQL (for PostgreSQL)

ProxySQL is not just a pooler. It’s a SQL-aware proxy that can:
- route queries based on rules
- split reads/writes
- multiplex connections
- integrate with HA setups
- provide observability

So the real question becomes:
👉 when is PgBouncer enough, and when do you need more?

This post from Rahim Kanji is the first in a series exploring that transition.

📖 https://lnkd.in/g9H3uVuh

Curious to hear from PostgreSQL users: are you hitting limits with PgBouncer? or is it still “good enough” for your use case?

#PostgreSQL #PgBouncer #ProxySQL #DevOps #SRE #Database #OpenSource