5 PostgreSQL Features Every Backend Developer Should Know

PostgreSQL is more than just a database. Many developers start using Postgres like a simple storage engine. Tables. Queries. CRUD. But Postgres can do much more.

Here are 5 features that make PostgreSQL incredibly powerful:

• JSONB – store and query semi-structured data
• Indexes – B-tree, GIN, and GiST for high-performance queries
• CTEs – write complex queries in a clean way
• Extensions – PostGIS, pgvector, TimescaleDB
• Full-text search – built directly into the database

In many cases Postgres can replace entire parts of your stack: search engine, vector database, analytics engine.

The real power of PostgreSQL isn't just storing data. It's processing data efficiently inside the database.

What PostgreSQL feature do you use the most?

#PostgreSQL #Backend #Databases #SoftwareEngineering
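A quick taste of the first feature on the list: a minimal JSONB sketch (the events table and payload shape are hypothetical, invented for illustration):

-- Hypothetical table: semi-structured event payloads alongside relational data.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- A GIN index lets containment queries (@>) use the index instead of scanning.
CREATE INDEX events_payload_gin ON events USING gin (payload);

INSERT INTO events (payload)
VALUES ('{"type": "signup", "user": {"country": "DE"}}');

-- Find signup events from Germany; ->> extracts a field as text.
SELECT id, payload->'user'->>'country' AS country
FROM events
WHERE payload @> '{"type": "signup", "user": {"country": "DE"}}';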
🚀 PostgreSQL Extensions Every DBA Should Know (But Many Don't Use)

PostgreSQL is powerful out of the box. But the real magic? ✅ Extensions

They can turn PostgreSQL into a:
• Monitoring system
• Time-series database
• Query analysis tool
• Scalable data engine

And most teams barely use them properly.

🔧 Here are PostgreSQL extensions every DBA should know:

1️⃣ pg_stat_statements – tracks real query performance across your database
2️⃣ PostGIS – adds geospatial capabilities for location-based queries
3️⃣ TimescaleDB – optimized for time-series workloads like logs and metrics
4️⃣ pg_partman – automates table partitioning for large datasets
5️⃣ pg_trgm – improves fuzzy search and text similarity queries

🧠 Real-World Scenario (What Most DBAs Actually Face)

A common situation in production:
• An application table grows steadily over time (10M → 20M → 50M rows)
• Read queries start slowing down
• Autovacuum struggles to keep up
• Index bloat increases
• CPU and I/O usage spike during peak hours

No sudden failure. Just gradual performance degradation.

What typically works in this situation:
✔️ Use pg_stat_statements to identify high-frequency and slow queries
✔️ Introduce partitioning with pg_partman to split large tables into manageable chunks
✔️ Rebuild or optimize indexes based on actual query patterns
✔️ (Optional) Move time-based data to TimescaleDB if the workload is heavily time-series

💡 Outcome (What DBAs usually observe)
• More predictable query performance
• Reduced index and table scan overhead
• Better autovacuum efficiency
• Improved stability under load

Not magic. Just good engineering.

⚡ Takeaway

Most PostgreSQL performance issues aren't because PostgreSQL is weak. They happen because:
⚠️ We don't use the tools PostgreSQL already gives us.

Extensions are part of the system, not an add-on.

If you're working with PostgreSQL: which extension has saved you the most in production? Comment below 👇🏻

#PostgreSQL #DBA #DatabasePerformance #DataEngineering #Backend #OpenSource #Tech
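As a taste of the first extension on the list, a minimal pg_stat_statements sketch (column names assume PostgreSQL 13+; older versions use total_time/mean_time instead):

-- The extension must be preloaded (shared_preload_libraries = 'pg_stat_statements')
-- before it can be created in a database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by cumulative execution time.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;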
🚨 What Actually Happens When You Run VACUUM in PostgreSQL?

Most developers think VACUUM is just "cleanup". That's not even half the story.

PostgreSQL doesn't overwrite data when you UPDATE or DELETE. Because of MVCC (Multi-Version Concurrency Control), it creates new row versions and leaves the old ones behind. These old rows are called dead tuples.

🧠 So what does VACUUM really do?

When you run VACUUM, PostgreSQL:
✔️ Scans the table page by page
✔️ Identifies dead tuples (no longer visible to any transaction)
✔️ Marks that space as reusable (not returned to the OS)
✔️ Updates the visibility map for faster future scans

Important: 👉🏻 VACUUM does NOT shrink your table size.

⚠️ Why this matters in production

If you don't vacuum regularly:
• Dead tuples keep accumulating
• Table bloat increases
• Indexes become inefficient
• Queries slow down
• Storage usage grows

Eventually… performance degrades badly.

🔥 VACUUM vs VACUUM FULL

👉🏻 VACUUM
• Non-blocking
• Reuses space internally
• Safe for production

👉🏻 VACUUM FULL
• Rewrites the entire table
• Returns space to the OS
• Requires an exclusive lock (blocks reads and writes)
• Use it only when absolutely necessary

⚙️ What about AUTOVACUUM?

PostgreSQL already runs autovacuum in the background. But here's the catch: 👉🏻 the default settings are often NOT enough for high-write workloads.

As a DBA, you should monitor pg_stat_user_tables for the dead tuple count and autovacuum frequency, as shown in the query below.

💡 Real takeaway: VACUUM is not optional maintenance. It's part of how PostgreSQL stays alive under load. Ignore it, and your database won't crash… it will just get slower and slower until users start complaining.

#PostgreSQL #Database #DBA #PerformanceTuning #Backend #DataEngineering #SQL #DevOps
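A minimal version of that pg_stat_user_tables check, using only the built-in statistics view:

-- Tables with the most dead tuples, plus when autovacuum last touched them.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;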
Why Did PostgreSQL Consume 16 GB If work_mem Was Set to 64 MB?

The answer comes from a PostgreSQL core investigation by Tomas Vondra (pgsql-hackers).

What actually happened:
• The query plan contained a Hash Join node with a huge expected row count and large width.
• PostgreSQL tried to stay within work_mem, but the Hash Join realized the data wouldn't fit in the allocated 64 MB.
• To cope, it started splitting the data into smaller chunks (batches) and spilling them to temp files on disk.
• Due to the enormous data volume, PostgreSQL created 1 million batches.

Here's the math: 1 million batches = 1 million temp files for the hash table + 1 million files for the outer relation. Total: 2 million open BufFile objects. Each file gets an 8 KB buffer allocated in RAM.

2,000,000 × 8 KB ≈ 16 GB of memory!

This behavior can trigger the OOM killer on any PostgreSQL below version 16.

The fix (committed to PostgreSQL 16+): https://lnkd.in/dGGq9WC3
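You can watch this mechanism at a smaller scale yourself. A sketch (the orders/order_items join is hypothetical; the Batches counter in the Hash node is what to watch):

-- EXPLAIN ANALYZE prints the Hash node's actual behavior, e.g.:
--   Buckets: 262144  Batches: 64  Memory Usage: ...kB
-- "Batches: 1" means the hash table fit in work_mem; very large counts mean
-- heavy spilling to temp files, each with its own 8 KB BufFile buffer.
SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*
FROM orders o
JOIN order_items i ON i.order_id = o.id;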
🚀 What *actually* happens inside PostgreSQL when you run a query?

Most developers use PostgreSQL daily… but very few understand what's happening behind the scenes. Here's a simple breakdown 👇

🟢 1. You run a query
Example: SELECT * FROM users WHERE id = 1;

🧠 2. Query Parser kicks in
PostgreSQL checks:
• Is the SQL valid?
• Do tables/columns exist?

⚙️ 3. Planner / Optimizer
This is where the magic happens ✨ Postgres decides:
• Use an index? 🔍
• Or scan the full table? 📦

🚀 4. Executor runs the query
Now PostgreSQL actually fetches data.

⚡ 5. Memory check (super important)
• If the data is in cache (shared buffers) → FAST ⚡
• If not → load from disk → slower

💾 6. Disk access
Data is stored in pages (8 KB blocks). Postgres reads only the required pages, not the entire table 👌

🔁 7. For WRITE queries (INSERT / UPDATE / DELETE)
Example: UPDATE users SET age = 30 WHERE id = 1;
Here's what REALLY happens:
1. Data is updated in memory
2. The change is written to the WAL (Write-Ahead Log) 🧾
3. A success response is sent ✅
4. The actual disk write happens later
👉 This is called: "log first, write later"

🛡️ 8. Crash Safety (WAL)
If the system crashes 💥 Postgres:
• Replays the WAL
• Restores the data
No data loss 🔥

🔄 9. MVCC (concurrency magic)
Instead of overwriting:
• Old row → stays
• New row → created
So readers don't block writers, and writers don't block readers.

🧹 10. VACUUM (cleanup crew)
All old rows = "dead tuples". Postgres cleans them up using VACUUM / autovacuum 🤖

🎯 Final flow (simple):
Query → Parser → Planner → Executor → Memory → Disk → WAL → Result

💡 Takeaway: PostgreSQL is not just a database. It's a highly optimized system with:
✔ Smart query planning
✔ Efficient caching
✔ Crash recovery (WAL)
✔ High concurrency (MVCC)

If you're preparing for backend/system design interviews, understanding this gives you a HUGE edge.

#PostgreSQL #BackendEngineering #SystemDesign #Databases #SoftwareEngineering #plpgsql
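To see steps 3 through 6 for yourself, a minimal sketch (assuming the users table from the example exists):

-- ANALYZE runs the query for real; BUFFERS exposes the step-5 cache check:
--   "shared hit"  = page already in shared buffers (fast path)
--   "shared read" = page had to be loaded from disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE id = 1;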
Find idle connections hogging your PostgreSQL database.

SELECT state,
       count(*) AS connection_count,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connection_count DESC;

If idle connections outnumber active connections by 10:1, you don't need a higher max_connections. You need a connection pooler.

Each PostgreSQL connection uses 5-10 MB of RAM. But the real cost isn't memory: it's lock contention and process scheduling overhead. PostgreSQL was designed for tens to low hundreds of connections, not thousands.

Worse: `idle in transaction` connections hold locks and prevent autovacuum from cleaning up dead tuples. They're silently degrading your database while doing absolutely nothing.

Find the worst offenders:

SELECT pid,
       usename,
       application_name,
       state,
       now() - state_change AS idle_duration,
       left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;

Any connection idle in transaction for more than a few minutes is a bug in the application code. Fix the application, don't raise the timeout.

Run this during peak hours. The ratio tells you everything.

#PostgreSQL #Database #ConnectionPooling #DevOps #Performance
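For immediate relief while the application fix lands, one option is to terminate the worst offenders by hand. A sketch using the built-in pg_terminate_backend function (the 30-minute cutoff is an arbitrary choice for illustration):

-- Kill backends stuck idle in transaction for over 30 minutes.
-- This rolls back their open transactions, so use with care.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '30 minutes';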
Our PostgreSQL query executed 1,000 subqueries. It only needed 75.

Imagine this:
• 1,000 orders
• Each referencing one of 75 products

The original query fetched the product name per order using a correlated subquery:

SELECT o.id,
       (SELECT name FROM product WHERE id = o.product_id)
FROM orders o;

PostgreSQL executed the product lookup 1,000 times.

The fix? Rewrite it using a LATERAL join:

SELECT o.id, p.name
FROM orders o
LEFT JOIN LATERAL (
    SELECT name FROM product WHERE id = o.product_id
) p ON true;

Now PostgreSQL introduces a Memoize node. It caches results by product_id.

From the query plan:
• 1,000 rows processed
• 75 cache misses (real product lookups)
• 925 cache hits (reused results)

That means PostgreSQL only queried the product table 75 times instead of 1,000.

Small query rewrite. Big difference in execution strategy.

This works best when many rows reference a small set of keys. If every row has a unique key, Memoize has nothing to reuse.

Sometimes performance improvements aren't about adding indexes; they're about changing the shape of the query plan.

Note: hash and merge joins were disabled for demonstration. Normally PostgreSQL chooses the join strategy based on cost.

#PostgreSQL #SQLPerformance #BackendEngineering
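To reproduce the demonstration setup yourself, something like this should work (Memoize requires PostgreSQL 14+; the enable_* settings are session-local planner toggles, for experimentation only):

-- Disable the alternatives so the planner picks a nested loop,
-- making the Memoize node visible in the plan.
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN (ANALYZE)
SELECT o.id, p.name
FROM orders o
LEFT JOIN LATERAL (
    SELECT name FROM product WHERE id = o.product_id
) p ON true;

-- Look for a "Memoize" node reporting its Hits and Misses counts.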
Day 04 of learning PostgreSQL in public: PostgreSQL Architecture.

Most developers never look under the hood. Here is the simple breakdown:

→ Client-Server Model — every time your app connects to PostgreSQL, it acts as a client sending requests to the PostgreSQL server
→ Forking Process — PostgreSQL spawns a brand-new backend process for every single client connection, keeping each one fully isolated
→ Failure Handling — because each connection runs in its own process, one misbehaving client can't corrupt another backend's private state (after a hard backend crash, the postmaster restarts the backends to protect shared memory)
→ Postmaster — the main process that listens for connections and forks a new backend each time a client connects
→ Shared Memory — all backend processes share a common memory area for caching and coordination

This architecture is exactly why PostgreSQL is trusted in production at scale.

I documented everything with clean working code examples on my GitHub: https://lnkd.in/d2UHfM67

I am also building a free 8-week SQL Zero-to-Hero roadmap for anyone starting from scratch. No guessing. No random rabbit holes. Just a clear path from zero to confident.

Follow Nitin Rawat so you never miss a day 🔔

#PostgreSQL #DatabaseArchitecture #ClientServer #Forking #BackendDevelopment #LearnSQL #GitHub #100DaysOfCode #SoftwareEngineering
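One way to watch the process-per-connection model in action, using only the built-in pg_stat_activity view:

-- Each row is one forked backend process; pid is the OS process id,
-- so these same processes also show up in `ps` on the server.
SELECT pid, usename, application_name, backend_start, state
FROM pg_stat_activity
WHERE backend_type = 'client backend';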
Most people think PostgreSQL is only a relational database. But in reality, PostgreSQL can also work as a powerful document database.

With JSONB support, PostgreSQL lets you store flexible document structures similar to NoSQL databases like MongoDB, while still keeping the power of SQL.

Key advantages:
• Store semi-structured data using JSONB
• Query JSON fields with SQL
• High performance with GIN indexing
• Full ACID transactions
• Combine relational and document models in the same database

This hybrid capability makes PostgreSQL a great choice for modern applications where both structured and flexible data models are required.

PostgreSQL continues to evolve beyond a traditional database, becoming a multi-model data platform.

#PostgreSQL #Database #NoSQL #JSONB #DataEngineering #DatabaseArchitecture
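A small sketch of the document-style workflow (the products table and document shape are made up for illustration):

-- A MongoDB-style "collection" inside PostgreSQL.
CREATE TABLE products (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- jsonb_path_ops GIN indexes are smaller and faster for @> containment checks.
CREATE INDEX products_doc_idx ON products USING gin (doc jsonb_path_ops);

-- Document writes get full ACID semantics.
BEGIN;
INSERT INTO products (doc)
VALUES ('{"sku": "A-100", "tags": ["sale"], "price": 19.99}');
COMMIT;

-- Query JSON fields with plain SQL.
SELECT doc->>'sku' AS sku,
       (doc->>'price')::numeric AS price
FROM products
WHERE doc @> '{"tags": ["sale"]}';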
Your PostgreSQL instance doesn't need upgrading. It needs a cleanup.

I've had this conversation more times than I can count. Database slowing down. Storage bill creeping up. Engineers start talking about up-sizing the RDS instance. Nobody thinks to check the dead tuple count.

Here's the thing about PostgreSQL's MVCC model: every UPDATE you run leaves a dead row behind. Every DELETE leaves a dead row behind. Autovacuum is supposed to clean them up. But on high-write tables with default settings, it quietly falls behind, and stays behind.

I analyzed a team's "events" table last year:
→ Total size: 89 GB
→ Live data: ~40 GB
→ Bloat (dead tuples + index + TOAST): ~49 GB

They'd up-sized their instance 6 months earlier because queries were "slow". The instance wasn't too small. The table was full of garbage.

Fix: per-table autovacuum tuning + pg_repack on the worst offenders. No downtime. No migration. Bloat gone in a weekend.

Result: ~$9,500/year back in infra costs. And they were able to downsize the instance they'd just paid to upgrade.

The worst part? The monitoring looked fine the entire time. Disk alerts track total usage, not what that space is made of.

I wrote up the full breakdown: how MVCC creates bloat, why autovacuum falls behind, what TOAST table bloat is (most people don't know this one exists), and the 4 SQL queries that tell you exactly where you stand.

Link in comments 👇

#postgresql #database #sre #devops #postgres
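As a flavor of the per-table autovacuum tuning mentioned in the fix, a minimal sketch (the table name and thresholds are illustrative; tune them to your actual write rate):

-- Make autovacuum kick in at ~2% dead rows instead of the default ~20%,
-- and let each run do more work before cost-based throttling pauses it.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_limit   = 2000
);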