Ever wondered what happens when a SQL query runs? Let me tell you. That "simple" SELECT isn't simple at all.

SELECT * FROM users WHERE id = 17;

Looks straightforward. Under the hood in PostgreSQL, it's a four-stage decision engine.

*1. Parse*
PostgreSQL breaks your SQL into an internal structure, validates syntax, and resolves table and column names. Get this wrong and the query never runs.

*2. Rewrite*
Before execution, PostgreSQL applies rules and view transformations. Your query might already be reshaped before the planner even sees it.

*3. Plan*
This is where PostgreSQL earns its reputation. It weighs multiple execution paths: Index Scan, Sequential Scan, Bitmap Heap Scan. It picks the "cheapest" option based on table stats, row estimates, and data distribution. Cheapest ≠ fastest. It's just what the model predicts will cost least in I/O and CPU.

*4. Execute with MVCC*
Now it finally touches data. But it also checks visibility. A row can exist on disk and still be invisible to you if your transaction shouldn't see it yet. That's MVCC in action.

*The part most engineers overlook:*
This entire flow happens for every SELECT. Every time. So when performance tanks, it's usually not the query. It's the context:

- Outdated statistics → wrong plan
- Poor indexing → wrong scan
- Skewed data → bad estimates

PostgreSQL isn't just executing your query. It's deciding _how_ to execute it. And that decision determines whether your query runs in milliseconds or minutes.

What's the most surprising query plan you've seen in production?

#PostgreSQL #Databases #Performance #QueryTuning
What happens when a SQL query runs in PostgreSQL
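A quick way to watch stage 3 for yourself: EXPLAIN exposes the plan the cost model picked, and ANALYZE runs the query so you can compare the estimate with reality. A minimal sketch using the post's example query; the plan shape you see will depend on your own table and statistics.

```sql
-- Surface the planner's decision for the post's example query.
-- Whether you get an Index Scan or a Seq Scan depends on data and stats.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE id = 17;

-- If the plan looks wrong, refresh the statistics the planner relies on:
ANALYZE users;
```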
---
I was debugging a slow query. Turns out… the database was just taking the scenic route 🌳

Everything looked fine. Indexes? Present. Query? Clean. Still… latency said "not today."

Went one layer deeper. Not the query. Not the data. 👉 The tree underneath it.

Databases like MySQL and MongoDB don't magically fetch data. They walk a path. And if that path is messy, your query is basically sightseeing before doing real work.

💡 What actually fixed it (SQL sketch below the post):
Not rewriting the query 10 times. Instead:
- Rebuilding / optimizing indexes (hello B-Trees 👀)
- Fixing index choice based on access patterns
- Dropping useless indexes that were bloating traversal
- Checking if data distribution was skewing the tree

⚙️ The shift:
I stopped asking: ❌ "Is my query efficient?"
and started asking: ✅ "Is my data structure forcing extra work?"

Because in production:
Bad query → obvious problem
Bad tree → silent performance killer

And yeah… sometimes the fix isn't clever code. It's just admitting: 👉 your tree grew wild and needs pruning.

Next time your query is slow, don't just optimize SQL. Trim the tree; you might save milliseconds (and your weekend).

#BackendEngineering #SystemDesign #PerformanceEngineering #Databases #DSA #SoftwareEngineering
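Since the post doesn't show the SQL, here is a minimal PostgreSQL sketch of the "pruning" steps it describes: spotting never-used indexes and rebuilding a bloated one. The index name is hypothetical, and MySQL/MongoDB have their own equivalents.

```sql
-- Candidates for dropping: indexes with zero scans since stats were
-- last reset, largest (most traversal bloat) first.
SELECT relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild an overgrown B-Tree without blocking traffic (Postgres 12+):
REINDEX INDEX CONCURRENTLY idx_orders_status;  -- hypothetical index name
```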
---
Before you add a Postgres index (a shortcut to find data faster), answer these 4 questions.

I see this mistake in code reviews every week. A slow query shows up → someone adds an index → assumes it's fixed. But it makes things worse half the time.

Before adding an index, check:

𝟭/ 𝗜𝘀 𝘁𝗵𝗲 𝗰𝗼𝗹𝘂𝗺𝗻 𝘂𝘀𝗲𝗱 𝗶𝗻 𝗪𝗛𝗘𝗥𝗘, 𝗝𝗢𝗜𝗡, 𝗼𝗿 𝗢𝗥𝗗𝗘𝗥 𝗕𝗬?
If it only appears in SELECT, the index is unlikely to help.

𝟮/ 𝗛𝗼𝘄 𝗯𝗶𝗴 𝗶𝘀 𝘁𝗵𝗲 𝘁𝗮𝗯𝗹𝗲?
Postgres chooses between scanning and indexing based on cost. If a large portion of rows is returned, it may ignore the index.

𝟯/ 𝗪𝗵𝗮𝘁'𝘀 𝘁𝗵𝗲 𝗿𝗲𝗮𝗱-𝘁𝗼-𝘄𝗿𝗶𝘁𝗲 𝗿𝗮𝘁𝗶𝗼?
Indexes speed up reads, but every insert and update has to maintain them. On write-heavy tables, each index adds overhead.

𝟰/ 𝗜𝘀 𝘁𝗵𝗲 𝗰𝗼𝗹𝘂𝗺𝗻 𝗵𝗶𝗴𝗵 𝗰𝗮𝗿𝗱𝗶𝗻𝗮𝗹𝗶𝘁𝘆?
Indexes work best when they narrow down to a small set of rows. Columns with very few distinct values (low cardinality, like enums or booleans) don't filter much on their own, but can still help in combination.

Run EXPLAIN ANALYZE before. Run it after (a sketch follows below). If the cost doesn't drop, the index isn't helping. Drop it.

Indexes are not free. They're a trade-off.

Most people add indexes to fix queries. Better engineers fix queries so they don't need indexes.
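The before/after loop from that last step, sketched with hypothetical table and column names:

```sql
-- 1. Measure before (hypothetical table/column):
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- 2. Add the candidate index:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- 3. Measure again. If the plan still shows "Seq Scan on orders" and the
--    cost hasn't dropped, the index is pure write overhead:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

DROP INDEX idx_orders_customer_id;  -- only if it didn't help
```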
---
Are your queries getting slower as your data grows? You might not have an indexing problem: you might be using the wrong index.

When working with databases like PostgreSQL, performance is not just about writing correct queries; it's about writing efficient ones.

An index is a data structure that allows the database to locate rows faster instead of scanning the entire table. Without indexes, most queries turn into full table scans, which become expensive as your data grows.

Index Types in PostgreSQL:

• B-Tree (Default)
The most commonly used index. Works with equality and range queries (=, <, >, BETWEEN) and is the default choice for most use cases.

• Hash Index
Optimized for equality comparisons (=). Fast lookups, but no support for ranges or sorting.

• GIN (Generalized Inverted Index)
Designed for JSONB, arrays, and full-text search. Instead of indexing rows, it indexes individual elements, making it powerful for complex queries.

• GiST (Generalized Search Tree)
Supports advanced data types like geometric data, ranges, and nearest-neighbor searches.

• BRIN (Block Range Index)
Efficient for very large tables. Stores summaries of data blocks instead of row-level entries. Ideal for sequential data like logs or timestamps.

Real-World Use Case: Why the GIN Index Matters

If you're building a marketplace and storing dynamic attributes in JSONB, filtering can become very slow. Without a GIN index, these queries would require scanning the entire table. With a GIN index, PostgreSQL can directly target matching entries, significantly improving performance (a sketch follows below).

Trade-offs

Indexes improve read performance, but they come at a cost:
• Additional storage
• Slower write operations (INSERT, UPDATE, DELETE)

The goal is not to add more indexes; it's to choose the right one based on your query patterns.

#PostgreSQL #BackendDevelopment #Database #SoftwareEngineering #WebDevelopment #SQL
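A minimal sketch of that marketplace case, with hypothetical table and attribute names; the @> containment operator is what the default JSONB GIN index accelerates.

```sql
-- Dynamic product attributes stored as JSONB (hypothetical schema):
CREATE TABLE listings (
    id    bigserial PRIMARY KEY,
    attrs jsonb NOT NULL
);

-- GIN indexes the keys/values inside attrs, not the rows themselves:
CREATE INDEX idx_listings_attrs ON listings USING GIN (attrs);

-- Containment queries like this can now use the index instead of a
-- full table scan:
SELECT id
FROM listings
WHERE attrs @> '{"brand": "acme", "color": "red"}';
```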
---
Most engineers think table partitioning is an "enterprise" feature. In reality, it's often the simplest fix for your slow time-series queries.

If you're logging events or anything time-based in PostgreSQL and everything lives in one giant table, you're forcing every query to fight through data it doesn't care about. Even with indexes.

In the example from the video, the base table is `events` with:

- `id` (generated)
- `created_at` (timestamp)
- `user_id`
- `event_id`

The only difference from a normal table definition is this line:

```sql
PARTITION BY RANGE (created_at)
```

From there, you create monthly partitions:

```sql
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```

When you insert 20,000+ rows per month and query with a `WHERE created_at BETWEEN ...`, PostgreSQL can do partition pruning: it only touches the relevant child tables instead of scanning the entire dataset.

The key takeaway: if your queries are almost always time-bounded, a properly partitioned table can give you a bigger win than another round of index tweaking.

I walk through the exact SQL and show how partition pruning changes what PostgreSQL actually scans here: https://lnkd.in/ekhNRkss
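To see the pruning for yourself (my addition, not from the video): run EXPLAIN with a time-bounded filter and check which child tables appear in the plan.

```sql
-- With pruning working, the plan should mention only events_2024_01,
-- not events_2024_02 or any other partition.
EXPLAIN
SELECT count(*)
FROM events
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
```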
---
PostgreSQL Tip: Don't Use a GIN Index for "Normal" Data

I've seen this mistake quite often in performance tuning discussions: using GIN indexes on regular scalar columns like TEXT, INT, or VARCHAR. Let's clear this up.

GIN (Generalized Inverted Index) is designed for:
• JSONB
• Arrays
• Full-text search (TSVECTOR)

It indexes elements inside values, not the value itself.

What happens if you use GIN on normal data?
• Slower INSERT/UPDATE operations
• Larger index size
• No performance gain for equality or range queries
• The query planner may ignore the index altogether

Use the right index for the right job:
• B-Tree → equality, joins, sorting
• GIN → JSONB, arrays, full-text search
• GIN + pg_trgm → LIKE / ILIKE '%search%'
• BRIN → very large, sequential datasets

Example (correct use case):

CREATE EXTENSION pg_trgm;
CREATE INDEX idx_name_trgm ON users USING GIN (name gin_trgm_ops);

Perfect for: WHERE name ILIKE '%raj%'

Bottom line: using GIN on normal columns doesn't just fail to help, it can actively hurt your database performance.

Choose indexes intentionally. PostgreSQL gives you power, but only if you use it wisely.

#PostgreSQL #DatabaseOptimization #PerformanceTuning #BackendEngineering #DataEngineering #SQL #SoftwareArchitecture
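One hedged addition to the post's example: after creating the trigram index, it's worth confirming the planner actually picks it up, since GIN indexes are typically consumed via a bitmap scan.

```sql
-- Verify the trigram index is used (plan shape varies with table size):
EXPLAIN ANALYZE
SELECT * FROM users WHERE name ILIKE '%raj%';
-- Expected on a large table: a Bitmap Heap Scan fed by a
-- Bitmap Index Scan on idx_name_trgm, rather than a Seq Scan.
```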
---
Why you can't "just" concurrently index partitioned tables in Postgres

We've been working on partitioning some of our fastest-growing tables recently. Since these are high-traffic, write-heavy tables, we tried to add indexes concurrently to avoid downtime. That's when we hit a classic wall:

ERROR: cannot create index on partitioned table "..." concurrently

The "why": CREATE INDEX CONCURRENTLY is designed to avoid locking your table. But since a partitioned table is a virtual parent, Postgres can't coordinate a "no-lock" build across multiple child tables at once from a single command. It defaults to an ACCESS EXCLUSIVE lock, which, in a production environment, is a dealbreaker for performance.

The "create-and-link" pattern we used instead (sketched in SQL below):

• The shell: Run CREATE INDEX ... ON ONLY parent_table. This adds the metadata to the parent but leaves the index invalid at first.
• The background build: Run CREATE INDEX CONCURRENTLY on each individual partition. This builds the index without blocking our production traffic.
• The link: Attach each partition index to the parent: ALTER INDEX parent_idx ATTACH PARTITION child_idx.

The result: Once the final partition is linked, the parent index automatically becomes valid. We got zero downtime, and any future partitions will now inherit this index automatically.

It's a bit more manual work, but it's a lifesaver for keeping a high-growth system responsive.

#PostgreSQL #DatabaseEngineering #SystemDesign #Backend #Scalability #SDE
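The pattern spelled out in SQL, with hypothetical table and index names (a parent `events` table, one monthly partition shown):

```sql
-- 1. The shell: metadata-only index on the parent (starts INVALID):
CREATE INDEX idx_events_user_id ON ONLY events (user_id);

-- 2. The background build: concurrent, per partition, no long locks:
CREATE INDEX CONCURRENTLY idx_events_2024_01_user_id
    ON events_2024_01 (user_id);

-- 3. The link: attach the partition's index to the parent's:
ALTER INDEX idx_events_user_id
    ATTACH PARTITION idx_events_2024_01_user_id;

-- Repeat steps 2 and 3 for every partition; once the last one is
-- attached, the parent index flips to valid on its own.
```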
---
Did adding just ONE line of code make your database query 100x faster? 🤔⚡

We've all seen it happen. An application is crawling, a specific query is taking 3 seconds, and the user experience suffers. You add an index, and suddenly it takes 0.03 seconds. It feels like magic. But it's actually fundamental data structure engineering.

Here is what really happens when you index a database like PostgreSQL or MySQL:

❌ The Problem: The Full Table Scan
Imagine I hand you a 1,000-page biology textbook and ask you to find every mention of the word "mitochondria." Without a glossary, you have to read every single page, start to finish. This is a Full Table Scan. It is mathematically predictable, but slow. If the table (the book) grows from 1,000 pages to 10 million pages, your query becomes unusable.

✅ The Solution: Database Indexing
An index is a sorted glossary of specific data points (like user IDs or emails). Behind the scenes, the database builds a specialized data structure, usually a B-Tree or a hash table. Instead of reading 10 million rows, the database uses the B-Tree's sorted architecture to find your data in milliseconds. It doesn't work harder; it just knows exactly where to look.

⚖️ The Trade-off (Crucial Point!)
Indexes are powerful, but they aren't free:
• Storage costs: Indexes take up extra disk space. A heavy index on a massive table can significantly increase storage needs.
• Slower write operations: Every time you INSERT a new row, the database also has to spend time updating the index (the glossary). Too many indexes can slow down your data writes.

Conclusion: Database speed isn't about hope. It's about knowing your access patterns and building the right B-Trees. 🚀

#Database #SoftwareEngineering #PostgreSQL #MySQL #BackendDevelopment #TechTips #PerformanceEngineering
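That "one line" in concrete form. Names are hypothetical, and as the post warns, every new index taxes subsequent writes.

```sql
-- Before: WHERE email = ... on an unindexed column forces a full table
-- scan (reading every "page of the textbook").
-- The one line that swaps the scan for a B-Tree lookup:
CREATE INDEX idx_users_email ON users (email);

-- Equality lookups now descend the sorted B-Tree instead of reading
-- every row:
SELECT * FROM users WHERE email = 'ada@example.com';
```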
---
Hot take: Most partitioned Postgres tables shouldn't be partitioned.

We know. Controversial. Let us explain.

Partitioning has become the default recommendation for any table over a certain size. "It's 100GB? Partition it." "Queries are slow? Partition it." "Scaling up? Partition it."

But partitioning isn't free. It adds planning overhead. It complicates migrations. It makes some queries faster and others slower. And if your partition key doesn't match your query patterns, you've just turned one table into dozens of tables that Postgres has to scan one by one. That's not optimization. That's self-inflicted complexity.

So here's the Data Drop #9 framework by Bhupathi Shameer. Partition when:

✅ Your queries consistently filter by a predictable key (time range, tenant ID)
✅ You need to archive or drop old data without expensive DELETE operations (see the sketch below)
✅ Maintenance on the full table (VACUUM, REINDEX) is no longer manageable
✅ You can clearly articulate which partitions most queries will hit

Don't partition when:

❌ You're hoping it'll magically speed up queries you haven't profiled yet
❌ Your queries don't filter by the partition key
❌ Your table is large but your actual problem is missing indexes or bad query plans
❌ You're adding it because a blog post said "partition everything over 10GB"

The line between "this will save us" and "this will haunt us" is thinner than most teams think.

#AprilDataDrops #PostgreSQL #DataDrop9 #Partitioning #Database #Performance #DataModeling #OpenSourceDB
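The archive/drop point deserves a concrete sketch, since it's the least ambiguous win. Table and partition names are hypothetical:

```sql
-- Retiring a month of data from a range-partitioned table: no giant
-- DELETE, no dead tuples, no vacuum debt.
ALTER TABLE events DETACH PARTITION events_2023_01;
DROP TABLE events_2023_01;

-- The unpartitioned alternative rewrites and bloats the table:
-- DELETE FROM events WHERE created_at < '2023-02-01';
```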
---
DuckDB. Discovery of the week.

I lived with the conviction that PostgreSQL is the answer to everything for small to mid-size projects. Any data question? Use Postgres. Period. Man, was I wrong.

Set up DuckDB on top of Parquet files and my jaw dropped:

1. Parquet as storage is weird at first. But the more you think about it, the better it looks. Columnar compression out of the box, folder-based partitioning, read only the columns you need.
2. Analytics on a columnar engine just flies. Aggregations that Postgres had to think about for seconds are instant here. Different league entirely.
3. Zero infrastructure. No server, no pg_hba.conf, no daemons. Just one file. Yes. Just. One. File. You heard me.
4. Eats everything. CSV, JSON, Parquet: all via one SELECT. No loading, no schemas, no prep.
5. Squeezes the machine dry. Vectorized execution, core-level parallelism. Gets every last drop out of a little server for 50 EUR.

I was amazed that the ETL loads no matter what. No locks, no contention; it just works. Schema changes? Doesn't matter, it will still work.

Now, the downsides. Only one session with write permissions is allowed. Like, what is that? Connecting Power BI? Shaman dances and two databases. Letting users connect from Excel? Via an API, friends!

But you know what? It's all details. It is so FAST that everything else becomes secondary.

If you're running analytical workloads and still on Postgres, give it a shot. There are better things to do than suffer in Postgres.

#duckdb #discovery #data
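For anyone wanting to reproduce the setup, a minimal DuckDB sketch with hypothetical file paths; read_parquet and read_csv_auto are standard DuckDB functions, and everything runs in-process with no server.

```sql
-- Aggregate straight off a folder of Parquet files; hive_partitioning
-- turns folder names like year=2024/ into queryable columns.
SELECT category, sum(amount) AS total
FROM read_parquet('data/sales/**/*.parquet', hive_partitioning = true)
GROUP BY category
ORDER BY total DESC;

-- CSV and JSON need no load step either:
SELECT * FROM read_csv_auto('raw/events.csv') LIMIT 10;
```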
---
🚀 "The query was taking 45 minutes… we reduced it to 12 seconds."

Last week, I worked on a performance issue in both Microsoft SQL Server and PostgreSQL: same business problem, different engines. Here's what I learned 👇

🔍 Problem
A reporting query:
- Multiple joins (5+ tables)
- Huge data (~50M rows)
- Running during peak hours
- Blocking other critical transactions

⚠️ What was going wrong?

In SQL Server:
- Parallelism overhead (CXPACKET / CXCONSUMER waits)
- Missing indexes
- Bad execution plan due to outdated statistics

In PostgreSQL:
- Sequential scan instead of index scan
- Poor query plan due to wrong cost estimation
- No proper vacuum/analyze

🛠️ What we did (the real fix)

✅ 1. Index optimization
Added covering indexes (SQL Server); created composite indexes (PostgreSQL).
👉 Result: massive reduction in I/O

✅ 2. Updated statistics
SQL Server → UPDATE STATISTICS; PostgreSQL → ANALYZE.
👉 Better execution plans instantly

✅ 3. Query rewrite
Removed unnecessary columns, reduced joins, used EXISTS instead of IN (sketched below).
👉 Cleaner + faster execution

✅ 4. Parallelism control (SQL Server)
Tuned MAXDOP and adjusted the cost threshold for parallelism.
👉 Reduced CPU pressure

✅ 5. Vacuum & maintenance (PostgreSQL)
Ran VACUUM ANALYZE and checked for bloat.
👉 Improved planner accuracy

📈 Final result
🔥 Execution time: 45 minutes → 12 seconds
🔥 CPU usage dropped by 70%
🔥 Blocking issues eliminated

💡 Key takeaway
Performance tuning is NOT about one fix. It's about understanding execution plans, knowing how the engine behaves, and fixing root causes, not symptoms.

#SQLServer #PostgreSQL #PerformanceTuning #Database #DBA #QueryOptimization #TechLeadership #DataEngineering
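The EXISTS-for-IN rewrite from step 3, sketched with hypothetical tables; the same form works in both SQL Server and PostgreSQL.

```sql
-- Before: IN compares each row against the whole subquery result.
SELECT o.order_id
FROM orders o
WHERE o.customer_id IN (SELECT c.customer_id
                        FROM customers c
                        WHERE c.region = 'EU');

-- After: EXISTS can stop at the first matching customer per row, and
-- often gives the optimizer a cleaner semi-join plan.
SELECT o.order_id
FROM orders o
WHERE EXISTS (SELECT 1
              FROM customers c
              WHERE c.customer_id = o.customer_id
                AND c.region = 'EU');
```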