Scaling PostgreSQL
Most performance problems in PostgreSQL don’t come from a single bottleneck - they emerge as your system grows: more data, more traffic, more concurrency. What worked perfectly at 10K rows and a handful of connections begins to break down at millions of rows and thousands of requests per second.
Scaling PostgreSQL isn’t about a single silver bullet. It’s a layered approach, starting from simple vertical scaling and moving toward more sophisticated techniques like replication, connection pooling, partitioning, and eventually sharding. Each step addresses a different kind of limitation - memory pressure, I/O throughput, connection overhead, or dataset size.
This guide walks through those layers, not just from a “what to do” perspective, but from the inside out - how PostgreSQL actually behaves under the hood, and why each technique works (or sometimes doesn’t).
Vertical Scaling
This is probably the easiest yet most expensive option. For example, adding more RAM allows you to increase the shared_buffers setting.
shared_buffers is a "global" pool of memory shared by all Postgres processes. It caches data blocks from tables and indexes; when a user runs a query, Postgres looks here first, and a "cache hit" returns the data without touching disk.
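You can check how well this cache is working in practice by comparing block hits against block reads. A minimal sketch using the built-in pg_stat_database view:

```sql
-- Rough cache-hit ratio for the current database; a value close to 1.0
-- means most reads are served from shared_buffers rather than disk.
SELECT blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```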
Similarly, wal_buffers caches WAL records (the transaction log) before they are flushed to disk. It's important to remember that Postgres has two destinations for persistence: the WAL (the fast, sequential journal) and the data files (the actual tables).
With the default setting (-1), Postgres allocates about 1/32nd of shared_buffers for wal_buffers, capped at 16MB, which aligns with the default WAL segment size. This auto-tuning works for most workloads, but if you are inserting millions of rows in a single transaction, the WAL generated can easily exceed 16MB. If the buffer fills up before you commit, Postgres is forced to flush to disk mid-transaction, slowing things down.
Therefore, properly sizing shared_buffers and wal_buffers can provide a smoother "pipeline" for write operations.
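As an illustrative sketch (the values are assumptions for a box with around 64GB of RAM, not recommendations), the tuning itself is just two settings plus a restart:

```sql
-- ~25% of RAM is a common starting point for shared_buffers.
ALTER SYSTEM SET shared_buffers = '16GB';
-- Lift the 16MB default cap for write-heavy workloads.
ALTER SYSTEM SET wal_buffers = '64MB';

-- Both settings only take effect after a server restart; verify with:
SHOW shared_buffers;
SHOW wal_buffers;
```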
Read Replication
Most applications are read-heavy. So set up a primary node for all writes (INSERT/UPDATE/DELETE) and one or more replica nodes for reads.
Internals: How Replication Works
Postgres uses Streaming Replication: the primary ships WAL records to each replica over a persistent connection as they are generated, rather than waiting to ship whole WAL segment files.
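You can watch this stream from the primary side. A quick check using the built-in pg_stat_replication view, which shows one row per connected standby:

```sql
-- How far each standby has received, flushed, and replayed the WAL stream.
SELECT client_addr, state, sent_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```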
You might be wondering whether this replication is guaranteed to be "consistent", in the sense that the primary and all replicas hold the exact same copy at all times. Well, it depends - by default, it's not.
Replication mode is configurable, and by default it's asynchronous, meaning:
- The primary writes the WAL locally and acknowledges the commit to the client immediately.
- The WAL streams to replicas in the background, so a replica can lag behind and briefly serve stale reads.
- If the primary dies before a replica has received the latest WAL, those last commits can be lost on failover.
If you set synchronous replication, the flow goes:
- The primary writes the commit record to its local WAL.
- It streams that WAL to the synchronous standby and waits.
- Only after the standby confirms receipt (and, depending on synchronous_commit, a flush to its disk) does the primary acknowledge the commit to the client.
So this ensures the replica has durably received the WAL, but not necessarily that it has fully replayed it and made it query-visible. To top it off, if the network is slow or the replica crashes, your writes on the primary will hang or fail.
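Here's a sketch of the knobs involved; 'replica1' is a hypothetical standby name (it must match the standby's application_name):

```sql
-- Name which standby(s) the primary must wait for; reloadable without restart.
ALTER SYSTEM SET synchronous_standby_names = 'replica1';
SELECT pg_reload_conf();

-- Per-transaction durability level:
SET synchronous_commit = remote_write;   -- standby has received the WAL
-- SET synchronous_commit = on;          -- standby has flushed it to disk
-- SET synchronous_commit = remote_apply; -- standby has replayed it (query-visible)
```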
Connection Pooling
PostgreSQL uses a process-per-connection model: the postmaster forks a dedicated backend process for every connection. So 5,000 active app threads talking to the DB simultaneously means 5,000 OS processes, which can overwhelm the system through memory overhead, context switching, and contention.
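You can see this model directly: every row in pg_stat_activity corresponds to a dedicated backend process.

```sql
-- Each client backend here is a separate OS process; thousands of
-- rows means thousands of processes.
SELECT pid, state, application_name, backend_start
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```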
A pooler (like PgBouncer, PgCat, or RDS Proxy) sits between the application and the database. It maintains a small set of "warm" database connections and multiplexes them across thousands of incoming application requests.
Internally, there are three strategies:
- Session pooling: a client holds a physical connection for its entire session. Safest, but saves the least.
- Transaction pooling: a physical connection is borrowed per transaction and returned at COMMIT/ROLLBACK.
- Statement pooling: a connection is borrowed per statement. Most aggressive, and it breaks multi-statement transactions.
Transaction pooling hits the sweet spot, with one downside: you cannot rely on session state - SET variables, temp tables, advisory locks, cursors - because the next time you run a query, you might be on a totally different physical connection. To cope with this, proxies like RDS Proxy use a fallback called Session Pinning.
If the pooler detects a state-changing command (like a SET variable or a temporary table), it effectively "demotes" that connection from Transaction mode to Session mode. It pins that specific database connection to that specific client until they disconnect.
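For illustration, these are the kinds of statements that typically trigger pinning (the exact detection rules vary by pooler):

```sql
-- Session-level GUC change: outlives the transaction on this backend.
SET search_path TO tenant_42;
-- Temp table: exists only on this specific backend process.
CREATE TEMP TABLE staging (id int);
-- Advisory lock: held by this connection, not by the transaction.
SELECT pg_advisory_lock(42);
```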
Of course, if your application code "pins" every connection, your pooler becomes useless. You'll end up with 5,000 connections again, just with a "Proxy" sitting in the middle watching it happen.
Table Partitioning (Internal Scaling)
When a table grows to hundreds of millions or billions of rows, performance can degrade due to index size, cache inefficiency, and increased traversal depth. Partitioning solves this by splitting one "Logical" table into multiple "Physical" child tables.
Internals: How it Works
You define a Partition Key (e.g., created_at or tenant_id). When a query comes in, the query optimizer looks at your WHERE clause, and if the filter matches the partition key, it reads the partition "boundaries" from the internal system catalogs (e.g., partition A holds values from 1 to 100). The end result is partition pruning: partitions that cannot contain matching rows are skipped entirely. This translates to:
- Smaller indexes per partition, so fewer B-tree levels to traverse.
- Less I/O and better cache efficiency, since only the relevant partitions are scanned.
- Cheap bulk deletes: dropping an old partition is a metadata operation instead of a massive DELETE.
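A minimal sketch of declarative range partitioning (table and column names are illustrative):

```sql
-- One "logical" table, physically split by time range.
CREATE TABLE events (
    id         bigserial,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE events_2025 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- The filter matches the partition key, so the planner prunes
-- every partition except events_2025:
EXPLAIN SELECT * FROM events WHERE created_at >= '2025-06-01';
```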
Sharding (Application Level)
While partitioning splits a table within one server, sharding moves that logic into your application, splitting the data across multiple physical servers. Say, if the user's country is S. Korea, you route to DB 1.
You may be wondering then, "what if a transaction across multiple physical servers is required?"
The answer: it's generally best to avoid cross-shard transactions unless absolutely necessary. Extensions like Citus do support two-phase commit, but it adds latency because it requires multiple rounds of communication, and during the "prepare" phase records stay locked across all participating shards. Long-running transactions can therefore hold locks everywhere and create serious contention.
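For completeness, a sketch of the Citus approach mentioned above: by distributing a table on the same key your transactions filter on, most transactions stay single-shard and never need two-phase commit at all (this assumes a Citus coordinator/worker cluster is already set up):

```sql
-- Requires the Citus extension to be installed and preloaded.
CREATE EXTENSION IF NOT EXISTS citus;

CREATE TABLE orders (
    order_id bigserial,
    user_id  bigint NOT NULL,
    total    numeric
);

-- Hash-distribute rows across workers by user_id, so all of one
-- user's orders land on the same shard:
SELECT create_distributed_table('orders', 'user_id');
```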
Scaling PostgreSQL isn’t about some secret feature — it’s more about how you design around it. You start simple: add read replicas to handle traffic, use caching (like Redis) to take pressure off the database, and make sure your queries are actually efficient. That alone gets you pretty far. Once writes become the bottleneck, that’s when things get interesting — you move into sharding, where you split data across multiple databases (often by user ID) and let the application decide where data lives. So the real answer is: you don’t “scale PostgreSQL” directly — you scale your architecture around it. That’s what made stacks like Instagram’s look simple on the surface but incredibly powerful underneath.