Scaling PostgreSQL
Most performance problems in PostgreSQL don’t come from a single bottleneck - they emerge as your system grows: more data, more traffic, more concurrency. What worked perfectly at 10K rows and a handful of connections begins to break down at millions of rows and thousands of requests per second.
Scaling PostgreSQL isn’t about a single silver bullet. It’s a layered approach, starting from simple vertical scaling and moving toward more sophisticated techniques like replication, connection pooling, partitioning, and eventually sharding. Each step addresses a different kind of limitation - memory pressure, I/O throughput, connection overhead, or dataset size.
This guide walks through those layers, not just from a “what to do” perspective, but from the inside out - how PostgreSQL actually behaves under the hood, and why each technique works (or sometimes doesn’t).
Vertical Scaling
This is probably the easiest yet most expensive option. For example, adding more RAM allows you to increase the shared_buffers setting.
shared_buffers is a "global" pool of memory shared by all Postgres processes. It caches data blocks from tables and indexes; when a user runs a query, Postgres looks here first, and a "cache hit" returns the data without touching disk.
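You can check how well this cache is working in practice by comparing block hits against block reads. A minimal sketch using the built-in pg_stat_database view:

```sql
-- Rough cache-hit ratio for the current database; a value close to 1.0
-- means most reads are served from shared_buffers rather than disk.
SELECT blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```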
Similarly, wal_buffers caches WAL records (the transaction log) before they are flushed to disk. It's important to remember that Postgres has two destinations for persistence: the WAL (the fast, sequential journal) and the data files (the actual tables).
With the default setting (-1), Postgres allocates about 1/32nd of shared_buffers for wal_buffers, capped at 16MB, which aligns with the default WAL segment size. This auto-tuning works for most workloads, but if you are inserting millions of rows in a single transaction, the WAL generated can easily exceed 16MB. If the buffer fills up before you commit, Postgres is forced to flush to disk mid-transaction, slowing things down.
Therefore, properly sizing shared_buffers and wal_buffers can provide a smoother "pipeline" for write operations.
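As an illustrative sketch (the values are assumptions for a box with around 64GB of RAM, not recommendations), the tuning itself is just two settings plus a restart:

```sql
-- ~25% of RAM is a common starting point for shared_buffers.
ALTER SYSTEM SET shared_buffers = '16GB';
-- Lift the 16MB default cap for write-heavy workloads.
ALTER SYSTEM SET wal_buffers = '64MB';

-- Both settings only take effect after a server restart; verify with:
SHOW shared_buffers;
SHOW wal_buffers;
```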
Read Replication
Most applications are read-heavy. So set up a primary node for all writes (INSERT/UPDATE/DELETE) and one or more replica nodes for reads.
Internals: How Replication Works
Postgres uses Streaming Replication: the primary ships WAL records to each replica over a persistent connection as they are generated, rather than waiting to ship whole WAL segment files.
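You can watch this stream from the primary side. A quick check using the built-in pg_stat_replication view, which shows one row per connected standby:

```sql
-- How far each standby has received, flushed, and replayed the WAL stream.
SELECT client_addr, state, sent_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```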
You might be wondering whether this replication is guaranteed to be "consistent", in the sense that the primary and all replicas hold the exact same copy at all times. Well, it depends - by default, it's not.
Replication mode is configurable, and by default it's asynchronous, meaning:
- The primary writes the WAL locally and acknowledges the commit to the client immediately.
- The WAL streams to replicas in the background, so a replica can lag behind and briefly serve stale reads.
- If the primary dies before a replica has received the latest WAL, those last commits can be lost on failover.
If you set synchronous replication, the flow goes:
- The primary writes the commit record to its local WAL.
- It streams that WAL to the synchronous standby and waits.
- Only after the standby confirms receipt (and, depending on synchronous_commit, a flush to its disk) does the primary acknowledge the commit to the client.
So this ensures the replica has durably received the WAL, but not necessarily that it has fully replayed it and made it query-visible. To top it off, if the network is slow or the replica crashes, your writes on the primary will hang or fail.
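Here's a sketch of the knobs involved; 'replica1' is a hypothetical standby name (it must match the standby's application_name):

```sql
-- Name which standby(s) the primary must wait for; reloadable without restart.
ALTER SYSTEM SET synchronous_standby_names = 'replica1';
SELECT pg_reload_conf();

-- Per-transaction durability level:
SET synchronous_commit = remote_write;   -- standby has received the WAL
-- SET synchronous_commit = on;          -- standby has flushed it to disk
-- SET synchronous_commit = remote_apply; -- standby has replayed it (query-visible)
```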
Connection Pooling
PostgreSQL uses a process-per-connection model: the postmaster forks a dedicated backend process for every connection. So 5,000 active app threads talking to the DB simultaneously means 5,000 OS processes, which can overwhelm the system through memory overhead, context switching, and contention.
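You can see this model directly: every row in pg_stat_activity corresponds to a dedicated backend process.

```sql
-- Each client backend here is a separate OS process; thousands of
-- rows means thousands of processes.
SELECT pid, state, application_name, backend_start
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```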
A pooler (like PgBouncer, PgCat, or RDS Proxy) sits between the application and the database. It maintains a small set of "warm" database connections and multiplexes them across thousands of incoming application requests.
Internally, there are three strategies:
- Session pooling: a client holds a physical connection for its entire session. Safest, but saves the least.
- Transaction pooling: a physical connection is borrowed per transaction and returned at COMMIT/ROLLBACK.
- Statement pooling: a connection is borrowed per statement. Most aggressive, and it breaks multi-statement transactions.
Transaction pooling hits the sweet spot, with one downside: you cannot rely on session state - SET variables, temp tables, advisory locks, cursors - because the next time you run a query, you might be on a totally different physical connection. To cope with this, proxies like RDS Proxy use a fallback called Session Pinning.
If the pooler detects a state-changing command (like a SET variable or a temporary table), it effectively "demotes" that connection from Transaction mode to Session mode. It pins that specific database connection to that specific client until they disconnect.
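For illustration, these are the kinds of statements that typically trigger pinning (the exact detection rules vary by pooler):

```sql
-- Session-level GUC change: outlives the transaction on this backend.
SET search_path TO tenant_42;
-- Temp table: exists only on this specific backend process.
CREATE TEMP TABLE staging (id int);
-- Advisory lock: held by this connection, not by the transaction.
SELECT pg_advisory_lock(42);
```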
Of course, if your application code "pins" every connection, your pooler becomes useless. You'll end up with 5,000 connections again, just with a "Proxy" sitting in the middle watching it happen.
Table Partitioning (Internal Scaling)
When a table grows to hundreds of millions or billions of rows, performance can degrade due to index size, cache inefficiency, and increased traversal depth. Partitioning solves this by splitting one "Logical" table into multiple "Physical" child tables.
Internals: How it Works
You define a Partition Key (e.g., created_at or tenant_id). When a query comes in, the query optimizer looks at your WHERE clause, and if the filter matches the partition key, it reads the partition "boundaries" from the internal system catalogs (e.g., partition A holds values from 1 to 100). The end result is partition pruning: partitions that cannot contain matching rows are skipped entirely. This translates to:
- Smaller indexes per partition, so fewer B-tree levels to traverse.
- Less I/O and better cache efficiency, since only the relevant partitions are scanned.
- Cheap bulk deletes: dropping an old partition is a metadata operation instead of a massive DELETE.
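A minimal sketch of declarative range partitioning (table and column names are illustrative):

```sql
-- One "logical" table, physically split by time range.
CREATE TABLE events (
    id         bigserial,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE events_2025 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- The filter matches the partition key, so the planner prunes
-- every partition except events_2025:
EXPLAIN SELECT * FROM events WHERE created_at >= '2025-06-01';
```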
Sharding (Application Level)
While partitioning splits a table within one server, sharding moves that logic into your application, splitting the data across multiple physical servers. Say, if the user's country is S. Korea, you route to DB 1.
You may be wondering then, "what if a transaction across multiple physical servers is required?"
The answer: it's generally best to avoid cross-shard transactions unless absolutely necessary. Extensions like Citus do support two-phase commit, but it adds latency because it requires multiple rounds of communication, and during the "prepare" phase records stay locked across all participating shards. Long-running transactions can therefore hold locks everywhere and create serious contention.
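For completeness, a sketch of the Citus approach mentioned above: by distributing a table on the same key your transactions filter on, most transactions stay single-shard and never need two-phase commit at all (this assumes a Citus coordinator/worker cluster is already set up):

```sql
-- Requires the Citus extension to be installed and preloaded.
CREATE EXTENSION IF NOT EXISTS citus;

CREATE TABLE orders (
    order_id bigserial,
    user_id  bigint NOT NULL,
    total    numeric
);

-- Hash-distribute rows across workers by user_id, so all of one
-- user's orders land on the same shard:
SELECT create_distributed_table('orders', 'user_id');
```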
Scaling PostgreSQL isn’t about some secret feature — it’s more about how you design around it. You start simple: add read replicas to handle traffic, use caching (like Redis) to take pressure off the database, and make sure your queries are actually efficient. That alone gets you pretty far. Once writes become the bottleneck, that’s when things get interesting — you move into sharding, where you split data across multiple databases (often by user ID) and let the application decide where data lives. So the real answer is: you don’t “scale PostgreSQL” directly — you scale your architecture around it. That’s what made stacks like Instagram’s look simple on the surface but incredibly powerful underneath.