PostgreSQL Double‑Layer Caching

1. What is double‑layer caching?

PostgreSQL stores data in 8 kB blocks. When you run a query, the database needs those blocks in memory. PostgreSQL has its own cache – shared_buffers. But it also relies on the operating system’s page cache. So every block can live in two places at the same time inside RAM. That’s the double‑layer cache.

L1 – shared_buffers

  • In‑process, managed by PostgreSQL’s buffer manager
  • Directly accessible for queries, writes, checksums

L2 – OS page cache

  • Managed by the kernel (Linux: page cache)
  • Contains file pages from the data directory, including heap tables, indexes, WAL

Together they look like this:


Figure – The two caching layers. A block can be present in both, wasting RAM, but the second layer can also serve a read without touching disk.



2. Inside shared_buffers – the database’s own cache

shared_buffers is a fixed‑size area of memory PostgreSQL allocates at startup.


When PostgreSQL wants a page, it needs:

  • Which table?
  • Which fork?
  • Which block inside that fork?

So it identifies a page like this:

(relation, fork, block_number)        

A relation fork = a separate file on disk belonging to a table, used for a specific purpose

Types of forks (super simple)

For one table (users), PostgreSQL may have:

1. main fork (most important)

  • Stores actual data (rows)

  • This is what you normally think of as “the table”

2. FSM fork (Free Space Map)

  • Tracks which pages have empty space
  • Helps PostgreSQL know: “Where can I insert new data?”

3. VM fork (Visibility Map)

  • Tracks which pages are all‑visible (contain no dead rows)
  • Helps optimize queries (e.g., it enables index‑only scans and lets VACUUM skip pages)
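To make the fork idea concrete, here is a tiny sketch of how the three forks of one table map to files in the data directory. The database OID (16384) and relfilenode (12345) are made‑up example values; the `_fsm`/`_vm` suffixes are PostgreSQL's real naming convention.

```python
# Sketch: how the three forks of one relation map to files on disk.
# 16384 (database OID) and 12345 (relfilenode) are illustrative values.

def fork_path(db_oid: int, relfilenode: int, fork: str) -> str:
    """Return the data-directory path for a given fork of a relation."""
    suffix = {"main": "", "fsm": "_fsm", "vm": "_vm"}[fork]
    return f"base/{db_oid}/{relfilenode}{suffix}"

for fork in ("main", "fsm", "vm"):
    print(fork_path(16384, 12345, fork))
# base/16384/12345      <- rows
# base/16384/12345_fsm  <- free space map
# base/16384/12345_vm   <- visibility map
```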

When a query needs a block, the buffer manager:

  1. Computes a hash from the (relation, fork, block_number) tag
  2. Looks up the buffer in the buffer table
  3. If found → hit → returns a pointer to the buffer (no copy)
  4. If not → miss → allocates a free buffer or evicts a victim, then reads the block through the kernel
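The lookup steps above can be modelled as a plain hash table keyed by the page tag. This is a toy model, not PostgreSQL's actual buffer manager: the real buffer table maps tags to buffer IDs and involves partition locks and pinning, all of which are omitted here.

```python
# Toy model of the buffer-table lookup, keyed by (relation, fork, block).
# Real PostgreSQL stores buffer IDs, not page bytes, and takes locks.

buffer_table = {}  # tag -> 8 kB page

def read_block(tag, read_from_kernel):
    """Return (page, status) for `tag`, loading it on a miss."""
    if tag in buffer_table:          # steps 1-3: hash + lookup -> hit
        return buffer_table[tag], "hit"
    page = read_from_kernel(tag)     # step 4: miss -> read via the kernel
    buffer_table[tag] = page
    return page, "miss"

tag = ("users", "main", 42)
page, status1 = read_block(tag, lambda t: b"\x00" * 8192)
page, status2 = read_block(tag, lambda t: b"\x00" * 8192)
print(status1, status2)  # first access misses, second hits
```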

The shared_buffers parameter controls its size. A typical starting point is 25 % of system RAM on a dedicated database server.

Important: by default, PostgreSQL never bypasses the kernel; it always reads through the OS page cache.


Clock‑sweep algorithm in detail

PostgreSQL does not use a precise LRU (Least Recently Used) list. Instead it uses a clock‑sweep (also called the “second chance” or “clock” algorithm) to decide which buffer to evict when a new block must be read.

The buffer pool is treated as a circular array. Each buffer descriptor has two important counters:

  • usage_count – can range from 0 up to a maximum (typically 5). Starts at 1 when a page is first read into the buffer.
  • refcount – number of backends currently pinning the page; a page with refcount > 0 cannot be evicted.

A global clock hand points to the next candidate buffer, and the eviction logic works like this:

while a victim has not been chosen:
   if buffer's refcount > 0:
       skip (pinned)
   else if buffer's usage_count > 0:
       usage_count -= 1          // "second chance"
   else:
       choose this buffer as victim

   advance the clock hand to the next buffer        

This means a frequently accessed page has its usage_count incremented on each access and will survive multiple sweeps of the clock hand before it finally reaches 0 and is evicted. It approximates LRU with low overhead – there is no global doubly‑linked list to maintain and shuffle on every access.

Example

  • Hot page P has usage_count = 4.
  • The clock hand sweeps past it, decrementing the count to 3, 2, 1, then 0. P stays in memory the whole time.
  • On the next pass, if no backend is pinning the page (refcount = 0), P is chosen as the victim and evicted.
  • But if the page is accessed between sweeps (each access bumps usage_count by 1, up to the cap of 5), the clock hand keeps giving it second chances – P effectively stays cached as long as it is touched regularly.

This design keeps the bookkeeping per buffer – a backend only touches the usage_count of the buffer it is accessing – and avoids a central LRU lock, which is critical for concurrency in a multi‑backend database.
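The pseudocode above can be turned into a runnable sketch. The buffer state here is simplified to just usage_count and refcount; real buffer descriptors carry much more, and real PostgreSQL raises an error if every buffer stays pinned.

```python
# Runnable sketch of the clock-sweep eviction from the pseudocode above.
# Simplification: a buffer is just {usage_count, refcount}; if every
# buffer were pinned this loop would spin, where real PG errors out.

def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand) for a circular buffer pool."""
    n = len(buffers)
    while True:
        buf = buffers[hand]
        if buf["refcount"] == 0:
            if buf["usage_count"] > 0:
                buf["usage_count"] -= 1        # "second chance"
            else:
                return hand, (hand + 1) % n    # victim found
        hand = (hand + 1) % n                  # skip pinned / advance

# Three buffers: a hot page, a pinned page, and a cold page.
pool = [
    {"usage_count": 4, "refcount": 0},  # hot  -> survives this sweep
    {"usage_count": 0, "refcount": 1},  # pinned -> never evicted
    {"usage_count": 0, "refcount": 0},  # cold -> first victim
]
victim, hand = clock_sweep(pool, 0)
print(victim)  # the cold page (index 2) is evicted before the hot one
```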


3. OS page cache – the kernel’s invisible help

Every read() call on a data file goes through Linux’s page cache. So when PostgreSQL asks for block 42 of base/16384/12345, the kernel:

  • checks its own cache
  • if found → copies it straight to PostgreSQL’s shared_buffers (extremely fast)
  • if not → reads from disk, populates the page cache, then copies to shared_buffers

This means a block evicted from shared_buffers may still sit in the OS cache. If PostgreSQL requests it again, it will be served from RAM – no disk I/O, just a kernel copy.

That’s the good side. The bad side: the same 8 kB block can consume 16 kB of RAM – one copy in shared_buffers, one in the page cache. This is the “double‑buffering” waste.
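A quick back‑of‑envelope calculation shows what this waste can add up to. The 60 % overlap fraction below is a made‑up assumption for illustration; you can measure the real overlap on your system with pg_buffercache and pgfincore.

```python
# Back-of-envelope cost of double buffering. The overlap fraction is an
# assumed example value, not a measurement.

shared_buffers = 4 * 1024**3      # example: 4 GiB L1 cache
overlap = 0.60                    # assume 60% of L1 pages also sit in L2

duplicated_bytes = int(shared_buffers * overlap)
print(f"{duplicated_bytes / 1024**3:.1f} GiB of RAM holds duplicate copies")
```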



4. Read path – two hits, two misses

Let’s walk through a single block read request.



Figure 2 – A miss in L1 may still be an L2 hit, avoiding physical disk I/O.

  • L1 hit → ~hundreds of nanoseconds (memory access + lock)
  • L2 hit → ~microseconds (kernel copy, no disk)
  • Disk miss → milliseconds

The planner uses effective_cache_size to estimate how much total cache (L1+L2) is available. It does not allocate memory – it’s a hint for the optimizer.
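Since effective_cache_size is only a planner hint, a common rule of thumb is to set it to roughly the total RAM the combined L1+L2 caches can occupy, often quoted as 50–75 % of system memory on a dedicated server. The helper below encodes that rule of thumb; the 75 % figure is a convention, not a PostgreSQL requirement.

```python
# Rough heuristic for effective_cache_size: a fraction of total RAM that
# shared_buffers plus the OS page cache can plausibly occupy. The 0.75
# factor is a common rule of thumb, assumed here for illustration.

def suggest_effective_cache_size(total_ram_gb: float) -> float:
    """Suggest effective_cache_size in GB (~75% of RAM on a dedicated box)."""
    return round(total_ram_gb * 0.75, 1)

print(suggest_effective_cache_size(64))  # on a 64 GB server
```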



Ring buffers – protecting the cache from sequential scans

Large sequential scans (and VACUUM) would flush all hot pages out of shared_buffers if they used the full pool. PostgreSQL uses a ring buffer – a tiny set of buffers (256 kB by default) – for these operations.


Figure – Ring buffer for a sequential scan. Most blocks end up only in the OS page cache, leaving shared_buffers hot pages untouched.

So a full table scan warms the OS page cache but hardly touches the main buffer pool. Subsequent scans (or lookups that miss the pool) can still enjoy L2 hits.
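The effect of the ring is easy to see in a toy model: no matter how many blocks the scan reads, only the ring's few slots are reused, so the rest of shared_buffers is untouched. The numbers below reflect the 256 kB default mentioned above (32 slots of 8 kB); the modelling itself is a simplification.

```python
# Toy ring buffer for a sequential scan: 32 slots * 8 kB = 256 kB,
# matching the default mentioned above. Slot reuse is simplified to
# round-robin; real PG also handles dirty pages in the ring.

ring_size = 32
ring = [None] * ring_size
touched = set()

for block_no in range(10_000):      # scan a 10,000-block table
    slot = block_no % ring_size     # recycle the same few slots
    ring[slot] = block_no
    touched.add(slot)

print(len(touched))  # only 32 buffer slots used for the whole scan
```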



5. Inspecting what is really cached

Use the pg_buffercache extension to see which tables and indexes occupy shared_buffers:

SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS size_in_buffer
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.relfilenode IS NOT NULL
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;        

To estimate what the OS cache holds, use tools like vmtouch or the pgfincore extension (which uses mincore() and posix_fadvise).



Summary – key takeaways

  1. PostgreSQL always reads data through the OS page cache; it never bypasses it with direct I/O by default.
  2. A block can reside simultaneously in shared_buffers (L1) and the OS page cache (L2) – this speeds up re‑reads but wastes RAM.
  3. Ring buffers protect shared_buffers from large sequential scans, leaving hot pages in L1 and warming the OS cache.
  4. Tune based on your workload: small hot datasets prefer a big L1; huge datasets with mixed access rely on the OS cache as a scalable second layer.

Double‑layer caching isn’t a flaw – it’s a conscious design choice that gives you two chances to avoid a disk read. Understand it, size it, and your PostgreSQL queries will fly.
