Caching: The Same Idea Running at Five Different Scales

Your cache will fail. The reason is the same whether it's a recursive function or a CDN.
Memoization in dynamic programming and Redis in production are the same idea. Both trade space for time. Both fail the same way.
Most engineers learn these as separate things — one in an algorithms course, one in a systems course. They are not separate. The pattern is identical. The scale is different. The failure modes are identical at every level. If you have seen a cache fail once, you already understand why it fails everywhere else.
This article traces caching from its mathematical root to its production form. At each level, the same tradeoff appears. At each level, the same failure mode is waiting.
Level 1 — The algorithm root: memoization in dynamic programming
You have a recursive function that computes the same subproblem multiple times. The naive implementation recomputes each subproblem from scratch. Memoization stores each result the first time it is computed. Every subsequent call returns the stored result instead of recomputing.
This is the entire idea. Store the result of expensive work. Return the stored result when asked again.
# Without caching: O(2^n) — recomputes every subproblem
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
# With caching: O(n) — each subproblem computed once
cache = {}

def fib_cached(n):
    if n in cache:
        return cache[n]
    if n <= 1:
        cache[n] = n
    else:
        cache[n] = fib_cached(n - 1) + fib_cached(n - 2)
    return cache[n]
The tradeoff is AT4 — Precomputation vs On-Demand. You are choosing to spend memory to avoid spending compute. The first call is still expensive. Every subsequent call costs only a dictionary lookup.
The failure mode appears immediately: the cache can grow without bound. In a recursive algorithm that calls itself with N distinct inputs, the cache holds N entries. At small N this is fine. At large N this is FM3 — Unbounded Resource Consumption. Every production caching system exists to solve this problem with an eviction policy.
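In Python, for example, the bounded version is already in the standard library: functools.lru_cache is memoization with the eviction policy attached. Once the cache holds maxsize entries, the least recently used one is discarded.

from functools import lru_cache

@lru_cache(maxsize=1024)  # caps the cache; least recently used entries are evicted
def fib_bounded(n):
    if n <= 1:
        return n
    return fib_bounded(n - 1) + fib_bounded(n - 2)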
Level 2 — The infrastructure pattern: caching in web applications
The same idea moves from memory to a dedicated cache layer. Your application computes an expensive result — a database query, an API call, a rendered page. It stores the result in Redis or Memcached. Subsequent requests return the stored result without touching the database.
The code looks nearly identical to memoized recursion:
import redis

r = redis.Redis()  # client for the cache layer

def get_user_profile(user_id):
    cache_key = "profile:" + str(user_id)
    # Check cache first
    cached = r.get(cache_key)
    if cached is not None:
        return cached
    # Cache miss — compute and store
    # (database is a placeholder client; serializing the row is elided)
    profile = database.query("SELECT * FROM users WHERE id = ?", user_id)
    r.set(cache_key, profile, ex=300)  # expire after 5 minutes
    return profile
The tradeoff is still AT4. The first request pays the database cost. All subsequent requests pay only the Redis lookup cost — typically under 1ms versus 50–200ms for a database query.
Two properties appear here that did not exist at the algorithm level.
The first is TTL — time to live. The memoized function cache lives as long as the program runs. A Redis entry expires after N seconds. This is eviction policy made explicit: after 300 seconds, the entry is gone. The next request pays the database cost again and repopulates the cache.
The second is cache invalidation. When the underlying data changes, the cache holds stale data. Eviction by TTL is a blunt instrument — it guarantees eventual consistency at the cost of a fixed staleness window. Explicit invalidation is precise but requires coordinating the cache update with the database write.
The failure mode that appears at this level: FM4 — Data Consistency Failure. A user updates their profile. The database reflects the update immediately. The cache holds the old profile for the next 300 seconds. Every request during those 300 seconds sees stale data. Whether this is acceptable depends on the application. For a payment balance, it is not acceptable. For a profile picture, it probably is.
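A minimal sketch of explicit invalidation, reusing the Redis client from the example above (the database call is a placeholder): delete the entry in the same code path as the write, so the next read misses and repopulates.

def update_user_profile(user_id, new_profile):
    database.update(user_id, new_profile)  # placeholder for the real write
    r.delete("profile:" + str(user_id))    # next read misses and refills the cache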
Level 3 — The distributed pattern: caching across multiple nodes
A single Redis instance handles around 100,000 operations per second. A large application exceeds this. The cache layer must scale horizontally — multiple cache nodes, each holding a partition of the data.
This is where consistent hashing enters. Each cache node is assigned a position on a ring. Each cache key maps to a position on the same ring. A key is stored on the first node clockwise from its position.
import hashlib

RING_SIZE = 2**32

def ring_position(key):
    # Stable hash to a ring position (built-in hash() is randomized per process)
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

def get_cache_node(key, nodes):
    # nodes: objects with a .position attribute on the ring
    position = ring_position(key)
    # Find the first node clockwise from this position
    for node in sorted(nodes, key=lambda n: n.position):
        if node.position >= position:
            return node
    # Wrap around: return the node with the smallest position on the ring
    return min(nodes, key=lambda n: n.position)
This solves the redistribution problem. When a node is removed, only the keys between it and its predecessor move. Everything else stays. The naive alternative — hash(key) % N — remaps almost every key whenever N changes.
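A quick way to check that claim, reusing the ring_position helper from the sketch above: shrink a mod-N cluster from four nodes to three and count how many keys change nodes.

keys = ["user:" + str(i) for i in range(10_000)]
moved = sum(ring_position(k) % 4 != ring_position(k) % 3 for k in keys)
print(moved / len(keys))  # roughly 0.75: three quarters of the keys remap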
The failure mode that appears at this level: FM6 — Hotspotting. If a small number of keys are extremely popular — a viral post, a trending product, a celebrity profile — those keys land on a small number of nodes. Those nodes receive a disproportionate fraction of all requests.
The mitigation is replication: store popular keys on multiple nodes, or add virtual nodes to the ring to distribute load more evenly.
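A minimal sketch of virtual nodes, building on ring_position and RING_SIZE from above: each physical node claims many positions on the ring, so keys and load spread more evenly, and lookup becomes a binary search.

import bisect
import hashlib

VNODES_PER_NODE = 100

def build_ring(node_names):
    ring = []  # (position, node_name) pairs, sorted by position
    for name in node_names:
        for i in range(VNODES_PER_NODE):
            label = (name + "#" + str(i)).encode()
            ring.append((int(hashlib.md5(label).hexdigest(), 16) % RING_SIZE, name))
    return sorted(ring)

def lookup(ring, key):
    i = bisect.bisect_left(ring, (ring_position(key),))
    return ring[i % len(ring)][1]  # wrap past the last position to the first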
Level 4 — The production failure: the thundering herd
This is FM7 — Thundering Herd. It is the failure mode specific to caching that has no equivalent at the algorithm level.
It works like this. A popular cache entry expires. At the moment of expiry, thousands of requests are in flight — all requesting the same key. All find a cache miss. All go to the database simultaneously. The database, designed to handle a fraction of that load because the cache was supposed to absorb it, is overwhelmed. It slows down. Some requests time out. Some fail.
The cache repopulates from the first successful database query. But in the minutes it took to recover, the system was degraded for every user.
# The problem — happens when a popular key expires
def get_product(product_id):
    cached = r.get("product:" + str(product_id))
    if cached is not None:
        return cached
    # 10,000 concurrent requests all reach this line simultaneously
    # All go to the database
    product = database.query(product_id)  # database overwhelmed
    r.set("product:" + str(product_id), product, ex=300)
    return product
# One mitigation — probabilistic early expiration
import random

TTL_SECONDS = 300
REFRESH_WINDOW = 30         # consider early refresh in the last 30 seconds
REFRESH_PROBABILITY = 0.01  # small fraction of requests refresh early

def get_product_safe(product_id):
    key = "product:" + str(product_id)
    value = r.get(key)
    ttl = r.ttl(key)  # seconds remaining; negative if the key is missing
    if value is not None:
        # Before the entry expires, a small fraction of requests
        # decide to refresh it early, based on remaining TTL
        if 0 <= ttl < REFRESH_WINDOW and random.random() < REFRESH_PROBABILITY:
            product = database.query(product_id)  # only a few requests do this
            r.set(key, product, ex=TTL_SECONDS)
            return product
        return value
    # True cache miss: use a distributed lock to allow only one
    # request to query the database (distributed_lock is a placeholder)
    with distributed_lock("refresh:" + key):
        value = r.get(key)  # re-check: another request may have already refilled it
        if value is not None:
            return value
        product = database.query(product_id)
        r.set(key, product, ex=TTL_SECONDS)
        return product
The thundering herd is not a failure of the cache itself. It is a failure of coordination — the cache and the database are not designed as a system. The cache layer assumes it will always be warm. The database layer assumes it will not receive the full request volume. Neither assumption holds at the moment of expiry.
Level 5 — The architectural pattern: precomputed feeds and CDNs
The same idea operates at the largest scale. A social media feed cannot be assembled at request time: aggregating and ranking posts from every account a user follows, within a millisecond budget, is not possible on demand. The solution is precomputation: build each feed in the background as posts arrive and cache the result. The request retrieves a prebuilt feed from cache rather than assembling one on demand.
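A minimal fan-out-on-write sketch, reusing the Redis client from earlier (followers_of is a placeholder): each new post is pushed onto every follower's precomputed feed, so the read path is a single list fetch.

def on_new_post(author_id, post_id):
    for follower_id in followers_of(author_id):  # placeholder lookup
        r.lpush("feed:" + str(follower_id), post_id)
        r.ltrim("feed:" + str(follower_id), 0, 999)  # keep the newest 1,000 entries

def get_feed(user_id, count=50):
    # The read path: one cache fetch, no aggregation at request time
    return r.lrange("feed:" + str(user_id), 0, count - 1)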
A CDN operates identically. The origin server holds the canonical copy of a file. The CDN nodes hold cached copies distributed geographically. A request from London hits the London CDN node, not the origin server in Virginia. The CDN node either returns the cached copy (cache hit) or fetches it from origin and stores it (cache miss). The cache miss latency is the London-to-Virginia round trip — paid once, not on every request.
The tradeoff is still AT4. The failure mode is still cache consistency — a file updated on the origin takes time to propagate to every CDN node. The TTL governs the staleness window.
What changes at each level is not the pattern but the cost of a cache miss. In dynamic programming, a cache miss costs microseconds of CPU. In a web application, a cache miss costs a database query. In a distributed cache, a miss on a hot key concentrates load on a single node. In a CDN, a cache miss costs a cross-continent round trip. In a precomputed feed, a cold cache serves a user a degraded experience.
The pattern is the same. The stakes increase with scale.
What this means for your system right now
Every caching decision you make is a version of the same tradeoff: AT4 — Precomputation vs On-Demand. You are choosing to spend resources now to avoid spending more resources later. The questions that determine whether the tradeoff is correct are always the same:
What is the hit rate? A cache with a 50% hit rate absorbs half your database load. A cache with a 99% hit rate makes your database nearly irrelevant. The hit rate determines whether caching is worth its operational complexity.
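As a back-of-the-envelope check (the 1 ms cache hit and 100 ms database query are assumed figures, consistent with the ranges quoted earlier):

def expected_latency_ms(hit_rate, cache_ms=1.0, db_ms=100.0):
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms

print(expected_latency_ms(0.50))  # 50.5 ms: the database still dominates
print(expected_latency_ms(0.99))  # 1.99 ms: the database is nearly irrelevant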
What is the acceptable staleness? A TTL of 300 seconds means users see data that is up to five minutes old. For some data this is fine. For account balances or inventory counts, it is not.
What happens when the cache is cold? This is the question most engineers skip. A cache that starts empty, or that has just been cleared, sends every request to the database simultaneously. Your database must be able to survive the full load of a cold cache. If it cannot, you need cache warming — pre-populating the cache before traffic arrives.
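A minimal cache-warming sketch, assuming the hot keys can be enumerated in advance (hot_product_ids and database.query are placeholders):

def warm_cache(hot_product_ids):
    # Run before admitting traffic, so the first real requests hit a warm cache
    for product_id in hot_product_ids:
        key = "product:" + str(product_id)
        if r.get(key) is None:
            r.set(key, database.query(product_id), ex=300)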
What happens when the cache node fails? Every request that was hitting the cache now hits the database. If your database cannot handle that load, the cache node failure cascades into a database failure. Design the database tier to survive cache failure, or accept that a cache node failure will degrade the service.
The signal that tells you this applies to your system
Your P99 latency is much higher than your P50 latency. Your database is the bottleneck but receives fewer requests than your service handles. These two observations together indicate a cache layer is already doing the work. A cold start, a node failure, or a mass expiry would expose the underlying fragility immediately.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Concept: Caching — storing expensive results to avoid recomputation
Thread: T5 — Caching: memoization (Book 1) → Redis (Book 3) → CDN (Book 4)
Core Idea: Every cache trades space for time. The pattern is identical at every scale.
The failure modes are identical too.
Tradeoff: AT4 — Precomputation vs On-Demand — spend memory to avoid spending compute.
The tradeoff compounds with scale: higher hit rates justify more memory;
higher request rates make cold starts more dangerous.
Failure Mode: FM7 — Thundering Herd — when a popular cache entry expires and thousands
of requests hit the origin simultaneously. Also FM4 — Data Consistency
Failure — when the cache holds stale data after the underlying data changes.
Signal: P99 latency is 10× or more above P50. Database query rate is far below
total request rate. The cache is working — until it isn't.
Series: Book 2, Ch 11 (Memoization Is Caching) — full treatment with exercises,
AT/FM mapping, and the thread from algorithm to production system.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The full framework treatment — compression blocks, three-level exercises, and the complete caching thread from memoization through distributed cache design — is in Book 2: Algorithm Engineering, Chapter 11: Memoization Is Caching. Free chapter available at computingseries.com.