Lazy Proxy - A Pattern for Maximizing Caching Performance

Overview

A common weakness in most caching patterns is that they fail to account for the cost of multiple concurrent requests for the same key all resulting in cache misses.

Imagine two scenarios:

  1. An expensive and long running service request to generate a large report.
  2. A request to fetch a small amount of tick data (e.g.: bid/ask, latest sale price, volume, etc...)

In both cases, we can substantially reduce the cost of serving previously fulfilled requests by adding a caching layer at some stage of the request-reply pipeline. After the first request for a particular data item, subsequent requests for the same data return whatever is found in the cache. This is a ubiquitous design pattern, but it does nothing to eliminate the redundant cost of concurrent requests that arrive before the response is cached.

  • In scenario 1: if it takes 1 hour to generate the report, any request for the same report within this window puts further undue stress on the backend system (database, service, etc...).
  • In scenario 2: if it takes an average of 100 milliseconds to fetch the tick data, duplicate requests can still slip in before the response is cached. This is not a problem if we're only serving 100 requests a day, but it adds up quickly in a large system handling 1 million requests per hour (roughly 280 reqs/second). The sketch below shows exactly where these duplicates slip through.
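To make the race concrete, here is a minimal sketch of a naive cache-aside lookup (IReportService, Report, and NaiveReportCache are illustrative names, not part of this article's API):

using System;
using System.Runtime.Caching;
using System.Threading.Tasks;

public class NaiveReportCache
{
    private readonly MemoryCache _cache = MemoryCache.Default;
    private readonly IReportService _reportService; // hypothetical backend

    public NaiveReportCache(IReportService reportService)
    {
        _reportService = reportService;
    }

    public async Task<Report> GetReportAsync(string key)
    {
        if (_cache.Get(key) is Report cached)
            return cached;

        // every concurrent caller that reaches this point before the first
        // response is cached will hit the backend independently
        var report = await _reportService.GenerateAsync(key);
        _cache.Set(key, report, DateTimeOffset.UtcNow.AddHours(1));
        return report;
    }
}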

We can illustrate the problem with the following sequence diagram.

When multiple concurrent requests each result in a cache miss, the backend data store is condemned to process duplicate requests - the very scenario we wanted to avoid by introducing a cache. This article describes a simple and efficient solution that avoids the wasteful step of processing duplicate requests, along with a POC implementation in C#/.NET. The following diagram illustrates the solution.

Spec

  • Consolidate concurrent requests (T1, T2, T3, ..., Tn) for the same data item.
  • Only one thread (T1) will execute the request - other threads (T2, T3, ... Tn) will wait asynchronously and return the result of T1.
  • If T1 throws an exception or returns an error, all waiting threads will throw or return the same error.
  • The combined cost (time, cpu, memory, i/o) of handling all consolidated requests should be roughly equivalent to the cost of handling any single request. That is, if a single request T1 has a resource cost R, then the consolidated requests (T1, T2, T3, ..., Tn) should also cost approximately R, not n*R.
  • After all threads have returned, all references to resources (request, response, key, etc...) should be released.

API

We can implement the spec using a simple API with two types:

  • An IProxy interface: this describes the client operation we're trying to optimize, i.e.: the request for a large report, or the request for tick data. The client must implement this interface.
public interface IProxy<in TRequest, out TResponse>
{
    Task<TResponse> ProcessAsync(TRequest request);
}
  • LazyProxy class: this is the concurrent request consolidator. Clients will execute their cacheable requests through this proxy. The constructor for this type takes a TimeSpan value. Any identical requests that occur within this time span are considered concurrent. The IProxy.ProcessAsync method should return within this time span.
public class LazyProxy<TRequest, TResponse>
{
    public LazyProxy(IProxy<TRequest, TResponse> proxy, TimeSpan cacheExpirationTime);
 
    public async Task<TResponse> ProcessOnceAsync(string key, TRequest request);
}

The ProcessOnceAsync method will be the meat of the implementation.
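To see how the two types fit together, consider a hypothetical client for scenario 1 (ReportRequest, Report, IReportBackend, and ReportProxy are illustrative names, not part of the API above):

public class ReportProxy : IProxy<ReportRequest, Report>
{
    private readonly IReportBackend _backend; // the expensive reporting service

    public ReportProxy(IReportBackend backend)
    {
        _backend = backend;
    }

    public Task<Report> ProcessAsync(ReportRequest request)
    {
        return _backend.GenerateAsync(request); // long-running backend call
    }
}

// identical requests arriving within the 5-minute window are consolidated
var lazyProxy = new LazyProxy<ReportRequest, Report>(new ReportProxy(backend), TimeSpan.FromMinutes(5));
Report report = await lazyProxy.ProcessOnceAsync(request.ReportId, request);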

Implementation

We require two things for a correct implementation of LazyProxy:

  1. For grouping identical requests: a thread safe key-value store. A suitable candidate could be the ConcurrentDictionary type.
  2. To ensure only one thread executes for each group of identical requests: a synchronization mechanism that allows multiple threads to cooperate via signaling. The EventWaitHandle type may be helpful here.

The general algorithm itself can be diagrammed as follows:

As multiple threads execute LazyProxy.ProcessOnceAsync with the same request key, they will race to insert a wait handle into the concurrent dictionary. The thread that succeeds (T1) with the insert operation will proceed to execute IProxy.ProcessAsync, while remaining and subsequent threads block on the WaitHandle. After completing the request, T1 will set the result of the operation, signal the handle and unblock the waiting threads. Every thread will then return the result (or exception) computed by T1.
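Here is a sketch of that algorithm, substituting a TaskCompletionSource for the EventWaitHandle so that waiters suspend asynchronously instead of blocking threads (ConsolidatingProxy is a hypothetical name, and this version omits the cache expiration window for brevity):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class ConsolidatingProxy<TRequest, TResponse>
{
    private readonly IProxy<TRequest, TResponse> _proxy;
    private readonly ConcurrentDictionary<string, TaskCompletionSource<TResponse>> _pending =
        new ConcurrentDictionary<string, TaskCompletionSource<TResponse>>();

    public ConsolidatingProxy(IProxy<TRequest, TResponse> proxy)
    {
        _proxy = proxy;
    }

    public async Task<TResponse> ProcessOnceAsync(string key, TRequest request)
    {
        var tcs = new TaskCompletionSource<TResponse>(TaskCreationOptions.RunContinuationsAsynchronously);
        var winner = _pending.GetOrAdd(key, tcs);

        // losers of the race wait for the winner's result (or exception)
        if (!ReferenceEquals(winner, tcs))
            return await winner.Task;

        try
        {
            var response = await _proxy.ProcessAsync(request);
            tcs.SetResult(response);
            return response;
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
            throw;
        }
        finally
        {
            // release all references once the request has completed
            _pending.TryRemove(key, out _);
        }
    }
}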

While not immediately obvious, this is a disguised version of lazy initialization. We can swap the wait handles for an async implementation of the Lazy<T> class with the following API:

public class AsyncLazy<T>
{
    public AsyncLazy(Func<T> factory);
    public AsyncLazy(Func<Task<T>> factory);
    public TaskAwaiter<T> GetAwaiter();
}

The secret sauce here is the GetAwaiter method, which allows us to await on the AsyncLazy<T> type.
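One minimal way to build such a type is the well-known pattern of layering on Lazy<Task<T>>; the sketch below follows that pattern, though the full implementation lives in the GitHub repo linked at the end:

using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public class AsyncLazy<T> : Lazy<Task<T>>
{
    // Task.Run keeps a slow factory from blocking the first awaiting thread
    public AsyncLazy(Func<T> factory)
        : base(() => Task.Run(factory)) { }

    public AsyncLazy(Func<Task<T>> factory)
        : base(() => Task.Run(factory)) { }

    // lets callers write "await someAsyncLazy" directly
    public TaskAwaiter<T> GetAwaiter()
    {
        return Value.GetAwaiter();
    }
}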

To further simplify the implementation, we can swap the concurrent dictionary for a MemoryCache. This facilitates releasing resources after request threads have returned: we just expire the key entry from the cache.

A trivial (but correct) implementation of LazyProxy will then look like this:

public class LazyProxy<TRequest, TResponse>
{
     private readonly IProxy<TRequest, TResponse> _proxy;
     private readonly MemoryCache _cache;
     private readonly TimeSpan _cacheExpirationTime;

     public LazyProxy(IProxy<TRequest, TResponse> proxy, TimeSpan cacheExpirationTime)
     {
          _cache = new MemoryCache("LazyProxy.RequestCache");
          _cacheExpirationTime = cacheExpirationTime;
          _proxy = proxy;
     }

     public async Task<TResponse> ProcessOnceAsync(string key, TRequest request)
     {
          var newLazy = new AsyncLazy<TResponse>(() => _proxy.ProcessAsync(request));

          var lazy = _cache.AddOrGetExisting(key, newLazy,
                         DateTimeOffset.UtcNow.Add(_cacheExpirationTime))
                         as AsyncLazy<TResponse>;

          // the first thread to insert into the cache gets back null from
          // AddOrGetExisting, so it awaits the lazy it just created; re-reading
          // the cache here could race with expiration and return null
          return await (lazy ?? newLazy);
     }
}
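A quick way to see the consolidation in action (CountingProxy is a hypothetical backend used only for this demonstration):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// hypothetical backend that counts how often it is actually invoked
public class CountingProxy : IProxy<string, string>
{
    private int _calls;
    public int Calls { get { return _calls; } }

    public async Task<string> ProcessAsync(string request)
    {
        Interlocked.Increment(ref _calls);
        await Task.Delay(100); // simulate a slow backend
        return "response:" + request;
    }
}

// ten concurrent requests for the same key...
var backend = new CountingProxy();
var lazyProxy = new LazyProxy<string, string>(backend, TimeSpan.FromSeconds(5));
var tasks = Enumerable.Range(0, 10).Select(_ => lazyProxy.ProcessOnceAsync("tick:MSFT", "MSFT"));
await Task.WhenAll(tasks);

// ...but the backend only processed one request
Console.WriteLine(backend.Calls); // prints 1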

Conclusion

Caching is an essential component in optimizing the performance of most web applications, but it requires careful implementation to maximize its benefits. The technique described here can be adopted by cache server engineers and application developers alike, regardless of the caching pattern in use (Cache-Aside, Read-Through/Write-Through, etc...).

Source Code

Refer to GitHub for the full source code, with tests and the implementation of AsyncLazy<T>: https://github.com/dialalpha/LazyProxy
