Understanding Distributed Locking with Redis: Practical Applications and Challenges
In distributed systems, we often need to ensure that an operation, such as a REST call to an external API, is performed only once. Consider an external API that is not idempotent: the first call returns response R1, and the second call returns R2. If your business logic depends on a consistent response, the differing responses can cause it to fail.
In such scenarios, we want to ensure that the API is called only once, even in a distributed environment. This requires a distributed lock. Locks can be implemented at the database level, but this may introduce additional latency. If Redis is already part of your stack, it can provide distributed locking more efficiently, as we’ll explore in this article.
Why Do We Need Locking?
Redis in Production
In production, Redis is typically used in two primary configurations:
In both of these configurations, locks can be implemented using Redis’s SET NX EX command.
Redis Locking with SET NX EX
The Redis SET command with the NX and EX options provides an atomic way to set locks. Here's how you can use it:
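// Try to acquire the lock atomically: NX = set only if the key is absent, EX 10 = expire after 10 seconds.
// Jedis 2.x signature shown; Jedis 3+ uses jedis.set(lockKey, lockValue, SetParams.setParams().nx().ex(10)).
String result = jedis.set(lockKey, lockValue, "NX", "EX", 10);
boolean acquired = "OK".equals(result); // null means another client already holds the lock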
This combines the logic of setting a lock and ensuring it expires after a certain period in one atomic operation.
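To make the NX and EX semantics concrete, here is a minimal in-memory sketch in plain Java. It is not a Redis client: SetNxExModel, its setNxEx method, and the injected clock are hypothetical names introduced only for illustration, and the model covers just the "set if absent, with expiry" behaviour described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** In-memory stand-in for a single Redis instance; models only SET key value NX EX ttl. */
class SetNxExModel {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> store = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injected so expiry can be exercised without sleeping

    SetNxExModel(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns true if the key was absent (or expired) and is now set; false otherwise. */
    boolean setNxEx(String key, String value, long ttlSeconds) {
        long now = clock.getAsLong();
        Entry candidate = new Entry(value, now + ttlSeconds * 1000);
        // compute() runs atomically per key, so the check and the set cannot interleave.
        Entry winner = store.compute(key, (k, current) ->
                (current == null || current.expiresAtMillis <= now) ? candidate : current);
        return winner == candidate;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in milliseconds
        SetNxExModel redis = new SetNxExModel(() -> now[0]);
        System.out.println(redis.setNxEx("lock:order", "client-A", 10)); // true
        System.out.println(redis.setNxEx("lock:order", "client-B", 10)); // false
        now[0] = 10_000L; // the 10-second TTL has elapsed
        System.out.println(redis.setNxEx("lock:order", "client-B", 10)); // true
    }
}
```

The expiry is what keeps a crashed client from holding the lock forever: once the TTL passes, the next caller can acquire it without any explicit release.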
Fault Tolerance Issues with SET NX EX
While SET NX EX is useful, it isn’t fault-tolerant in distributed Redis topologies. For instance, if the master node holding the lock crashes, the lock may not have been replicated to the read replicas. When one of the replicas is promoted to master, it has no record of the locks that were held. This allows other clients to acquire the same lock again, leading to potential inconsistencies.
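The failover scenario can be sketched with a toy model of asynchronous replication. This is plain Java with no Redis involved; FailoverDemo, Node, and bothClientsAcquire are hypothetical names, and expiry is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of asynchronous master-to-replica replication; not a Redis client. */
class FailoverDemo {
    static class Node {
        final Map<String, String> data = new HashMap<>();

        /** SET NX semantics only: succeeds when the key is not already present. */
        boolean setNx(String key, String value) {
            return data.putIfAbsent(key, value) == null;
        }
    }

    /** Returns true if two clients both end up believing they hold the same lock. */
    static boolean bothClientsAcquire(boolean replicatedBeforeCrash) {
        Node master = new Node();
        Node replica = new Node();

        boolean clientA = master.setNx("lock:order", "client-A");
        if (replicatedBeforeCrash) {
            replica.data.putAll(master.data); // replication caught up in time
        }
        // The master crashes here; the replica is promoted and serves new requests.
        Node promoted = replica;
        boolean clientB = promoted.setNx("lock:order", "client-B");
        return clientA && clientB;
    }

    public static void main(String[] args) {
        System.out.println("replicated in time: " + bothClientsAcquire(true));
        System.out.println("lost before replication: " + bothClientsAcquire(false));
    }
}
```

When the write reaches the replica before the crash, the second client is refused. When it does not, both clients hold the "same" lock, which is exactly the inconsistency described above.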
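Older Redis documentation describes lock patterns built on SETNX and GETSET, but these deal with stale locks on a single instance and do not solve the failover problem, so they aren’t foolproof either.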
Redlock Algorithm: A Distributed Locking Solution
To handle scenarios where locks may be lost due to failures, Redis introduced the Redlock algorithm.
Example: Redlock in Production
1. Independent Redis Instances:
You would run multiple independent Redis instances (not replicas or clustered) across different servers, e.g., Redis-1, Redis-2, Redis-3.
2. Lock Acquisition:
3. Releasing the Lock:
Thus, locking and unlocking in Redlock are quorum-based. Libraries such as Redisson and Redsync provide implementations of Redlock.
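The quorum idea can be sketched as follows, assuming the usual description of Redlock: the client tries to set the same key with the same random value on every instance, counts the lock as held only if a majority (N/2 + 1) accepted it before the lock's validity time ran out, and otherwise releases whatever it managed to set. This is a simplified in-memory illustration, not the API of Redisson or Redsync; RedlockSketch, Instance, tryLock, and unlock are hypothetical names, and per-instance key expiry and the clock-drift allowance are omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class RedlockSketch {
    /** In-memory stand-in for one independent Redis instance. */
    static class Instance {
        boolean up = true; // flip to false to simulate an unreachable instance
        private final Map<String, String> locks = new HashMap<>();

        boolean setNx(String key, String value) {
            return up && locks.putIfAbsent(key, value) == null;
        }

        /** Delete only if the stored value is ours, so another client's lock is never freed. */
        void releaseIfOwner(String key, String value) {
            if (up) {
                locks.remove(key, value);
            }
        }
    }

    /** Returns the random lock value on success, or null if no majority was reached in time. */
    static String tryLock(List<Instance> instances, String key, long ttlMillis) {
        String value = UUID.randomUUID().toString();
        long start = System.nanoTime();
        int acquired = 0;
        for (Instance instance : instances) {
            if (instance.setNx(key, value)) {
                acquired++;
            }
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        int quorum = instances.size() / 2 + 1;
        if (acquired >= quorum && elapsedMillis < ttlMillis) {
            return value;
        }
        unlock(instances, key, value); // roll back any partial acquisition
        return null;
    }

    /** Release is attempted on every instance, including ones that may have refused us. */
    static void unlock(List<Instance> instances, String key, String value) {
        for (Instance instance : instances) {
            instance.releaseIfOwner(key, value);
        }
    }

    public static void main(String[] args) {
        List<Instance> nodes = Arrays.asList(new Instance(), new Instance(), new Instance());
        nodes.get(2).up = false; // one instance down: 2 of 3 is still a majority
        String held = tryLock(nodes, "lock:order", 10_000);
        System.out.println("first client acquired: " + (held != null));
        System.out.println("second client acquired: " + (tryLock(nodes, "lock:order", 10_000) != null));
    }
}
```

With three instances the lock survives the loss of any one of them, which is the property a single master with replicas cannot guarantee. A production implementation would also set a TTL on each instance and subtract elapsed time and a drift margin from the lock's validity.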
Problems with Redlock
Although Redlock is a popular algorithm for distributed locking, it has limitations, which Martin Kleppmann highlights in his article. One key issue is that if the client holding the lock is paused (e.g., by a garbage collection pause), the lock may expire while the client, still believing it holds the lock, goes on to make unsafe changes. As Kleppmann notes, this bug is not theoretical: HBase used to have this problem. Normally, GC pauses are quite short, but “stop-the-world” GC pauses have sometimes been known to last for several minutes [5], certainly long enough for a lease to expire.
Example:
Imagine a client is paused for an extended period due to garbage collection. The lock expires during this pause, and another client acquires the lock. Once the paused client resumes, it might unknowingly make changes based on the assumption that it still holds the lock.
The Solution: Fencing Tokens
To prevent the scenario described above, a fencing token can be used. A fencing token is a number that increments every time a client acquires the lock. The client sends its token with each write request, and the storage service rejects any write whose token is lower than the highest one it has already seen. This ensures that a client whose lock has expired cannot make changes after another client has acquired the lock with a higher fencing token.
Redlock does not have any facility for generating fencing tokens.
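A minimal sketch of the fencing check, replaying the paused-client scenario. Since Redlock provides no tokens, the token source here is a hypothetical LockService backed by a simple counter; FencingDemo and FencedStorage are likewise illustrative names, not part of any library.

```java
import java.util.concurrent.atomic.AtomicLong;

class FencingDemo {
    /** Hypothetical lock service that hands out a strictly increasing token per grant. */
    static class LockService {
        private final AtomicLong counter = new AtomicLong();

        long grant() {
            return counter.incrementAndGet();
        }
    }

    /** Storage that remembers the highest token it has seen and rejects anything older. */
    static class FencedStorage {
        private long highestToken = 0;
        private String value;

        synchronized boolean write(long token, String newValue) {
            if (token < highestToken) {
                return false; // stale lock holder: reject the write
            }
            highestToken = token;
            value = newValue;
            return true;
        }

        synchronized String read() {
            return value;
        }
    }

    public static void main(String[] args) {
        LockService locks = new LockService();
        FencedStorage storage = new FencedStorage();

        long tokenA = locks.grant(); // client A acquires the lock, then pauses (e.g. a long GC)
        long tokenB = locks.grant(); // A's lease expires; client B acquires the lock
        System.out.println("B write accepted: " + storage.write(tokenB, "from-B"));
        System.out.println("A write accepted: " + storage.write(tokenA, "from-A")); // A resumes late
        System.out.println("stored value: " + storage.read());
    }
}
```

The important design point is that the check lives in the storage service, not in the client: the paused client cannot know its lock has expired, so only the resource being protected can reliably turn it away.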
Summary