Understanding Distributed Locking with Redis: Practical Applications and Challenges

Suyash Gogte

Published Sep 19, 2024

In many cases, especially in distributed systems, we encounter scenarios where we need to ensure that certain operations, like making a REST call to an external API, are performed only once. Consider an example where an external API is not idempotent: the first call returns response R1, and the second call returns R2. If your business logic relies on consistency, these different responses can cause the logic to fail.

In such scenarios, we want to ensure that the API is called only once, even in a distributed environment. This requires the use of distributed locks. While locks can be implemented at the database level, this may introduce additional latency. If Redis is already part of your stack, it offers a lower-latency way to implement distributed locks, as we’ll explore in this article.

Why Do We Need Locking?

Efficiency: We want to avoid multiple workers performing the same task unnecessarily, saving both time and resources.
Correctness: Locks prevent concurrent processes from interfering with each other, ensuring that system state remains consistent, as illustrated by the example above. In other words, they prevent race conditions.

Redis in Production

In production, Redis is typically used in two primary configurations:

Single Instance (non-cluster): This is a master-replica setup where one master node handles writes, and one or more read replicas handle reads.
Cluster Enabled: In this setup, Redis is sharded across multiple nodes. Each shard contains a master and read replicas, making clustering useful for scaling large operations.

In both of these configurations, locks can be implemented using Redis’s SET NX EX command.

Redis Locking with SET NX EX

The Redis SET command with the NX and EX options provides an atomic way to set locks. Here's how you can use it:

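// Try to acquire the lock with SET NX EX (10-second expiration).
// This is the Jedis 2.x signature; newer Jedis versions take the options as a SetParams object.
String result = jedis.set(lockKey, lockValue, "NX", "EX", 10);
// result is "OK" if the lock was acquired and null if the key already exists.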

NX ensures the lock is set only if the key doesn’t already exist (i.e., no other client holds the lock).
EX sets an expiration time for the lock, ensuring it doesn’t persist indefinitely.

This combines the logic of setting a lock and ensuring it expires after a certain period in one atomic operation.
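The semantics described above can be traced without a Redis server. The following is a minimal sketch, not a client implementation: `FakeRedisNode` is a hypothetical in-memory stand-in, time is passed in explicitly so the expiry is easy to follow, and the `deleteIfValueMatches` helper is an assumption added here to show why each client should store its own unique value and release only a lock it still owns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a single Redis node (illustration only).
class FakeRedisNode {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();

    // Mimics SET key value NX EX seconds: succeeds only if the key is absent or expired.
    synchronized boolean setNxEx(String key, String value, long ttlSeconds, long nowMillis) {
        Entry current = data.get(key);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return false; // another client still holds the lock
        }
        data.put(key, new Entry(value, nowMillis + ttlSeconds * 1000));
        return true;
    }

    // Removes the key only if it is unexpired and still holds the caller's value.
    synchronized boolean deleteIfValueMatches(String key, String value, long nowMillis) {
        Entry current = data.get(key);
        boolean ours = current != null
                && current.expiresAtMillis > nowMillis
                && current.value.equals(value);
        if (ours) {
            data.remove(key);
        }
        return ours;
    }
}

class SingleNodeLockDemo {
    public static void main(String[] args) {
        FakeRedisNode redis = new FakeRedisNode();
        String key = "lock:order:42"; // hypothetical key name
        String tokenA = UUID.randomUUID().toString();
        String tokenB = UUID.randomUUID().toString();

        System.out.println(redis.setNxEx(key, tokenA, 10, 0));                 // true: A takes the lock
        System.out.println(redis.setNxEx(key, tokenB, 10, 5_000));             // false: A's lease is still live
        System.out.println(redis.setNxEx(key, tokenB, 10, 11_000));            // true: A's lease expired at 10 s
        System.out.println(redis.deleteIfValueMatches(key, tokenA, 12_000));   // false: A no longer owns the lock
        System.out.println(redis.deleteIfValueMatches(key, tokenB, 12_000));   // true: B releases its own lock
    }
}
```

Against a real Redis server, the compare-and-delete step is usually done in a small Lua script so that the check and the delete happen atomically.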

Fault Tolerance Issues with SET NX EX

While SET NX EX is useful, it isn’t fault-tolerant in replicated Redis topologies. Redis replication is asynchronous, so if the master node holding the lock crashes before the write reaches the read replicas, the replica that is promoted to master has no record of the lock. Other clients can then acquire the same lock again, leading to potential inconsistencies.

Older patterns in the Redis documentation, such as the one built on SETNX and GETSET, deal with stale locks left behind by crashed clients, but they do not solve this failover problem.
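The failover scenario above can be reproduced with a toy model. This is a sketch under simplifying assumptions: `AsyncReplicatedStore` is a hypothetical class with one primary and one replica, expiry is left out, and `replicate()` and `failover()` are called by hand to control the ordering of events.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one primary with one asynchronously replicated replica (illustration only).
class AsyncReplicatedStore {
    private Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();

    // SET NX against the current primary; the write is acknowledged before it is replicated.
    boolean setNx(String key, String value) {
        return primary.putIfAbsent(key, value) == null;
    }

    // Replication catches up some time after the write was acknowledged.
    void replicate() {
        replica.clear();
        replica.putAll(primary);
    }

    // The primary crashes and the replica is promoted with whatever data it has received.
    void failover() {
        primary = new HashMap<>(replica);
    }
}

class FailoverDemo {
    public static void main(String[] args) {
        AsyncReplicatedStore store = new AsyncReplicatedStore();
        boolean a = store.setNx("lock:report", "client-A"); // acknowledged by the primary
        store.failover();                                    // crash before replicate() ran
        boolean b = store.setNx("lock:report", "client-B"); // the promoted replica never saw the lock
        System.out.println(a + " " + b); // true true: both clients believe they hold the lock
    }
}
```

If `replicate()` runs before `failover()`, the second `setNx` returns false, which is why the problem only shows up in the window between the acknowledgement and the replication.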

Redlock Algorithm: A Distributed Locking Solution

To handle scenarios where locks may be lost due to failures, the Redis documentation describes the Redlock algorithm, proposed by Redis creator Salvatore Sanfilippo.

Example: Redlock in Production

1. Independent Redis Instances:

You would run multiple independent Redis instances (not replicas or clustered) across different servers, e.g., Redis-1, Redis-2, Redis-3.

2. Lock Acquisition:

When a client wants to acquire a lock, it sends a SET command with NX and an expiration to all three Redis instances, using the same key, the same unique random value, and the same expiration time.
If the client acquires the lock on a majority of the instances (e.g., 2 out of 3), and the time spent acquiring it is less than the lock's validity time, it is considered to hold the lock.
The lock has a lease time, ensuring it will expire after a certain period even if the client crashes.

3. Releasing the Lock:

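The client releases the lock on all three instances. Rather than a plain DEL, the release deletes the key only if it still holds the client's own value (typically via a small Lua script), so a client cannot remove a lock that has since been acquired by someone else.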

Thus, locking and unlocking in Redlock are quorum-based. Libraries such as Redisson and Redsync provide Redlock implementations.
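The three steps above can be sketched as follows. This is a simplified illustration rather than a faithful Redlock implementation: `RedlockNode` is a hypothetical in-memory stand-in for an independent Redis instance, the clock is injected as a `LongSupplier` so the run is deterministic, and clock-drift compensation and retries are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical in-memory stand-in for one independent Redis instance (illustration only).
class RedlockNode {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiry = new HashMap<>();
    boolean reachable = true;

    // Mimics SET key value NX with a millisecond expiry.
    boolean setNxPx(String key, String value, long ttlMillis, long nowMillis) {
        if (!reachable) return false;
        Long expiresAt = expiry.get(key);
        if (expiresAt != null && expiresAt > nowMillis) return false;
        values.put(key, value);
        expiry.put(key, nowMillis + ttlMillis);
        return true;
    }

    // Deletes the key only if it still holds the caller's value.
    void deleteIfValueMatches(String key, String value) {
        if (reachable && value.equals(values.get(key))) {
            values.remove(key);
            expiry.remove(key);
        }
    }
}

class RedlockSketch {
    // Returns the remaining validity in milliseconds, or -1 if the lock was not acquired.
    static long tryLock(List<RedlockNode> nodes, String key, String value,
                        long ttlMillis, LongSupplier clock) {
        long start = clock.getAsLong();
        int acquired = 0;
        for (RedlockNode node : nodes) {
            if (node.setNxPx(key, value, ttlMillis, clock.getAsLong())) acquired++;
        }
        long validity = ttlMillis - (clock.getAsLong() - start); // drift allowance omitted
        int quorum = nodes.size() / 2 + 1;
        if (acquired >= quorum && validity > 0) return validity;
        unlock(nodes, key, value); // roll back any partial acquisition
        return -1;
    }

    static void unlock(List<RedlockNode> nodes, String key, String value) {
        for (RedlockNode node : nodes) node.deleteIfValueMatches(key, value);
    }
}

class RedlockDemo {
    public static void main(String[] args) {
        List<RedlockNode> nodes =
                Arrays.asList(new RedlockNode(), new RedlockNode(), new RedlockNode());
        nodes.get(2).reachable = false; // one instance is down
        LongSupplier frozenClock = () -> 0L;

        System.out.println(RedlockSketch.tryLock(nodes, "lock:job", "A", 10_000, frozenClock)); // 10000
        System.out.println(RedlockSketch.tryLock(nodes, "lock:job", "B", 10_000, frozenClock)); // -1
        RedlockSketch.unlock(nodes, "lock:job", "A");
        System.out.println(RedlockSketch.tryLock(nodes, "lock:job", "B", 10_000, frozenClock)); // 10000
    }
}
```

With one of three instances down, client A still reaches the 2-out-of-3 quorum, while client B finds both reachable instances taken and gives up until A releases.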

Problems with Redlock

Although Redlock is a popular algorithm for distributed locking, it has its own limitations. Martin Kleppmann highlights these in his article "How to do distributed locking". One key issue is that if the client holding the lock is paused (e.g., due to a garbage collection pause), the lock may expire, but the client may still make unsafe changes, thinking it holds the lock. As Kleppmann notes, this bug is not theoretical: HBase used to have this problem. Normally, GC pauses are quite short, but “stop-the-world” GC pauses have sometimes been known to last for several minutes, certainly long enough for a lease to expire.

Example:

Imagine a client is paused for an extended period due to garbage collection. The lock expires during this pause, and another client acquires the lock. Once the paused client resumes, it might unknowingly make changes based on the assumption that it still holds the lock.

The Solution: Fencing Tokens

To prevent the scenario described above, a fencing token can be used. A fencing token is a number that increases every time a client acquires the lock. The client sends its token with each write request, and the storage service rejects any write whose token is lower than one it has already accepted. This ensures that a client whose lock has expired cannot make changes after the lock has passed to another client with a higher fencing token.

Redlock does not have any facility for generating fencing tokens.
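The token check on the storage side can be sketched in a few lines. Both classes below are hypothetical: `FencingLockService` stands in for any lock service that can issue strictly increasing tokens (lease expiry is not modelled), and `FencedStorage` stands in for the resource being protected.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock service that hands out a strictly increasing token with every grant.
class FencingLockService {
    private final AtomicLong counter = new AtomicLong();

    long acquire() {
        return counter.incrementAndGet();
    }
}

// Hypothetical storage service that remembers the highest token it has accepted.
class FencedStorage {
    private long highestToken = 0;
    private String value = "";

    synchronized boolean write(long token, String newValue) {
        if (token < highestToken) {
            return false; // stale lock holder: reject the write
        }
        highestToken = token;
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }
}

class FencingDemo {
    public static void main(String[] args) {
        FencingLockService locks = new FencingLockService();
        FencedStorage storage = new FencedStorage();

        long tokenA = locks.acquire(); // 1: client A then stalls in a long GC pause
        long tokenB = locks.acquire(); // 2: granted to client B after A's lease expired

        System.out.println(storage.write(tokenB, "from B")); // true
        System.out.println(storage.write(tokenA, "from A")); // false: token 1 is lower than 2
        System.out.println(storage.read());                  // from B
    }
}
```

The important design point is that the check lives in the storage service, not in the client: the paused client cannot know its lease has expired, so only the resource itself can refuse the late write.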

Summary

Single-node locking: If your goal is efficiency (e.g., to prevent duplicate work), using Redis with SET NX EX on a single node can work well.
Correctness in distributed systems: If you need locks to ensure correctness (i.e., preventing conflicting writes), avoid Redlock. Instead, use a consensus-based system like ZooKeeper, or a database with strong transactional guarantees, and enforce a fencing token on every write to the protected resource.
