API Rate limiter: A savior!
The story is from 2022, when I was moving on from my first company. It was a game-changing interview experience that I can never forget: a low-level design interview with Atlassian. I never expected to clear it, but I did.
So let’s discuss how an API rate limiter works in terms of a distributed system, why it’s important, and what algorithms are used to implement it.
If you are a regular LinkedIn user, you might have noticed that it allows you to send only a limited number of connection requests per day. Have you ever wondered why? At that scale, unlimited requests would generate an enormous amount of traffic and degrade the service for everyone.
Similarly, if you have ever used the Twitter API for retweeting, you might have noticed that it allows only 300 retweets per 15-minute window. This ensures that no single user hogs all the resources, leaving other users without access to the API.
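A per-window limit like Twitter's can be modeled with a simple fixed-window counter. The sketch below is purely illustrative (the class name and parameters are my own choices, not Twitter's actual implementation):

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` window (illustrative)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.time()
        self.count = 0

    def allow(self):
        now = time.time()
        # Start a fresh window once the current one has elapsed
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# e.g. 300 requests per 15-minute window, as in the Twitter example
limiter = FixedWindowLimiter(limit=300, window_seconds=15 * 60)
```

Fixed windows are easy to implement but allow bursts at window boundaries, which is one reason the algorithms discussed below are often preferred.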
Let’s discuss rate limiters from a system design perspective.
Why Do We Need a Rate Limiter?
We need a rate limiter to manage high traffic within a particular time range so that the system remains available. If requests pile up unchecked, the load can push a distributed service into its worst nightmare: denial of service. Rather than dropping connections entirely and becoming unavailable, it is preferable to reject excess requests early and stay available, even at somewhat reduced throughput.
To limit the rate of usage of certain APIs, an API rate limiter comes to the rescue. Several algorithms are used for limiting API usage, including:
Popular Algorithms
We are going to discuss the two most popular ones: Token Bucket and Leaky Bucket. The Token Bucket algorithm is fairly simple and is also used by Amazon DynamoDB.
Token-Bucket Algorithm
This algorithm helps control the rate of requests by ensuring that the number of requests stays within the limit defined by the token replenishment rate.
This is the basic implementation of the algorithm:
# Token Bucket API rate limiter
# A similar approach is used by Amazon and Stripe to throttle API requests.
# capacity: the size of the bucket (maximum burst of requests)
# refill_rate: tokens added back into the bucket per second
# Once the bucket is empty, requests are denied until tokens are refilled.
import time

class Bucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.last_refill_time = time.time()  # when the bucket was last refilled
        self.tokens_left = capacity     # the bucket starts full

    def refill(self):
        now = time.time()
        elapsed = now - self.last_refill_time
        # Add tokens for the elapsed time, never exceeding capacity
        self.tokens_left = min(self.capacity,
                               self.tokens_left + elapsed * self.refill_rate)
        self.last_refill_time = now

    def consume(self, tokens):
        self.refill()
        if self.tokens_left >= tokens:
            self.tokens_left -= tokens
            return True
        return False

rate_limiter = Bucket(capacity=5, refill_rate=1)  # burst of 5, 1 token/second
for i in range(20):
    if rate_limiter.consume(1):
        print("Request sent")
    else:
        print("Request denied, please wait")
    time.sleep(0.2)
Leaky Bucket Algorithm
Imagine a bucket with a small hole at the bottom.
Incoming requests pour into the bucket from the top, while the bucket leaks — that is, processes — them through the hole at a constant rate. If a request arrives when the bucket is already full, it is rejected. This smooths bursts of traffic into a steady, controlled flow of request handling.
This is the basic implementation of the algorithm:
import time

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # maximum requests the bucket can hold
        self.leak_rate = leak_rate  # requests drained (processed) per second
        self.water_level = 0        # the bucket starts empty
        self.last_leak_time = time.time()

    # The bucket leaks at a constant rate, draining queued requests
    def leak(self):
        now = time.time()
        elapsed = now - self.last_leak_time
        self.water_level = max(0, self.water_level - elapsed * self.leak_rate)
        self.last_leak_time = now

    # The consumer is the API requester, who is hitting the API
    def consume(self, tokens):
        self.leak()
        if self.water_level + tokens <= self.capacity:
            self.water_level += tokens
            return True
        return False

# Drain 1 request per second, with room for a burst of 10
limiter = LeakyBucket(capacity=10, leak_rate=1)

# Simulate API requests
for i in range(20):
    if limiter.consume(1):
        print("Request Allowed")
    else:
        print("Request Denied - Rate Limit Exceeded")
    time.sleep(0.2)  # simulating some delay between requests
If you look carefully, you'll see these algorithms are not unique to distributed systems; they have long been used in networking protocols for traffic shaping as well.
I hope this helps you prepare for your interviews. Keep learning, and stay tuned for the next blog in the series.