API Rate limiter: A savior!

The story is from 2022, when I was moving on from my first company. It was a game-changing interview experience that I will never forget: a low-level design interview with Atlassian. I never expected to clear it, but I did.

So let’s discuss how an API rate limiter works in terms of a distributed system, why it’s important, and what algorithms are used to implement it.

If you are a regular LinkedIn user, you might have noticed that it only allows you to send a limited number of connection requests per day. Have you ever wondered why? At LinkedIn's scale, unrestricted requests would generate enormous traffic and degrade the platform's performance.

Similarly, if you have ever used the Twitter API for retweeting, you might have noticed that it only allows 300 retweets per 15-minute window. This ensures that no single user hogs all the resources, leaving other users with no access to the API.

Let’s discuss rate limiters from a system design perspective.

Why Do We Need a Rate Limiter?

We need a rate limiter to manage high traffic within a particular time range so that the system remains available. If requests pile up, they can overload the system and lead to the worst nightmare for a distributed service: Denial of Service. Instead of dropping connections completely and becoming unavailable, it is preferable to stay available, even at potentially lower performance.

To limit the rate of usage of certain APIs, an API rate limiter comes to the rescue. Several algorithms are used for limiting API usage, including:

  • Token Bucket
  • Leaky Bucket
  • Fixed Window Counter
  • Sliding Window Log
  • Sliding Window Counter
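
Before looking at the two main ones in detail, here is a minimal sketch of the Fixed Window Counter from the list above, just to set a baseline. The class name, parameters, and limits are illustrative assumptions, not from any particular library: count requests in a fixed time window and reject everything past the limit.

```python
import time

class FixedWindowCounter:
    """Hypothetical sketch: allow at most `limit` requests per `window` seconds."""
    def __init__(self, limit, window):
        self.limit = limit              # max requests allowed per window
        self.window = window            # window length in seconds
        self.window_start = time.time() # when the current window began
        self.count = 0                  # requests seen in the current window

    def allow(self):
        now = time.time()
        # Start a fresh window once the current one has elapsed
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowCounter(limit=3, window=1.0)
print([limiter.allow() for _ in range(5)])  # first 3 allowed, last 2 denied
```

The weakness of this scheme is the window boundary: a burst at the very end of one window plus a burst at the start of the next can briefly double the effective rate, which is what the sliding-window variants address.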

Popular Algorithms

We are going to discuss the two most popular ones: Token Bucket and Leaky Bucket. The Token Bucket algorithm is fairly simple and is also used by Amazon DynamoDB.

Token Bucket Algorithm

  1. Bucket Capacity: We assume a bucket filled with tokens, which has a fixed capacity.
  2. Token Consumption: With each request, one token is removed from the bucket.
  3. Token Replenishment: Tokens are added back to the bucket at a constant rate.
  4. Rate Limiting: If all tokens are used up before they can be replenished, the user must wait until the bucket has more tokens available.

This algorithm helps control the rate of requests by ensuring that the number of requests stays within the limit defined by the token replenishment rate.

This is the basic implementation of the algorithm:
# Token Bucket API rate limiter
# Used by services such as Amazon and Stripe to throttle API requests.
# capacity is the size of the bucket; refill_rate is the number of
# tokens added back per second. Once the bucket is empty, requests
# are denied until enough tokens have been replenished.
import time

class Bucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity             # maximum tokens the bucket can hold
        self.refill_rate = refill_rate       # tokens replenished per second
        self.last_refill_time = time.time()  # when the bucket was last refilled
        self.tokens_left = capacity          # start with a full bucket

    def refill(self):
        # Add tokens proportional to the elapsed time, capped at capacity
        now = time.time()
        elapsed = now - self.last_refill_time
        self.tokens_left = min(self.capacity,
                               self.tokens_left + elapsed * self.refill_rate)
        self.last_refill_time = now

    def consume(self, tokens):
        self.refill()
        if self.tokens_left >= tokens:
            self.tokens_left -= tokens
            return True
        return False

rate_limiter = Bucket(capacity=5, refill_rate=0.5)  # half a token per second
for i in range(20):
    if rate_limiter.consume(1):
        print("Request sent")
    else:
        print("Request denied, please wait")
    time.sleep(1)

Leaky Bucket Algorithm

Imagine a bucket with a small hole at the bottom.

  1. Constant Leak Rate: Tokens leak out of the bucket at a constant rate.
  2. Constant Fill Rate: The bucket is filled with tokens at a constant rate.
  3. Request Handling: As user requests come in, tokens are removed from the bucket.

In this way, the bucket simultaneously fills with tokens and leaks tokens at a constant rate, ensuring a steady and controlled flow of request handling.

This is the basic implementation of the algorithm:

import time
import threading

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.tokens_left = 0  # the bucket starts empty
        self.last_leak_time = time.time()
        self.leak_rate = leak_rate  # tokens leaked per second
        self.start_filling()

    # The bucket leaks at a constant rate
    def leak(self):
        now = time.time()
        time_spent = now - self.last_leak_time
        tokens_to_leak = time_spent * self.leak_rate
        self.tokens_left = max(0, self.tokens_left - tokens_to_leak)
        self.last_leak_time = now

    # The consumer is the API requester hitting the API
    def consume(self, tokens):
        if tokens <= self.tokens_left:
            self.tokens_left -= tokens
            return True
        return False

    # Background loop: leak, then add one token every 0.5 seconds
    def add_token(self):
        while True:
            self.leak()
            if self.tokens_left < self.capacity:
                self.tokens_left += 1
            time.sleep(0.5)

    def start_filling(self):
        bucket_fill = threading.Thread(target=self.add_token, daemon=True)
        bucket_fill.start()

limiter = LeakyBucket(capacity=10, leak_rate=1)
# Leak 1 token per second, with a maximum of 10 tokens
# Simulate API requests
for i in range(20):
    if limiter.consume(1):
        print("Request allowed")
    else:
        print("Request denied - rate limit exceeded")
    time.sleep(1)  # simulate some delay between requests

If you look carefully, you will notice that these algorithms are not used only in distributed systems; they have long been used in networking protocols as well.

I hope this helps you prepare for your interview. Keep learning, and stay tuned for the next blog in the series.
