System Design Interview - All about Consistent Hashing With Examples

Kartik S.

Published Apr 1, 2023

In the last article “Hash based data distribution and Intro to Consistent Hashing” we have seen the need of consistent hashing and how it may help in case of addition and deletion of a server.

Lets see how consistent hashing actually works with an example. We will cover following aspects -

How data is stored in consistent hashing and how hash rings are formed?
What happens when a new server is added?
What happens when a server is down?
Problem of non uniform data distribution and concept of vnodes or virtual nodes.

Question 1) - How data is stored in consistent hashing and how hash rings are formed?

Suppose we have 3 servers namely S0, S1, S2, we will calculate Hash_function(server_ip) because servers are uniquely identified by their server_ip. Suppose the Hash_function(server_ip) calculated values are as follows in sorted order -

Now we want to distribute a set of usernames = ["Kartik", "Kaivalya", "Karan", "Megha", “Isha”] among them.The steps will be as follows -

Calculate value = Hash_function(user_name). Here we are not taking modulo.
Store the value on the server which has Hash_function(server_ip) just greater than or equal to the value.
The value greater than the greatest Hash_function(server_ip)in this case S2 will be stored on Server S0 hence forming a ring.

For example -

We have the following Hash_function(user_name) for user_name, lets see the distribution using consistent hashing.

Here Kartik will be stored on S0.
Isha and Karan will be stored on S1
Megha will be stored on S2.
Kaivalya(value = 400 > 363 of last server S2) so this username will be stored to S0 forming a ring-like structure.

This is how Consistent hash ring is formed and data is stored on it.

Question 2) - What happens when a new server is added?

Suppose we have added a new server, say S4 with Hash_function(server_ip) = 200 i.e. the server is b/w S0 and S1. Now what will happen is the usernames with Hash_function(user_name) that lies b/w Hash_function(S0) and Hash_function(S4) will be transferred to S4.

So here username = isha with Hash_function(user_name) = 115 was initially stored at S1, now it will be transferred from S1 to S4 and the configuration will look like above.

Recommended by LinkedIn

SYSTEM DESIGN BASICS #5 :- CONSISTENT HASHING

Sambhav Dalal 7 months ago

Rebalancing the Partitions | Strategies for Rebalancing

Saurav Prateek 3 years ago

Confusion between Capabilities and Features

Nick Malik 7 years ago

So you can see the advantage that other servers and the data isn’t even touched which was not the case in simple hash based data distribution.

Question 3) - What happens when a server is down?

Suppose in the above configuration, the Server S1 with Hash_function(server_ip) = 242 goes down. So in this case all the usernames with Hash_function(user_name) b/w Hash_function(S4) and Hash_function(S1) will be now transferred to S2 as S2 is the clockwise next server to S1 on the hash ring .

The configuration will look somewhat like below -

Here again you can observe that other usernames and servers are untouched thus handling both deletion and addition case more efficiently than simple hash based data distribution.

Problem - So we have solved the problem of data redistribution but there is another problem here, can you guess what it is?

Will the data distribution be uniform or non uniform in consistent hashing above?

Suppose we have 10 usernames, and Hash_function(user_name) gives value lesser than 112 i.e. Hash_function(S0), so now you will see that all these 10 usernames will be stored on Server S0 hence giving the unbalanced or kinda skewed data distribution.

For this the concept of virtual nodes or vnodes comes into picture.

For simplicity let's think of vnodes as follows - Each database server or node is assigned some vnodes, the hash value of those vnodes is such that the vnodes are distributed uniformly across the ring.

So now if 5 usernames have hash_function(usernames) values such that 20, 30, 40, 70, 80.

So u1, u2, u3 will be stored on vnode S11 i.e. S1 and u4, u5 will be stored on vnode S00 i.e. S0, without vnode concept all these values will be stored on S0. Hence you can see that introducing the vnode concept has actually helped in uniform distribution of data.

This was all about consistent hashing explained in the easiest way with the help of examples and diagrams. if you like the article, put a like, repost the article and subscribe to my newsletter.

Cheers

Kartik Sapra

#systemdesign #programming #amazon #microsoft #google #softwaredevelopement #softwaredev #faang #maang #techinterview #systemarchitect #systemarchitecture #interviewpreparation #technicalwriter #technicalwriting

System Design For Interviews

7,560 followers

+ Subscribe

Abhay Tiwari 1y

Cfbr

Sushant Kadlag 2y

Yes it helps in resume

KRISHNAN N NARAYANAN 2y

Great opportunity

Khushboo Kumari 3y

Thanks for sharing

Yatin Ahuja 3y

Reach++

1 Reaction

See more comments

To view or add a comment, sign in

System Design Interview - All about Consistent Hashing With Examples

Kartik S.

Recommended by LinkedIn

System Design For Interviews

7,560 followers

More articles by Kartik S.

Others also viewed

Building a Facial Recognition System

API Interview Questions and Answers

3 Steps to Application Migration Success

FoxPro Replacement and End of Life Migrations

What does a Technical Architect do?

Mastering System Design Principles: Key Guidelines for Success

The Production Incident at Apple Inc. that Made Me Build LogClaw

System Design: Understanding Idempotency and Idempotency Keys

A Costly mistake, 2 months of silence & one question that could have prevented all of it.

Explore content categories

Recommended by LinkedIn

System Design For Interviews

7,560 followers

More articles by Kartik S.

System Design Interviews - Hash based data distribution and Intro to Consistent Hashing

System Design Interviews - CAP Theorem Made Easy

Foundations Of Highly Available System Design - Data Replication And Replication Strategies

Foundations Of Highly Available System Design Part 1 - Achieving 5 9's of Availability

Google Summer Of Code Student Application Phase

Others also viewed

Building a Facial Recognition System

API Interview Questions and Answers

3 Steps to Application Migration Success

FoxPro Replacement and End of Life Migrations

What does a Technical Architect do?

Mastering System Design Principles: Key Guidelines for Success

The Production Incident at Apple Inc. that Made Me Build LogClaw

System Design: Understanding Idempotency and Idempotency Keys

A Costly mistake, 2 months of silence & one question that could have prevented all of it.

Explore content categories