System Design Interview - All about Consistent Hashing With Examples

System Design Interview - All about Consistent Hashing With Examples

In the last article “Hash based data distribution and Intro to Consistent Hashing” we have seen the need of consistent hashing and how it may help in case of addition and deletion of a server. 

Lets see how consistent hashing actually works with an example. We will cover following aspects - 

  1. How data is stored in consistent hashing and how hash rings are formed? 
  2. What happens when a new server is added? 
  3. What happens when a server is down?
  4. Problem of non uniform data distribution and concept of vnodes or virtual nodes.

Question 1) - How data is stored in consistent hashing and how hash rings are formed? 

Suppose we have 3 servers namely S0, S1, S2, we will calculate Hash_function(server_ip) because servers are uniquely identified by their server_ip. Suppose the Hash_function(server_ip) calculated values are as follows in sorted order - 

No alt text provided for this image

Now we want to distribute a set of usernames = ["Kartik", "Kaivalya", "Karan", "Megha", “Isha”] among them.The steps will be as follows - 

  1. Calculate value =  Hash_function(user_name). Here we are not taking modulo. 
  2. Store the value on the server which has Hash_function(server_ip) just greater than or equal to the value.
  3. The value greater than the greatest Hash_function(server_ip)in this case S2 will be stored on Server S0 hence forming a ring. 

For example -

We have the following Hash_function(user_name) for user_name, lets see the distribution using consistent hashing. 

No alt text provided for this image

  • Here Kartik will be stored on S0. 
  • Isha and Karan will be stored on S1
  • Megha will be stored on S2.
  • Kaivalya(value = 400 > 363 of last server S2) so this username will be stored to S0 forming a ring-like structure. 

This is how Consistent hash ring is formed and data is stored on it. 

No alt text provided for this image

Question 2) - What happens when a new server is added? 

Suppose we have added a new server, say S4 with Hash_function(server_ip) =  200 i.e. the server is b/w S0 and S1. Now what will happen is the usernames with Hash_function(user_name) that lies b/w Hash_function(S0) and Hash_function(S4) will be transferred to S4. 

No alt text provided for this image

So here username = isha with Hash_function(user_name) = 115 was initially stored at S1, now it will be transferred from S1 to S4 and the configuration will look like above. 

So you can see the advantage that other servers and the data isn’t even touched which was not the case in simple hash based data distribution. 

Question 3) - What happens when a server is down?

Suppose in the above configuration, the Server S1 with Hash_function(server_ip) = 242 goes down. So in this case all the usernames with Hash_function(user_name) b/w Hash_function(S4) and Hash_function(S1) will be now transferred to S2 as S2 is the clockwise next server to S1 on the hash ring .

The configuration will look somewhat like below - 

No alt text provided for this image

Here again you can observe that other usernames and servers are untouched thus handling both deletion and addition case more efficiently than simple hash based data distribution.

Problem - So we have solved the problem of data redistribution but there is another problem here, can you guess what it is? 

Will the data distribution be uniform or non uniform in consistent hashing above? 

Suppose we have 10 usernames, and Hash_function(user_name) gives value lesser than 112 i.e. Hash_function(S0), so now you will see that all these 10 usernames will be stored on Server S0 hence giving the unbalanced or kinda skewed data distribution. 

For this the concept of virtual nodes or vnodes comes into picture. 

For simplicity let's think of vnodes as follows - Each database server or node is assigned some vnodes, the hash value of those vnodes is such that the vnodes are distributed uniformly across the ring. 

No alt text provided for this image

So now if 5 usernames have hash_function(usernames) values such that 20, 30, 40, 70, 80.

So u1, u2, u3 will be stored on vnode S11 i.e. S1 and u4, u5 will be stored on vnode S00 i.e. S0, without vnode concept all these values will be stored on S0. Hence you can see that introducing the vnode concept has actually helped in uniform distribution of data.

This was all about consistent hashing explained in the easiest way with the help of examples and diagrams. if you like the article, put a like, repost the article and subscribe to my newsletter.

Cheers

Kartik Sapra

#systemdesign #programming #amazon #microsoft #google #softwaredevelopement #softwaredev #faang #maang #techinterview #systemarchitect #systemarchitecture #interviewpreparation #technicalwriter #technicalwriting

To view or add a comment, sign in

More articles by Kartik S.

Others also viewed

Explore content categories