System Design Interview - All about Consistent Hashing With Examples
In the last article “Hash based data distribution and Intro to Consistent Hashing” we have seen the need of consistent hashing and how it may help in case of addition and deletion of a server.
Lets see how consistent hashing actually works with an example. We will cover following aspects -
Question 1) - How data is stored in consistent hashing and how hash rings are formed?
Suppose we have 3 servers namely S0, S1, S2, we will calculate Hash_function(server_ip) because servers are uniquely identified by their server_ip. Suppose the Hash_function(server_ip) calculated values are as follows in sorted order -
Now we want to distribute a set of usernames = ["Kartik", "Kaivalya", "Karan", "Megha", “Isha”] among them.The steps will be as follows -
For example -
We have the following Hash_function(user_name) for user_name, lets see the distribution using consistent hashing.
This is how Consistent hash ring is formed and data is stored on it.
Question 2) - What happens when a new server is added?
Suppose we have added a new server, say S4 with Hash_function(server_ip) = 200 i.e. the server is b/w S0 and S1. Now what will happen is the usernames with Hash_function(user_name) that lies b/w Hash_function(S0) and Hash_function(S4) will be transferred to S4.
So here username = isha with Hash_function(user_name) = 115 was initially stored at S1, now it will be transferred from S1 to S4 and the configuration will look like above.
Recommended by LinkedIn
So you can see the advantage that other servers and the data isn’t even touched which was not the case in simple hash based data distribution.
Question 3) - What happens when a server is down?
Suppose in the above configuration, the Server S1 with Hash_function(server_ip) = 242 goes down. So in this case all the usernames with Hash_function(user_name) b/w Hash_function(S4) and Hash_function(S1) will be now transferred to S2 as S2 is the clockwise next server to S1 on the hash ring .
The configuration will look somewhat like below -
Here again you can observe that other usernames and servers are untouched thus handling both deletion and addition case more efficiently than simple hash based data distribution.
Problem - So we have solved the problem of data redistribution but there is another problem here, can you guess what it is?
Will the data distribution be uniform or non uniform in consistent hashing above?
Suppose we have 10 usernames, and Hash_function(user_name) gives value lesser than 112 i.e. Hash_function(S0), so now you will see that all these 10 usernames will be stored on Server S0 hence giving the unbalanced or kinda skewed data distribution.
For this the concept of virtual nodes or vnodes comes into picture.
For simplicity let's think of vnodes as follows - Each database server or node is assigned some vnodes, the hash value of those vnodes is such that the vnodes are distributed uniformly across the ring.
So now if 5 usernames have hash_function(usernames) values such that 20, 30, 40, 70, 80.
So u1, u2, u3 will be stored on vnode S11 i.e. S1 and u4, u5 will be stored on vnode S00 i.e. S0, without vnode concept all these values will be stored on S0. Hence you can see that introducing the vnode concept has actually helped in uniform distribution of data.
This was all about consistent hashing explained in the easiest way with the help of examples and diagrams. if you like the article, put a like, repost the article and subscribe to my newsletter.
Cheers
Kartik Sapra
Cfbr
Yes it helps in resume
Great opportunity
Thanks for sharing
Reach++