System Design Interviews - Hash based data distribution and Intro to Consistent Hashing

Kartik S.

Published Mar 19, 2023

In this article, I will be covering following topics -

Hash based distribution of data among the database servers.
Limitations of simple hash based data distribution in case of addition/removal of database server.
Introduction to consistent hashing and how it can help counter the limitations of simple hash based data distribution.

Suppose we have 4 database servers namely S0, S1, S2, S3. Now we want to distribute set of usernames = ["Raj", "Kartik", "Kaivalya", "Karan", "Isha", "Megha"] among them.

One of the simple approach here is to use simple hash based data distribution , in this approach the server number assigned to a username will be computed as follows -

server number = hashfunc(username) % number_of_servers

Note - Hash function used should be deterministic i.e. For a specific information value, a hash function should continuously provide the same hash value.

Let us suppose following are the values given by our hashfunc for each username -

Now the distribution of usernames among servers will be as follows -

As our hash function used is deterministic, we can easily retrieve the data from correct server by using the same formulae

server number = hashfunc(username) % number_of_servers .

Question - But what if we scale up(add another database server) or scale down(remove/failure a database server) ?

Case 1 - Server goes down/removed

Lets suppose one of the server goes down, now number_of_servers = 3.

Case 2 - Adding a new database server S4 so now number_of_servers = 5.

Disadvantages -

You can observe that addition/removal of new server led to drastic data redistribution which is highly inefficient as in worst case all the data/usernames may need to be redistributed and that can be millions of records.
In the meantime of redistribution, there will be massive read misses, rendering system non-available.

Here comes the consistent hashing in picture.

Suppose the servers are placed like the orientation below -

For now just ignore what this ring is and how servers are placed on this ring and how data is distributed. Just note that -

Recommended by LinkedIn

Overview of database concepts and architecture

Mümin Çiçek 6 years ago

📘 Week 3 – Post 3: Checkpoints — How Databases Save…

Param Dongre 8 months ago

Error Verifying Master Database Certificate During…

Jackson Andrew 1 month ago

Raj, Kartik are placed at S1.

Isha is placed at S2.

Karan, Kaivalya is placed at S3.

Megha is placed at S0.

Removal/Failure of Server

Now suppose server S1 goes down so what you will do is you will retrieve data placed at S1 and put the data on clockwise next server that is S2 in this case. So now it will look like follows -

So now Raj, Kartik, Isha are present on server S2, data on all other servers are untouched only S1 data is redistributed.

Addition of server

Now suppose in the initial configuration of 4 servers, one more server S4 is added b/w S2 and S3. So now username "kaivalya" will be stored to S4(clockwise next to s2) instead of S3.

So consistent hashing prevents drastic redistribution of data compared to simple hash based approach.

In next article we will take an example and see consistent hashing in depth-

1) How consistent hash ring is formed?

2) How data is stored in consistent hashing?

3) How replication are stored in consistent hashing?

4) What happens when we scale up/scale down?

5) How to distribute data in balanced way?

Until then if you like the article, put a like, repost the article and subscribe my newsletter.

Cheers

Kartik Sapra

#systemdesign #programming #amazon #microsoft #google #softwaredevelopement #softwaredev #faang #maang #techinterview #systemarchitect #systemarchitecture #interviewpreparation #technicalwriter #technicalwriting

System Design For Interviews

7,560 followers

+ Subscribe

Prakhar Rai 3y

Thanks for sharing

Ray Price 3y

🙏 Kartik S. Keep the education coming 😊 Follow #haveitmagical 4daily inspiration ✨️

2 Reactions

Yatin Ahuja 3y

✅ Learnt a new topic today.

1 Reaction

Jayesh Deshmukh 3y

Reach++

1 Reaction

Karan S. 3y

Reach++

1 Reaction

See more comments

To view or add a comment, sign in

System Design Interviews - Hash based data distribution and Intro to Consistent Hashing

Kartik S.

Recommended by LinkedIn

System Design For Interviews

7,560 followers

More articles by Kartik S.

Others also viewed

How to Restore SQL Database from MDF File: Possible Methods Explained

Backup and Restore Encrypted Database in SQL Server: Complete Step-by-Step Guide

What Happens to Your Data When a Database Crashes?

A Comprehensive Guide to Enhancing Server Performance (Part 1)

#Day3 - Storage Engines (Extents)

Master-Slave Databases

SQL Server 2016 Always On Availability Group with Direct Seeding

🔍 Understanding the Key Differences Between: Always On AG, Replication, Mirroring, and Log Shipping in SQL Server 🧠

Need of database connection pool

Database Consolidation -- some thoughts

Explore content categories

Recommended by LinkedIn

System Design For Interviews

7,560 followers

More articles by Kartik S.

System Design Interview - All about Consistent Hashing With Examples

System Design Interviews - CAP Theorem Made Easy

Foundations Of Highly Available System Design - Data Replication And Replication Strategies

Foundations Of Highly Available System Design Part 1 - Achieving 5 9's of Availability

Google Summer Of Code Student Application Phase

Others also viewed

How to Restore SQL Database from MDF File: Possible Methods Explained

Backup and Restore Encrypted Database in SQL Server: Complete Step-by-Step Guide

What Happens to Your Data When a Database Crashes?

A Comprehensive Guide to Enhancing Server Performance (Part 1)

#Day3 - Storage Engines (Extents)

Master-Slave Databases

SQL Server 2016 Always On Availability Group with Direct Seeding

🔍 Understanding the Key Differences Between: Always On AG, Replication, Mirroring, and Log Shipping in SQL Server 🧠

Need of database connection pool

Database Consolidation -- some thoughts

Explore content categories