Sharding in MongoDB

🇮🇳 Sandeep Rawat

Published Jul 5, 2021

The word "shard" means "a small part of a whole", so sharding is the process of dividing a large dataset into smaller parts. A database shard is a horizontal partition of the data in a database. The system keeps shards on physically or logically separate hardware to spread the load and to improve performance and manageability.

What are the possible ways to partition data horizontally, whether logically or physically?

Let's try with the following sample data.

Assumption: Treat a single row as a document and its columns as attributes.

This can be achieved in the following three ways.

Zone sharding: The first way is to shard on a categorical attribute of the documents, such as department. This is very useful when we are interested in categorical retrieval of data. For example, if country is an essential attribute of the documents in a collection, and most queries include the country name in their filter criteria along with other attributes, the data can be sharded by country. In MongoDB, a zone ties a range of shard key values to specific shards.
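A minimal sketch of the routing idea behind zone sharding, in plain Python rather than MongoDB itself. The shard names, the country-to-zone map, and the sample documents are all hypothetical; only the attribute names (country, salary, employeeId) come from the article.

```python
# Hypothetical zone map: which shard owns which country (illustrative names only).
ZONES = {
    "IN": "shard-asia",
    "JP": "shard-asia",
    "DE": "shard-europe",
    "FR": "shard-europe",
}
DEFAULT_SHARD = "shard-other"  # assumed catch-all for countries without a zone


def zone_shard_for(document):
    """Return the shard that owns the zone of the document's country."""
    return ZONES.get(document.get("country"), DEFAULT_SHARD)


# Made-up sample documents, one per "row".
employees = [
    {"employeeId": 101, "country": "IN", "salary": 12000},
    {"employeeId": 102, "country": "DE", "salary": 21000},
    {"employeeId": 103, "country": "JP", "salary": 16000},
    {"employeeId": 104, "country": "BR", "salary": 9000},
]

# Group employee ids by the shard their document would be placed on.
placement = {}
for doc in employees:
    placement.setdefault(zone_shard_for(doc), []).append(doc["employeeId"])

print(placement)
```

A query that filters on country can then be sent only to the shard that owns that country's zone.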

Ranged sharding: This kind of sharding divides the data by ranges of a discrete or continuous attribute of the documents. In our example, each document has a salary (discrete). If we treat anyone earning more than 16000 as rich, and queries include "rich" or "not rich" in their filter criteria along with other attributes, salary ranges are a natural way to split the data.
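The salary split described above can be sketched as a boundary lookup. This is an illustration of the concept, not MongoDB's chunk mechanism; the shard names are hypothetical, and the 16000 threshold is the one used in the example.

```python
from bisect import bisect_left

# Inclusive upper bound of every range except the last (from the article's example).
UPPER_BOUNDS = [16000]
# Hypothetical shard names, one per range: salary <= 16000, then salary > 16000.
RANGE_SHARDS = ["shard-not-rich", "shard-rich"]


def range_shard_for(salary):
    """Return the shard whose salary range contains the given salary."""
    # bisect_left counts the bounds strictly below the salary,
    # so a salary equal to a bound stays in the lower range.
    return RANGE_SHARDS[bisect_left(UPPER_BOUNDS, salary)]


for salary in (9000, 16000, 16001, 25000):
    print(salary, "->", range_shard_for(salary))
```

Adding more boundaries to `UPPER_BOUNDS`, with one more shard name per boundary, extends the same lookup to any number of ranges.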

Hashed sharding: In hashed sharding, the system applies a hash function to an attribute and distributes the data according to the partition that the hash value falls into. In our example, we are not interested in ranges of employeeId, nor is it a categorical attribute with a finite set of values, yet it is an essential part of our filter criteria along with other attributes.
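A rough sketch of hash-based placement for employeeId. MongoDB computes its own hash internally; SHA-256 is used here only as a stable stand-in, and the shard count and sample ids are assumptions.

```python
import hashlib

SHARD_COUNT = 3  # assumed number of shards


def hashed_shard_for(employee_id, shard_count=SHARD_COUNT):
    """Map an employeeId to a shard index via a stable hash of its value."""
    digest = hashlib.sha256(str(employee_id).encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then partition by modulo.
    hash_value = int.from_bytes(digest[:8], "big")
    return hash_value % shard_count


# Made-up employee ids; consecutive ids need not land on the same shard.
employee_ids = [101, 102, 103, 104, 105, 106]
placement = {eid: hashed_shard_for(eid) for eid in employee_ids}
print(placement)
```

Because the shard is derived from the hash rather than the raw value, an equality lookup on employeeId can be routed to one shard, while a range query on employeeId cannot.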

Advantages

Increased read/write throughput: Multiple shards improve both read and write operation capacity.

Increased storage capacity: Similarly, adding shards increases the total storage capacity.

High availability: Since each shard is a replica set, every piece of data is replicated. Since the data is distributed, even if an entire shard becomes unavailable, the database as a whole remains partially functional for reads and writes from the remaining shards.

Disadvantages

Latency: Queries that need more than one shard to retrieve their results can become extremely slow.
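To illustrate why some queries touch more shards than others, here is a small routing sketch. It assumes a hypothetical cluster whose shard key is country; the shard names and the mapping are made up for illustration.

```python
# Hypothetical layout: the shard key is "country" and each country lives on one shard.
SHARD_KEY = "country"
COUNTRY_TO_SHARD = {"IN": "shard-asia", "JP": "shard-asia", "DE": "shard-europe"}
ALL_SHARDS = sorted(set(COUNTRY_TO_SHARD.values()))


def shards_to_contact(query_filter):
    """Return the shards a router must ask to answer the given filter."""
    if SHARD_KEY in query_filter:
        shard = COUNTRY_TO_SHARD.get(query_filter[SHARD_KEY])
        # In this toy model an unmapped country could be anywhere, so ask every shard.
        return [shard] if shard else ALL_SHARDS
    # No shard key in the filter: every shard has to be queried (scatter-gather).
    return ALL_SHARDS


print(shards_to_contact({"country": "IN", "salary": 12000}))
print(shards_to_contact({"salary": 12000}))
```

The first filter includes the shard key and is answered by one shard; the second has to wait for every shard, so its latency is bounded by the slowest one.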

Sorting issue: Data is indexed and sorted within a single shard to get the best performance for local searching and sorting. Those per-shard indexes do not help cross-shard search or sort queries, which can result in a slow response or no response.
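A sketch of the extra work a cross-shard sort implies: each shard can return its own results already sorted, but something still has to merge those streams into one global order. The shard names and documents below are hypothetical.

```python
import heapq

# Made-up per-shard results, each already sorted by salary on its own shard.
shard_results = {
    "shard-a": [
        {"employeeId": 3, "salary": 9000},
        {"employeeId": 1, "salary": 18000},
    ],
    "shard-b": [
        {"employeeId": 4, "salary": 12000},
        {"employeeId": 2, "salary": 25000},
    ],
}


def merge_sorted(results_by_shard, sort_field):
    """Merge per-shard sorted result lists into one globally sorted list."""
    return list(
        heapq.merge(*results_by_shard.values(), key=lambda doc: doc[sort_field])
    )


merged = merge_sorted(shard_results, "salary")
print([doc["salary"] for doc in merged])  # [9000, 12000, 18000, 25000]
```

The merge step cannot finish until every shard has responded, which is one reason cross-shard sorts are slower than sorts served by a single shard.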

Inconsistency and non-durability: A set of servers has more complex failure modes than a single server, which often results in systems that do not guarantee cross-shard consistency or durability.

Conclusion

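A point of caution: we should evaluate current and upcoming use cases before choosing a sharding strategy, because changing it later is expensive. MongoDB versions before 5.0 did not offer resharding of an existing collection, and even where resharding is available it is a heavy operation. I hope this article gives you a fair understanding of sharding.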

Your suggestions/comments are most welcome :).

Gaurav Aggarwal 4y

Good Article Sandeep!!! Very well explained

Dharmendra Kumar Arya 4y

Spot on , concept explained well with simple example.

Pankaj Dhingra 4y

Nice Article 🇮🇳 Sandeep Rawat , one more point I would like to add here is, while designing partitioning, data archiving requirements should be considered as well. Otherwise these become a cause of concern for performance.

🇮🇳 Sandeep Rawat 4y

#mongodb #Sharding
