Sharding in MongoDB
What are the possible ways to partition data horizontally, whether logically or physically?
Let's try with the following sample data.
Assumption: Each row is a document and each column is an attribute.
Horizontal partitioning can be achieved in the following three ways.
Zone Sharding: The first way is to shard on a categorical attribute of the documents, such as department. This is very useful when data is mostly retrieved by category, e.g. when country is an essential attribute of the documents in a collection and most queries include the country name in their filter criteria along with other attributes.
Ranged Sharding: This kind of sharding is based on ranges of a discrete or continuous attribute value in a document, such as salary. In our example, each document has a salary (discrete); if we assume that anyone earning more than 16000 is rich, queries would have "rich" or "not rich" as part of their filter criteria along with other attributes.
Hashed Sharding: In hashed sharding, the system applies a hash function to an attribute and distributes data based on partitions of the hash value. In our example, we are not interested in ranges of employeeId, nor is it a categorical attribute with a finite set of values, yet it is an essential part of our filter criteria along with other attributes.
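The three strategies above can be sketched as plain routing functions. This is a minimal illustration in Python, not MongoDB's actual implementation: the sample documents, the shard names, and the shard count are hypothetical, and the hashed example uses a simple modulo where a real cluster would assign ranges of the hash value to chunks.

```python
import hashlib

# Hypothetical sample documents (illustrative values only).
EMPLOYEES = [
    {"employeeId": 101, "department": "HR", "salary": 12000},
    {"employeeId": 102, "department": "IT", "salary": 18000},
    {"employeeId": 103, "department": "IT", "salary": 16000},
    {"employeeId": 104, "department": "Sales", "salary": 25000},
]

NUM_SHARDS = 3  # assumed shard count for the hashed example


def zone_shard(doc):
    """Zone sharding: route by a categorical attribute (department)."""
    return "shard-" + doc["department"]


def range_shard(doc, threshold=16000):
    """Ranged sharding: salaries above the threshold go to the 'rich' shard."""
    return "shard-rich" if doc["salary"] > threshold else "shard-not-rich"


def hashed_shard(doc, num_shards=NUM_SHARDS):
    """Hashed sharding: hash employeeId and map the hash to a shard.

    Modulo is a simplification chosen for this sketch.
    """
    digest = hashlib.md5(str(doc["employeeId"]).encode()).hexdigest()
    return "shard-%d" % (int(digest, 16) % num_shards)


def partition(docs, route):
    """Group employeeIds by the shard that `route` picks for each document."""
    shards = {}
    for doc in docs:
        shards.setdefault(route(doc), []).append(doc["employeeId"])
    return shards


if __name__ == "__main__":
    for route in (zone_shard, range_shard, hashed_shard):
        print(route.__name__, partition(EMPLOYEES, route))
```

Note that a salary of exactly 16000 lands on the "not rich" shard, because the article's rule is "more than 16000". The hashed placement is deterministic for a given employeeId but carries no ordering, which is why it suits equality lookups rather than range queries.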
Advantages
Increased read/write throughput: Multiple shards improve both read and write operation capacity.
Increased storage capacity: Similarly, increasing the number of shards also increases the total storage capacity.
High availability: Because each shard is deployed as a replica set, every piece of data is replicated. And because the data is distributed, even if an entire shard becomes unavailable, the database as a whole remains partially functional for reads and writes on the remaining shards.
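The "partially functional" behaviour in the last point can be pictured with a toy model. The sketch below is an assumption-laden illustration in Python, not MongoDB code: the `cluster` dictionary, the shard names, and the `ShardUnavailable` exception are all hypothetical.

```python
class ShardUnavailable(Exception):
    """Raised when a query needs a shard that is currently down."""


# Hypothetical cluster state: one shard per department, one of them down.
cluster = {
    "shard-IT": {"up": True, "docs": [{"employeeId": 102, "department": "IT"}]},
    "shard-HR": {"up": False, "docs": [{"employeeId": 101, "department": "HR"}]},
}


def find_by_department(cluster, department):
    """Targeted read: only the shard owning `department` is contacted."""
    shard = cluster["shard-" + department]
    if not shard["up"]:
        raise ShardUnavailable(department)
    return [d for d in shard["docs"] if d["department"] == department]


if __name__ == "__main__":
    print(find_by_department(cluster, "IT"))  # served by the healthy shard
    try:
        find_by_department(cluster, "HR")
    except ShardUnavailable as exc:
        print("unavailable:", exc)
```

Queries for IT keep working while the HR shard is down; only requests that need the missing shard fail.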
Disadvantages
Latency: Queries that must involve more than one shard to retrieve their results can become considerably slower.
Sorting issue: Data is indexed and sorted within a single shard, which gives optimal performance for local search and sorting. These local indexes are less helpful for cross-shard search and sorting queries, which can result in a slow response or no response.
Inconsistency and non-durability: A set of servers has more complex failure modes, which often results in systems that do not guarantee cross-shard consistency or durability.
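The latency and sorting points can be made concrete with a small sketch of a targeted query versus a scatter-gather query. This is a simplified Python model under stated assumptions: the `shards` data and function names are hypothetical, and the merge step uses the standard library's `heapq.merge` as a stand-in for whatever a real query router does.

```python
import heapq

# Hypothetical shards keyed by department; each holds its own documents.
shards = {
    "shard-HR": [
        {"employeeId": 105, "salary": 20000},
        {"employeeId": 101, "salary": 12000},
    ],
    "shard-IT": [
        {"employeeId": 102, "salary": 18000},
        {"employeeId": 103, "salary": 16000},
    ],
}


def targeted_query(shards, shard_name):
    """The filter includes the shard key, so exactly one shard is contacted."""
    return list(shards[shard_name]), 1


def scatter_gather_sorted(shards, key):
    """No shard key in the filter: every shard sorts locally, then results are merged."""
    per_shard = [sorted(docs, key=lambda d: d[key]) for docs in shards.values()]
    merged = list(heapq.merge(*per_shard, key=lambda d: d[key]))
    return merged, len(per_shard)


if __name__ == "__main__":
    docs, touched = targeted_query(shards, "shard-IT")
    print(touched, [d["employeeId"] for d in docs])
    docs, touched = scatter_gather_sorted(shards, "salary")
    print(touched, [d["salary"] for d in docs])
```

The cross-shard sort has to wait for every shard and then do extra merge work, so its cost grows with the number of shards, while the targeted query stays at one shard regardless of cluster size.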
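Conclusion
A point of caution: we should evaluate current and upcoming use cases before choosing a sharding strategy, because changing it later is costly. MongoDB versions before 5.0 do not provide the luxury of re-sharding a collection, and resharding in later versions can still be an expensive operation. I hope this article gives you a fair understanding of sharding.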
Your suggestions/comments are most welcome :).
Good Article Sandeep!!! Very well explained
Spot on , concept explained well with simple example.
Nice Article 🇮🇳 Sandeep Rawat , one more point I would like to add here is, while designing partitioning, data archiving requirements should be considered as well. Otherwise these become a cause of concern for performance.
#mongodb #Sharding