Database Sharding
What is Database Sharding?
Database sharding is the process of breaking a large database into smaller, faster, more manageable pieces called shards. Each shard holds a portion of the data — but together, they represent the entire dataset.
Think of it like splitting a massive phonebook into smaller books — one per city. Each smaller book (shard) is easier to manage, search, and update.
Why Do We Need Sharding?
As data grows, a single database faces 3 main bottlenecks:
Sharding enables horizontal scaling — you add more databases instead of buying a bigger one.
How Sharding Works (Conceptually)
Let’s say you have a users table with 100 million rows.
Instead of storing all users in one DB, you split them into multiple shards based on a shard key (a column that determines which shard stores which data).
Example:
If you choose user_id as the shard key:
ShardRange of User IDsShard 11–10,000,000Shard 210,000,001–20,000,000Shard 320,000,001–30,000,000
Each shard lives on a separate server — but your application knows where to find data for a given user.
Example in Practice
Suppose your system stores data for users by region:
RegionShardAsiaShard AEuropeShard BAmericaShard C
So, when a user from India logs in, your backend queries Shard A directly — no need to touch other shards. This improves speed, latency, and isolation.
Recommended by LinkedIn
Choosing the Right Shard Key
Choosing a good shard key is critical. A bad one can cause data hotspots — where one shard becomes overloaded while others stay idle.
Good Shard Key Characteristics:
Example:
Rebalancing and Challenges
Sharding isn’t magic — it adds complexity.
Common Challenges:
That’s why large-scale systems often use a shard manager — a service that keeps track of which data lives where.
Real-World Example
Key Takeaways
Once data grows beyond a single node, sharding isn’t optional, it’s survival.