You are building a social media platform with millions of users, and your MongoDB database is experiencing slow queries and high memory usage. To solve this, you decide to shard the user database using userId
as the shard key.
Step 1: Setup MongoDB Sharded Cluster
A MongoDB sharded cluster requires:
- Config Servers (store metadata).
- Query Router (mongos) (directs queries).
- Multiple Shards (store distributed data).
Cluster Diagram:
Client → mongos → Config Servers
|
--------------------------------------
| | |
+--------+ +--------+ +--------+
| Shard1 | | Shard2 | | Shard3 |
+--------+ +--------+ +--------+
Data A Data B Data C
Step 2: Enable Sharding on the Database
Log into the MongoDB shell and run:
sh.enableSharding("socialMediaDB")
This tells MongoDB that we want sharding for the socialMediaDB
database.
Step 3: Choose a Shard Key & Shard the Collection
Why userId
as the Shard Key?
- It is evenly distributed (random user IDs).
- Queries often use
userId
(making lookups efficient). - It prevents hotspot issues (where a single shard gets overloaded).
Now, we shard the users
collection based on userId
:
sh.shardCollection("socialMediaDB.users", { "userId": "hashed" })
This tells MongoDB to distribute users evenly across shards using hashed sharding.
Step 4: Insert Sample Data
Now that sharding is enabled, let’s insert some users:
db.users.insertMany([
{ "userId": 101, "name": "Alice", "location": "USA" },
{ "userId": 202, "name": "Bob", "location": "UK" },
{ "userId": 303, "name": "Charlie", "location": "India" }
]);
MongoDB automatically distributes users across different shards!
Step 5: Querying Data in a Sharded Cluster
Now, let’s test queries to see how sharding works.
Efficient Query (Targets One Shard )
db.users.find({ userId: 101 })
- Since
userId
is the shard key, mongos directly fetches data from one shard. - Fast and efficient!
Slow Query (Broadcasts to All Shards)
db.users.find({ location: "USA" })
- BAD PRACTICE → This query does not use the shard key, so MongoDB scans all shards (slow!).
- Solution → Use
userId
or store frequently queried data inside the same document.
Step 6: Monitoring Sharding Status
To check how data is distributed across shards, use:
db.users.getShardDistribution()
This shows how many documents exist in each shard.
To list all shards in the cluster:
sh.status()
This provides a detailed report of the cluster, shard key, and data distribution.
Step 7: Adding a New Shard to Scale Further
If more users join, we may need to scale horizontally by adding a new shard:
sh.addShard("mongodb://shard4:27017")
MongoDB automatically redistributes data across all shards!
Real-World Benefits of Sharding
- Faster Queries → Queries using the shard key are routed efficiently.
- Infinite Scaling → More shards can be added as the database grows.
- Better Performance → Read & write operations are distributed across multiple servers.
- High Availability → If one shard fails, the rest of the system keeps running.
Conclusion
Sharding is a powerful feature in MongoDB that helps scale large applications. By choosing the right shard key, optimizing queries, and monitoring distribution, we can achieve high performance and availability.