Implementing Sharding in MongoDB for Big Data

You are building a social media platform with millions of users, and your MongoDB database is experiencing slow queries and high memory usage. To solve this, you decide to shard the users collection using userId as the shard key.

Step 1: Set Up a MongoDB Sharded Cluster

A MongoDB sharded cluster requires:

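Config Servers (store cluster metadata, such as which shard owns which range of shard key values).
Query Router (mongos) (routes client queries to the right shards).
Multiple Shards (each stores a subset of the data).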

Cluster Diagram:

Client → mongos → Config Servers
                 |
  --------------------------------------
  |                  |                  |        
+--------+       +--------+       +--------+     
| Shard1 |       | Shard2 |       | Shard3 |     
+--------+       +--------+       +--------+     
  Data A          Data B          Data C
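
The way these components cooperate can be sketched with a toy model. The chunk table below stands in for the metadata held by the config servers, and findShard mimics the lookup that mongos performs. The key ranges, the chunks name, and findShard itself are illustrative assumptions, not MongoDB internals.

```javascript
// Toy model: config servers hold a table that maps shard key ranges (chunks) to shards.
// The ranges below are made up for illustration.
const chunks = [
  { min: 0,   max: 100, shard: "Shard1" }, // min inclusive, max exclusive
  { min: 100, max: 200, shard: "Shard2" },
  { min: 200, max: 300, shard: "Shard3" },
];

// mongos caches this table and uses it to decide which shard owns a key.
function findShard(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null; // null: key falls outside every range
}

console.log(findShard(chunks, 42));  // Shard1
console.log(findShard(chunks, 150)); // Shard2
console.log(findShard(chunks, 250)); // Shard3
```

Keeping the routing table separate from the data is what lets the router stay stateless: it only needs an up-to-date copy of the metadata.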

Step 2: Enable Sharding on the Database

Connect to mongos with the MongoDB shell and run:

sh.enableSharding("socialMediaDB")

This tells MongoDB that we want sharding for the socialMediaDB database.

Step 3: Choose a Shard Key & Shard the Collection

Why `userId` as the Shard Key?

It has high cardinality, and hashing spreads even sequential user IDs evenly across shards.
Queries often use userId (making lookups efficient).
It helps avoid hotspots (where a single shard gets overloaded).

Now, we shard the users collection based on userId:

sh.shardCollection("socialMediaDB.users", { "userId": "hashed" })

This tells MongoDB to distribute users evenly across shards using hashed sharding.
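
To see why hashing helps even when IDs are sequential, here is a small simulation. It is only a sketch: hashUserId is a stand-in 32-bit mixing function (the finalizer from MurmurHash3), not the hash MongoDB uses, and picking a shard with a modulo replaces MongoDB's chunk-based placement.

```javascript
// Stand-in hash: the 32-bit finalizer from MurmurHash3. MongoDB uses its own hash function.
function hashUserId(userId) {
  let h = userId | 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // force an unsigned 32-bit result
}

// Count how many of the IDs 1..totalUsers land on each of shardCount shards.
function simulateDistribution(totalUsers, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (let userId = 1; userId <= totalUsers; userId++) {
    counts[hashUserId(userId) % shardCount] += 1;
  }
  return counts;
}

// Sequential IDs go in; the per-shard counts should come out roughly equal.
console.log(simulateDistribution(30000, 3));
```

Without the hash step, a range-based split on sequential IDs would send every new user to the shard holding the highest range.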

Step 4: Insert Sample Data

Now that the collection is sharded, let’s switch to socialMediaDB and insert some users:

use socialMediaDB
db.users.insertMany([
  { "userId": 101, "name": "Alice", "location": "USA" },
  { "userId": 202, "name": "Bob", "location": "UK" },
  { "userId": 303, "name": "Charlie", "location": "India" }
]);

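mongos routes each document to the shard that owns the hash of its userId. With only three documents, some may land on the same shard; the spread evens out as the collection grows.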

Step 5: Querying Data in a Sharded Cluster

Now, let’s test queries to see how sharding works.

Efficient Query (Targets One Shard)

db.users.find({ userId: 101 })

Since userId is the shard key and the filter is an equality match, mongos directly fetches data from one shard.
Fast and efficient!

Slow Query (Broadcasts to All Shards)

db.users.find({ location: "USA" })

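BAD PRACTICE → This query does not include the shard key, so mongos must send it to every shard and merge the results (slow!).
Solution → Include userId in the filter whenever possible. If filtering by location is common, an index on location lets each shard answer quickly, although the query is still broadcast.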
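
The difference between the two queries comes down to a routing decision, which can be sketched as follows. This is a simplified model rather than mongos itself: shardsToContact and ownerOf are hypothetical names, and the modulo in ownerOf stands in for the hashed chunk lookup.

```javascript
const SHARDS = ["Shard1", "Shard2", "Shard3"];
const SHARD_KEY = "userId";

// Stand-in for the hashed chunk lookup: maps a shard key value to one shard.
function ownerOf(value) {
  return SHARDS[Math.abs(value) % SHARDS.length];
}

// Decide which shards a query must visit, based only on its filter.
function shardsToContact(filter) {
  const value = filter[SHARD_KEY];
  // Only an equality match on the shard key is targeted in this model. Range
  // operators such as { $gt: 100 } arrive as objects and fall through to a broadcast.
  if (typeof value === "number") {
    return [ownerOf(value)];
  }
  return SHARDS.slice(); // scatter-gather: every shard is queried
}

console.log(shardsToContact({ userId: 101 }));          // one shard: Shard3
console.log(shardsToContact({ location: "USA" }));      // all three shards
console.log(shardsToContact({ userId: { $gt: 100 } })); // all three shards
```

The third call illustrates a trade-off of hashed sharding: range filters on userId typically cannot be narrowed to one shard, because neighbouring IDs hash to unrelated values.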

Step 6: Monitoring Sharding Status

To check how data is distributed across shards, use:

db.users.getShardDistribution()

This shows the data size, document count, and chunk count for each shard.
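
When reading the output, the main thing to look for is skew between shards. As a rough illustration, the helper below turns per-shard document counts into percentages. summarizeDistribution and the sample counts are made up for this example and are not part of the MongoDB shell.

```javascript
// Convert per-shard document counts into percentages of the total (one decimal place).
function summarizeDistribution(docCounts) {
  const total = Object.values(docCounts).reduce((sum, n) => sum + n, 0);
  const report = {};
  for (const [shard, count] of Object.entries(docCounts)) {
    report[shard] = total === 0 ? 0 : Math.round((count * 1000) / total) / 10;
  }
  return report;
}

// Hypothetical counts, not real output from a cluster.
const sampleCounts = { Shard1: 500, Shard2: 300, Shard3: 200 };
console.log(summarizeDistribution(sampleCounts)); // { Shard1: 50, Shard2: 30, Shard3: 20 }
```

A split like this one, where one shard holds half the documents, would be a reason to revisit the shard key or check whether the balancer is running.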

To see an overview of the cluster, including the list of shards:

sh.status()

This provides a detailed report of the cluster, shard key, and data distribution.

Step 7: Adding a New Shard to Scale Further

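If more users join, we can scale horizontally by adding a new shard. Each shard must be a replica set, and sh.addShard() expects the form replicaSetName/host:port rather than a mongodb:// URI (shard4ReplSet below is a placeholder name):

sh.addShard("shard4ReplSet/shard4:27017")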

The balancer then migrates chunks to the new shard in the background, so data is redistributed gradually rather than instantly.
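
Rebalancing after adding a shard can be pictured with a toy simulation. This sketch assumes a simple rule (move one chunk at a time from the fullest shard to the emptiest until they differ by at most one chunk); the real balancer uses its own thresholds and policies, which vary by MongoDB version.

```javascript
// Toy balancer: chunkCounts maps shard name -> number of chunks it holds.
function rebalance(chunkCounts) {
  const counts = { ...chunkCounts }; // work on a copy
  const names = Object.keys(counts);
  let migrations = 0;
  while (names.length > 0) {
    const fullest = names.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const emptiest = names.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    if (counts[fullest] - counts[emptiest] <= 1) break; // balanced enough
    counts[fullest] -= 1;  // migrate one chunk...
    counts[emptiest] += 1; // ...to the least loaded shard
    migrations += 1;
  }
  return { counts, migrations };
}

// Three full shards plus a newly added, empty Shard4.
console.log(rebalance({ Shard1: 4, Shard2: 4, Shard3: 4, Shard4: 0 }));
// counts end at 3 chunks per shard after 3 migrations
```

Each migration copies data over the network, which is why a freshly added shard takes time to carry its share of the load.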

Real-World Benefits of Sharding

Faster Queries → Queries using the shard key are routed efficiently.
Horizontal Scaling → More shards can be added as the database grows.
Better Performance → Read & write operations are distributed across multiple servers.
High Availability → Because each shard is deployed as a replica set, the cluster keeps running if a single server fails.

Conclusion

Sharding is a powerful feature in MongoDB that helps scale large applications. By choosing the right shard key, optimizing queries, and monitoring distribution, we can achieve high performance and availability.