MongoDB : Sharding


Mongodb sharding is based on shard key.

K1 -> k2 on shard1
K2 -> k3 on shard2 etc..

Each shard is then replicated for higher availability and DR etc..Sharding is therefore range based. Sharding is done per collections basis.Range based sharding helps it do range based queries.

All of the documents on a particular shard are known as chunks ~~ 100mb.

There are two operations that happen in the background in sharding and these are done automatically for us.

  1. ┬áSplit – splits a range if the range is producing bigger chunks, this is fairly expensive
  2. Migrate – moves chunks to somewhere else in the cluster, this is somewhat expensive.
    • Between a pair of shards there will not be more than one migration activity.
    • We can still read and write from the data when we are migrating. So, it is live.
    • “Balancer” decides when to do the balancing. It balances on the number of chunks today.

Both of these are done to maintain a balance in the shards w.r.t. the documents.
The metadata about these shards and our system is stored in config servers. These are light weight.
Conceptually these shards are processes and not separate physical machines or virtual machines although they can and most likely will be.

Mongos gives the client the big picture of the whole setup. Client is therefore insulated from the underlying architecture that is used to implement sharding, replication etc..


End client applications should go through mongos.

To create a shard connect to mongos


db.ensureIndex(Key pattern for your shard key)

Choosing a shard key

– Filed should be involved in most of the queries
– Good cardinality/granularity
– Shard key should not increase monotonically

MongoDB – Replication


Replication helps us achieve availability and fault-tolerance. A replica set is a set of mongo nodes that replicate data amongst each other asynchronously. One of the replica sets is primary while the rest of them will be secondary.
Writes only happen to the primary. If the primary goes down then an election happens and the new primary comes up.
Minimum number of nodes will be 3, since the election requires a majority of the original set.
If there were only 2 sets then the remaining one is not a majority and you would not be able to write.

Replica Set Elections

Regular Node : It has the data and can become primary or secondary.
Arbiter : It is just there for voting purposes. We need it if we want an even number of nodes.
It has no data on it.
Delayed/Regular : It can be set to a few hours after the nodes. It cannot become primary. It’s priority is set to 0.
Hidden Node : It cannot become a primary node. It’s priority is set to 0.

By default the reads and writes go to the primary. You can go to secondary for reading. This means that you might read stale data. The lag between nodes is not guaranteed since the process is async. If you read from secondary then what we have is “eventual consistency”.

rs.slaveOk() -- ok to read from the secondary
rs.isMaster() -- checks whether the node is master
rs.status() -- gives the current status of the replica set.

In the database locals , the collection has all the operational logs. Replication happens when secondary nodes query the primary for the changes from a given timestamp. OpLog is the statement based replication log.
Replication happens in a statement driven manner?
e.g If a statement deletes 100 documents on the primary then there will 100 statements that are sent to the secondary to execute. There is no binary replication. This allows us to run different version of mongodb on different machines.

  • Try to keep the OpLog small on 64 bit machine since it defaults to a large value on 64 bit systems.
  • For replica sets don’t use localhost or the ip address of the machine.
  • Use a logical name, that is the best practice.
  • Use DNS.
  • Pick appropriate TTL as well.