MongoDB – Sharding and replication – Replication



MongoDB – Sharding and replication – Replication

0 0


Presentation-MongoDB--vol2

Introduction to Replication & Sharding

On Github IjinPL / Presentation-MongoDB--vol2

MongoDB

Sharding and replication

Created by Błażej Krysiak / Patryk Orwat / Piotr Naumczyk / Michał Rynduch / Tomek Dziopa

What's MongoDB?

MongoDB [hu·mon·gous] is a non-relational document-oriented data storage system with integrated DBMS that provides high perfomance, scalabilty and flexibility.

Architecture

MongoDB runs on kernel written in C++ and Google V8 JavaScript Engine, that parse code directly to processor instructions.

All data are stored in BSON format, a binary version JSON.

Document-oriented

Basic storage unit is document (SQL row) that is essentially an association array stored in special format BSON.

Documents are organised in collections (SQL table) that have no relations between each other.

Replication

  • Redundancy of data
  • Failover mechanisms
  • Multi-DC deployment
  • Zero down-time upgrades and maintenance

Main features

  • Asynchronous
  • Single-Primary (NOT Master-Master)
  • Statement-Based AND Binary-Based
  • Majority election
  • Automatic fail-over mechanisms
  • Automatic node recovery mechanisms

Architecture of replication

Working replica set

One srv is going down!

Redirecting the clients to other nodes

Syncing the data and failover

Returning back to work

Mongod

Primary daemon process for whole MongoDB system

  • handles data requests
  • manages data format
  • performs background management operations

Amazing options

--replSet <setname>

Configures replication

--oplogSize <value>

Specifies a maximum size in megabytes for the operation log

By default 5% of disk size. Specifies the delay of instances syncing

Elections

Determine which member will become primary based on:

priority number of votes heartbeat amount of connections to other members optime

Election Triggering Events

  • Initiation of a new replica set
  • Secondaries loses contact with a primary
  • New secondary appears that has higher priority
  • Primary can not contact majority of members in replica set

Simple election scenario

Replica set members

Synchronised instances possessing the same data sets

Primary Member

The only member in the replica set that receives write operations

Secondary Member

Replicate the primary’s oplog and apply the operations to their data sets

0 Priority Member

Can't become the primary in an election

Hidden Member

DBMS doesn't allow read from this member

Delayed Member

Has time-shifted oplog to be used to recovery snapshots from certain errors

Arbiter

  • Doesn't store any data
  • Can't become the primary member
  • Resolves the problem of tied elections

Sharding

  • Partitioning the data
  • Auto-balancing
  • Scale write throughput
  • Optimizing storage capacity

Genegal example

Sharded cluster components

  • mongos - router communicating with shards
  • config server - stores configuration, chunks list, locks, etc.
  • shard - a single mongod instance or replica set

Sharding architecture

Shard key

  • Determines placement of a document in specific shard
  • Utilize one or many fields
  • Uses mongo indexes
  • Two kinks of shard keys:
    • range based
    • hash based

Considerations when selecting shard key

  • When using range based shard keys:
    • could result in uneven distribution (don't use ObjectId values or timestamps)
    • easy to determine shards on range queries
  • When using hash based shard keys:
    • good to use ObjectId values or timestamps
    • range queries would me more likley query every shard

Splitting example

Range based sharding

numInitialChunks equals to 1

Creating new chunk

chunksize equals to 1 [MB]

Creating new chunk

Balancing example

migration threshold = 2

After balancing process

Thanks you for your attention

You can find the sides on my blog:

www.codefedonarts.com