What's MongoDB?
MongoDB [hu·mon·gous] is a non-relational, document-oriented
data storage system with an integrated DBMS that provides high performance, scalability and flexibility.
Architecture
MongoDB runs on a core written in C++ and on the Google V8 JavaScript engine, which compiles JavaScript code directly to
machine instructions.
All data is stored in BSON, a binary-encoded form of JSON.
Document-oriented
The basic storage unit is a document (comparable to an SQL row), which is
essentially an associative array stored in a special format, BSON.
Documents are organised in collections (comparable to SQL tables), which
have no relations between each other.
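To make the document and collection idea concrete, here is a minimal sketch that models a document as a plain Python dict and a collection as a list of dicts. The field names and values are invented for illustration, and JSON text stands in for the binary BSON encoding, which is not reproduced here.

```python
import json

# A document is a set of field/value pairs; values may be nested
# documents or arrays. All names and values here are made up.
user = {
    "_id": 1,
    "name": "Ada",
    "languages": ["en", "fr"],
    "address": {"city": "London"},
}

# A collection groups documents. Unlike the rows of an SQL table,
# the documents in it do not have to share the same fields.
users = [user, {"_id": 2, "name": "Linus"}]

# JSON text is the human-readable counterpart of the BSON encoding.
print(json.dumps(user, sort_keys=True))
```

The second document has no `languages` or `address` field, which is the flexibility the slide refers to.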
Replication
- Redundancy of data
- Failover mechanisms
- Multi-DC deployment
- Zero down-time upgrades and maintenance
Main features
- Asynchronous
- Single-Primary (NOT Master-Master)
- Statement-Based AND Binary-Based
- Majority election
- Automatic fail-over mechanisms
- Automatic node recovery mechanisms
Architecture of replication
Redirecting the clients to other nodes
Syncing the data and failover
Mongod
The primary daemon process of the MongoDB system
- handles data requests
- manages data format
- performs background management operations
Amazing options
--replSet <setname>
Configures replication
--oplogSize <value>
Specifies the maximum size of the operation log (oplog) in megabytes.
Defaults to 5% of free disk space. The oplog size limits how far a syncing instance can lag behind and still catch up.
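The two options above could be combined as in the following sketch, which builds a `mongod` command line and gives a rough estimate of how many hours of writes an oplog of a given size can hold. The set name `rs0`, the sizes, and the helper names `mongod_command` and `oplog_window_hours` are assumptions made for illustration, and the estimate assumes a constant write rate.

```python
def mongod_command(set_name, oplog_mb):
    """Build an argv list for starting one replica set member."""
    return ["mongod", "--replSet", set_name, "--oplogSize", str(oplog_mb)]


def oplog_window_hours(oplog_mb, write_mb_per_hour):
    """Rough number of hours of writes that fit into the oplog."""
    return oplog_mb / write_mb_per_hour


# Hypothetical values: a 1024 MB oplog and 256 MB of oplog entries per hour.
print(" ".join(mongod_command("rs0", 1024)))
print(oplog_window_hours(1024, 256))
```

A member that stays offline for longer than this window would probably need a full resync, because the entries it missed have already been overwritten.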
Elections
Determine which member will become primary based on:
- priority
- number of votes
- heartbeat
- number of connections to other members
- optime
Election Triggering Events
- Initiation of a new replica set
- Secondaries lose contact with the primary
- A secondary with a higher priority becomes available
- The primary cannot contact a majority of members in the replica set
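The majority rule behind these events can be sketched in a few lines. This is a deliberately simplified model, not MongoDB's actual election protocol: the `Member` class and `elect_primary` function are hypothetical, and ranking candidates by optime first and priority second is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    priority: int          # 0 means the member can never become primary
    optime: int            # position of the last applied oplog entry
    votes: int = 1
    reachable: bool = True


def elect_primary(members):
    """Return the winner's name, or None if no majority is reachable."""
    total_votes = sum(m.votes for m in members)
    reachable = [m for m in members if m.reachable]
    reachable_votes = sum(m.votes for m in reachable)
    # A primary needs a strict majority of all votes in the set.
    if 2 * reachable_votes <= total_votes:
        return None
    candidates = [m for m in reachable if m.priority > 0]
    if not candidates:
        return None
    # Simplification: most recent optime wins, priority breaks ties.
    best = max(candidates, key=lambda m: (m.optime, m.priority))
    return best.name


members = [
    Member("a", priority=2, optime=100, reachable=False),  # lost primary
    Member("b", priority=1, optime=100),
    Member("c", priority=0, optime=100),
]
print(elect_primary(members))
```

With only one of three members reachable the function returns `None`, which mirrors a primary that cannot contact a majority of the set.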
Replica set members
Synchronised instances possessing the same data sets
Primary Member
The only member in the replica set that receives write operations
Secondary Member
Replicates the primary’s oplog and applies the operations to its own data set
Priority 0 Member
Cannot become the primary in an election
Hidden Member
Invisible to client applications, so the DBMS does not allow reads from this member
Delayed Member
Applies the oplog with a time delay, so it holds an earlier snapshot of the data that can be used to recover from certain errors
Arbiter
- Doesn't store any data
- Can't become the primary member
- Resolves the problem of tied elections
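The member types above map onto per-member settings in the replica set configuration document. The sketch below writes such a document as a Python dict; the set name and host names are hypothetical, the field names (`priority`, `hidden`, `votes`, `arbiterOnly`, `secondaryDelaySecs`) follow recent MongoDB releases, and older releases name the delay field `slaveDelay`. The two helper functions are illustrative only.

```python
replica_set_config = {
    "_id": "rs0",  # hypothetical replica set name
    "members": [
        {"_id": 0, "host": "db0.example.net:27017", "priority": 2},
        {"_id": 1, "host": "db1.example.net:27017", "priority": 1},
        # Priority 0 member: replicates data but cannot become primary.
        {"_id": 2, "host": "db2.example.net:27017", "priority": 0},
        # Hidden member: priority 0 and invisible to clients.
        {"_id": 3, "host": "db3.example.net:27017", "priority": 0, "hidden": True},
        # Delayed member: stays one hour behind; non-voting in this sketch.
        {"_id": 4, "host": "db4.example.net:27017", "priority": 0,
         "hidden": True, "votes": 0, "secondaryDelaySecs": 3600},
        # Arbiter: votes but stores no data.
        {"_id": 5, "host": "arb.example.net:27017", "arbiterOnly": True},
    ],
}


def electable_hosts(config):
    """Hosts that may become primary: data-bearing with priority > 0."""
    return [m["host"] for m in config["members"]
            if not m.get("arbiterOnly", False) and m.get("priority", 1) > 0]


def voting_members(config):
    """Total number of votes in the set (each member defaults to 1)."""
    return sum(m.get("votes", 1) for m in config["members"])


print(electable_hosts(replica_set_config))
print(voting_members(replica_set_config))
```

Here the arbiter brings the number of votes to five, an odd number, which is how it helps to avoid tied elections.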
Sharding
- Partitioning the data
- Auto-balancing
- Scale write throughput
- Optimizing storage capacity
Sharded cluster components
- mongos - router communicating with shards
- config server - stores configuration, the chunk list, locks, etc.
- shard - a single mongod instance or replica set
Shard key
- Determines the placement of a document on a specific shard
- Uses one or more fields
- Is backed by a MongoDB index
- Two kinds of shard keys: range based and hash based
Considerations when selecting shard key
- When using range based shard keys:
  - could result in uneven distribution (don't use ObjectId values or timestamps)
  - easy to determine shards on range queries
- When using hash based shard keys:
  - good to use ObjectId values or timestamps
  - range queries are more likely to query every shard
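The trade-off in the considerations above can be reproduced with a small simulation. This is a sketch under stated assumptions: four shards, three made-up split points, and an MD5-based hash as a stand-in for the hash function; the helper names `range_shard` and `hash_shard` are hypothetical and do not correspond to any MongoDB API.

```python
import bisect
import hashlib
from collections import Counter

NUM_SHARDS = 4
SPLIT_POINTS = [1000, 2000, 3000]  # made-up range boundaries between shards


def range_shard(key):
    """Shard index of the key range that contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def hash_shard(key):
    """Shard index derived from a hash of the key (MD5 as a stand-in)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Monotonically increasing keys, like timestamps or ObjectId values.
keys = range(3000, 4000)
range_counts = Counter(range_shard(k) for k in keys)
hash_counts = Counter(hash_shard(k) for k in keys)
print("range based:", dict(range_counts))
print("hash based: ", dict(hash_counts))

# Shards touched by the range query 3100 <= key <= 3200.
query = range(3100, 3201)
range_targets = {range_shard(k) for k in query}
hash_targets = {hash_shard(k) for k in query}
print("range query hits", len(range_targets), "vs", len(hash_targets), "shards")
```

With range based placement every new key lands on the last shard and the range query touches only that shard. With hash based placement the writes spread out, but the same query has to visit several shards.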
Splitting example
Range based sharding
numInitialChunks equal to 1
Creating new chunk
chunksize equal to 1 MB
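The splitting step can be sketched as follows. It is a toy model that starts with a single chunk, as in the example, and uses a document count (`max_docs`) in place of the chunk size in megabytes; `insert_key` and the split-at-the-median rule are assumptions for illustration, not MongoDB's actual split logic.

```python
import bisect


def insert_key(chunks, key, max_docs):
    """Insert key into the chunk covering it; split that chunk on overflow."""
    # Chunks are sorted, non-overlapping lists. Chunk i covers the keys
    # from its first key up to the first key of chunk i + 1.
    starts = [c[0] for c in chunks[1:]]
    i = bisect.bisect_right(starts, key)
    bisect.insort(chunks[i], key)
    if len(chunks[i]) > max_docs:
        mid = len(chunks[i]) // 2
        # Replace the overflowing chunk with its lower and upper halves.
        chunks[i:i + 1] = [chunks[i][:mid], chunks[i][mid:]]
    return chunks


chunks = [[]]  # a single initial chunk
for k in [5, 1, 9, 3, 7, 2, 8]:
    insert_key(chunks, k, max_docs=4)
print(chunks)  # [[1, 2, 3], [5, 7, 8, 9]]
```

The fifth insert pushes the only chunk past the limit, so it is split in two; later keys are routed to whichever half covers them.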
Balancing example
migration threshold = 2
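A balancing round with this threshold might look like the sketch below. It is a simplified model in which chunks move one at a time from the shard with the most chunks to the shard with the fewest until the difference drops below the threshold. The shard and chunk names and the `balance` function are hypothetical.

```python
def balance(shards, threshold=2):
    """Migrate chunks until the largest gap between shards is below threshold."""
    if threshold < 2:
        # With a threshold of 1 a single chunk would bounce back and forth.
        raise ValueError("threshold must be at least 2")
    migrations = []
    while True:
        src = max(shards, key=lambda s: len(shards[s]))
        dst = min(shards, key=lambda s: len(shards[s]))
        if len(shards[src]) - len(shards[dst]) < threshold:
            return migrations
        chunk = shards[src].pop()
        shards[dst].append(chunk)
        migrations.append((chunk, src, dst))


shards = {"s0": ["c1", "c2", "c3", "c4"], "s1": [], "s2": ["c5"]}
migrations = balance(shards, threshold=2)
print(migrations)
print({name: len(chunks) for name, chunks in shards.items()})
```

In this run two chunks move from `s0` to `s1`, after which the shards hold 2, 2 and 1 chunks and the difference stays below the threshold.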