Presentation Contents
Why replicate? (.5m)
Important keywords (3m)
Replication strategies (5m)
Vnodes (.5m)
Hinted handoff (.5m)
Links, more info (.5m)
Why replicate?
Fault tolerance
Application locality (lower latency to end user)
Running data mining on live data without degrading transactional workloads
Important keywords
- Data centers and racks
- Nodes and replicas
- RF: replication factor
- CL: consistency level
- Tokens and the Ring
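As a rough illustration of how RF and CL relate, the sketch below computes a quorum and checks whether a read and a write are guaranteed to overlap on at least one replica. The function names are hypothetical and are not part of any Cassandra driver; only the arithmetic (quorum = floor(RF / 2) + 1, overlap when reads + writes > RF) is standard.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given RF."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > replication_factor


# With RF=3, writing and reading at quorum (2 replicas each) always overlap.
rf = 3
q = quorum(rf)
overlap = reads_see_latest_write(q, q, rf)
```

With RF=3, reading and writing at a CL of one replica each gives no such guarantee, which is the usual trade-off between latency and consistency.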
Racks
Logical grouping of physically related nodes.
Motivated by locality of failures.
Data Centers
Logical grouping of physically related racks.
cassandra-rackdc.properties
# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=dc1
rack=rack1
The Ring!
(stolen from the DataStax course)
- Locating data always requires knowing the full partition key (PK)
- hash(PK) -> token
- A consistent hashing function is used to calculate the token
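The lookup described above can be sketched in a few lines. This is a toy model under stated assumptions: MD5 from the standard library stands in for the hash (Cassandra's default partitioner is Murmur3Partitioner), and the three-node ring, its tokens and the node names are made up for illustration.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner_of(token: int, ring: list) -> str:
    """Return the node that owns `token`.

    `ring` is a list of (token, node) pairs sorted by token. A node owns the
    range that ends at its own token, so the owner is the first node whose
    token is >= the given token, wrapping around to the start of the ring.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Hypothetical three-node ring, sorted by token.
RING = [
    (-3_000_000_000_000_000_000, "node-a"),
    (0, "node-b"),
    (3_000_000_000_000_000_000, "node-c"),
]

node = owner_of(token_for("user:42"), RING)
```

Because the token depends only on the partition key, any node can work out where a row lives without asking a central directory.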
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
CREATE KEYSPACE Excalibur WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };
Replication strategies
SimpleStrategy
(stolen from the DataStax course)
- All nodes are peers (no master/replica roles)
- RF=2 means each piece of data is stored on 2 nodes
- Client talks to a "Coordinator Node"
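A minimal sketch of SimpleStrategy placement, assuming one token per node: the first replica is the node that owns the token, and the remaining replicas are the next nodes clockwise on the ring, ignoring racks and data centers. The function name and the example ring are hypothetical.

```python
import bisect


def simple_strategy_replicas(token: int, ring: list, replication_factor: int) -> list:
    """Pick replicas by walking the ring clockwise from the token's owner.

    `ring` is a list of (token, node) pairs sorted by token, one token per node.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    # Never return more replicas than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node ring.
EXAMPLE_RING = [(0, "a"), (100, "b"), (200, "c"), (300, "d")]

replicas = simple_strategy_replicas(150, EXAMPLE_RING, 2)  # ["c", "d"]
```

Any node can act as coordinator for a request: it computes the same replica list and forwards the read or write to those nodes.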
NetworkTopologyStrategy
(stolen from the DataStax course)
- Network topology communicated via gossip protocol
- A node joining the cluster needs "seed nodes": at least one per DC
- Seeds are also used to learn the topology of the ring
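To make the difference from SimpleStrategy concrete, here is a simplified sketch of rack-aware placement with a separate replica count per data center. It is not Cassandra's actual algorithm: it walks the ring clockwise, prefers nodes on racks not yet used in that DC, and falls back to already-used racks only when the DC has too few racks. The function name, ring layout and node names are assumptions.

```python
import bisect


def network_topology_replicas(token: int, ring: list, rf_per_dc: dict) -> list:
    """Pick replicas per data center, spreading them across racks when possible.

    `ring` is a list of (token, node, dc, rack) tuples sorted by token.
    `rf_per_dc` maps a data center name to its replication factor.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    # Ring order starting at the node that owns the token.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    replicas = []
    for dc, rf in rf_per_dc.items():
        chosen, seen_racks, skipped = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != dc:
                continue
            if len(chosen) == rf:
                break
            if rack in seen_racks:
                skipped.append(node)  # same rack as an earlier replica
            else:
                seen_racks.add(rack)
                chosen.append(node)
        # Not enough distinct racks: fill up from the skipped nodes.
        chosen += skipped[: rf - len(chosen)]
        replicas += chosen
    return replicas


# Hypothetical two-DC ring.
TOPOLOGY = [
    (0, "n1", "dc1", "r1"),
    (100, "n2", "dc1", "r1"),
    (200, "n3", "dc1", "r2"),
    (300, "n4", "dc2", "r1"),
    (400, "n5", "dc2", "r1"),
]

placement = network_topology_replicas(50, TOPOLOGY, {"dc1": 2, "dc2": 2})
```

In this example dc1 gets one replica on each of its two racks, while dc2 has a single rack and therefore places both replicas on it.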
Vnodes
- No vnodes: 1 token per node
- Vnodes: n tokens per node
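The difference between the two layouts can be sketched by building a ring with a configurable number of tokens per node. This toy version assigns random tokens with a fixed seed so the result is repeatable; the function name is hypothetical, and the `num_tokens` parameter only mirrors the idea of the cassandra.yaml setting of the same name.

```python
import random
from collections import Counter


def build_ring(nodes: list, num_tokens: int, seed: int = 0) -> list:
    """Return a sorted list of (token, node) pairs with `num_tokens` tokens per node."""
    rng = random.Random(seed)
    ring = []
    for node in nodes:
        for _ in range(num_tokens):
            ring.append((rng.randrange(-2**63, 2**63), node))
    return sorted(ring)


single_token_ring = build_ring(["a", "b", "c"], num_tokens=1)  # 3 ranges in total
vnode_ring = build_ring(["a", "b", "c"], num_tokens=8)         # 24 ranges in total
ranges_per_node = Counter(node for _, node in vnode_ring)
```

With many small ranges per node, a node that joins or leaves exchanges data with many peers instead of only its ring neighbours.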
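Hinted handoff
- If a replica is down during a write, the coordinator node is responsible for storing a hint for it
- system.hints table holds hinted handoffs (Cassandra versions before 3.0; later versions keep hints in files on disk)
- The write is replayed when the node comes back online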
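The flow above can be modelled with a small in-memory sketch. Everything here is hypothetical (the class, its methods and the use of plain dicts as replica storage); it only illustrates the idea that the coordinator keeps a hint for a replica that is down and replays it once the replica is back.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replica_names):
        self.stores = {name: {} for name in replica_names}  # per-replica data
        self.up = {name: True for name in replica_names}
        self.hints = []  # pending (target, key, value) writes

    def mark_down(self, name):
        self.up[name] = False

    def write(self, key, value):
        for name, store in self.stores.items():
            if self.up[name]:
                store[key] = value
            else:
                # Replica unreachable: keep the write as a hint instead.
                self.hints.append((name, key, value))

    def mark_up(self, name):
        """Bring a replica back and replay the hints addressed to it, in order."""
        self.up[name] = True
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.stores[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


coordinator = Coordinator(["a", "b"])
coordinator.mark_down("b")
coordinator.write("k", "v")   # stored on "a", hinted for "b"
coordinator.mark_up("b")      # hint replayed, "b" catches up
```

A real cluster also bounds how long hints are kept, so a node that stays down for too long needs a repair rather than a replay.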
Replication in Cassandra
(10-minute presentation aimed at beginners)
by Mariano Gappa / @MarianoGappa