Replication in Cassandra – Important keywords – Replication strategies



cassandra-replication-slides

A 10 minute presentation on Replication in Cassandra, aimed at beginners

On GitHub: MarianoGappa / cassandra-replication-slides

Replication in Cassandra

(10 minute presentation aimed at beginners)

by Mariano Gappa / @MarianoGappa

Presentation Contents

  • Why replicate? (.5m)
  • Important keywords (3m)
  • Replication strategies (5m)
  • Vnodes (.5m)
  • Hinted handoff (.5m)
  • Links, more info (.5m)

Why replicate?

  • Fault tolerance
  • Application locality (lower latency to the end user)
  • Workload separation: run data mining on live data without degrading transactional traffic

Important keywords

  • Data centers and racks
  • Nodes and replicas
  • RF: replication factor
  • CL: consistency level
  • Tokens and the Ring
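
RF and CL interact through simple arithmetic, which a short sketch may make concrete. The function names below are hypothetical helpers for illustration and not part of any Cassandra API. The sketch assumes the usual definition of QUORUM as a majority of replicas, and the usual rule that reads see the latest write when the read and write replica counts together exceed RF.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write (a majority)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set overlaps every write set in at least one replica."""
    return write_replicas + read_replicas > rf


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        q = quorum(rf)
        print(f"RF={rf}: QUORUM={q}, tolerates {rf - q} replica(s) down")
```

With RF=3, QUORUM is 2, so QUORUM writes combined with QUORUM reads overlap (2 + 2 > 3), while ONE writes with ONE reads do not (1 + 1 is not greater than 3).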

Racks

Logical grouping of physically related nodes. Motivated by locality of failures: nodes that share a rack tend to fail together, so replicas are best spread across racks.

Data Centers

Logical grouping of physically related racks.

cassandra-rackdc.properties

    # These properties are used with GossipingPropertyFileSnitch and will
    # indicate the rack and dc for this node
    dc=dc1
    rack=rack1

The Ring!

(stolen from the DataStax course)
  • Locating data always requires knowing the full partition key (PK)
  • hash(PK) -> token
  • A consistent hashing function is used to calculate the token
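
The hash(PK) -> token lookup above can be sketched in a few lines. This is a toy model under stated assumptions: MD5 stands in for the real partitioner hash only because it ships with the Python standard library, the 2**64 token space is an assumption, and the four-node ring with evenly spaced tokens is hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2**64  # assumed token space for this sketch


def token_for(partition_key: str) -> int:
    """hash(PK) -> token. MD5 is a stand-in hash, not Cassandra's partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def owner_of(token: int, ring) -> str:
    """ring: sorted list of (token, node_name).

    The owner is the node with the first token >= the key's token,
    wrapping around to the first node past the end of the ring.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Hypothetical four-node ring with evenly spaced tokens.
RING = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]

if __name__ == "__main__":
    key = "user:42"
    print(key, "->", token_for(key), "->", owner_of(token_for(key), RING))
```

The same key always hashes to the same token, so any node can compute where a partition lives without asking a central directory.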

    CREATE KEYSPACE Excelsior
      WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

    CREATE KEYSPACE Excalibur
      WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };

Replication strategies

SimpleStrategy

(stolen from the DataStax course)
  • All nodes are peers (there is no master/replica distinction)
  • RF=2 means each piece of data is stored on 2 nodes
  • The client talks to a "Coordinator Node", which forwards the request to the replicas
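
As a rough illustration of SimpleStrategy placement, the sketch below picks the node that owns the key's token and then keeps walking the ring clockwise until RF distinct nodes are found. The ring, its small integer tokens, and the function name are hypothetical and chosen only to keep the example readable.

```python
import bisect


def simple_strategy_replicas(token, ring, replication_factor):
    """ring: sorted list of (token, node_name). Returns RF distinct nodes, clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


# Hypothetical ring: four nodes with one token each.
RING = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]

if __name__ == "__main__":
    print(simple_strategy_replicas(30, RING, 2))  # ['C', 'D']
    print(simple_strategy_replicas(80, RING, 2))  # ['A', 'B'] (wraps around)
```

Note that this placement ignores racks and data centers entirely, which is the gap NetworkTopologyStrategy fills.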

NetworkTopologyStrategy

(stolen from the DataStax course)
  • Network topology is communicated via the gossip protocol
  • A node joining the cluster needs "seed nodes": at least one per DC
  • Seeds are also used to learn the topology of the ring
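
A simplified sketch of how NetworkTopologyStrategy might choose replicas: each data center gets its own replica count, and within a data center the walk around the ring prefers nodes on racks that have not been used yet. This is a two-pass approximation written for illustration, not Cassandra's actual implementation; the `Node` tuple, the ring, and the function name are hypothetical, and one token per node is assumed.

```python
import bisect
from collections import namedtuple

Node = namedtuple("Node", "name dc rack")


def nts_replicas(token, ring, rf_per_dc):
    """ring: sorted list of (token, Node). rf_per_dc: e.g. {"dc1": 3, "dc2": 2}."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)
    clockwise = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]

    placement = {}
    for dc, rf in rf_per_dc.items():
        candidates = [n for n in clockwise if n.dc == dc]
        chosen, racks_used = [], set()
        # First pass: take nodes on racks that hold no replica yet.
        for node in candidates:
            if len(chosen) < rf and node.rack not in racks_used:
                chosen.append(node)
                racks_used.add(node.rack)
        # Second pass: if distinct racks ran out, fill up in ring order.
        for node in candidates:
            if len(chosen) < rf and node not in chosen:
                chosen.append(node)
        placement[dc] = [n.name for n in chosen]
    return placement


# Hypothetical six-node ring spanning two data centers.
RING = [
    (0, Node("a1", "dc1", "rack1")),
    (10, Node("b1", "dc2", "rack1")),
    (20, Node("a2", "dc1", "rack1")),
    (30, Node("b2", "dc2", "rack2")),
    (40, Node("a3", "dc1", "rack2")),
    (50, Node("b3", "dc2", "rack1")),
]

if __name__ == "__main__":
    print(nts_replicas(5, RING, {"dc1": 3, "dc2": 2}))
```

For token 5 the walk starts at b1. In dc1 it picks a2 and a3 (different racks) before falling back to a1, which shares a rack with a2.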

Vnodes

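Without vnodes: 1 token per node (each node owns one contiguous range of the ring)

With vnodes: n tokens per node (each node owns many smaller ranges spread around the ring)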
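
To see what several tokens per node means for ownership, the sketch below assigns random tokens to each node and measures the fraction of the ring each node ends up owning. The 2**64 token space, the random token assignment, the seed, and the helper names are all assumptions made for illustration.

```python
import random

RING_SIZE = 2**64  # assumed token space for this sketch


def build_ring(nodes, tokens_per_node, seed=42):
    """Give every node `tokens_per_node` random tokens; return sorted (token, node) pairs."""
    rng = random.Random(seed)
    ring = [
        (rng.randrange(RING_SIZE), node)
        for node in nodes
        for _ in range(tokens_per_node)
    ]
    return sorted(ring)


def ownership(ring):
    """Fraction of the token space owned by each node.

    A token owns the range from the previous token (exclusive) up to itself.
    """
    if len(ring) == 1:
        return {ring[0][1]: 1.0}
    shares = {}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the last token
        width = (token - previous) % RING_SIZE
        shares[node] = shares.get(node, 0.0) + width / RING_SIZE
    return shares


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    for tokens_per_node in (1, 256):
        shares = ownership(build_ring(nodes, tokens_per_node))
        print(tokens_per_node, {n: round(s, 3) for n, s in sorted(shares.items())})
```

With many small ranges per node, each node's total share tends to sit closer to an even split than with a single randomly placed token per node.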

Hinted handoff

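  • The coordinator node is responsible for storing the hint when a replica is down
  • The system.hints table holds hinted handoffs (Cassandra 3.0 and later keep hints in files on disk instead)
  • The write is replayed when the node comes back online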
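
The hinted handoff flow above can be mimicked with a small in-memory model. It is only a sketch: the `Replica` and `Coordinator` classes are hypothetical, hints live in a plain Python list, and real concerns such as hint expiry and consistency levels are left out.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class Coordinator:
    def __init__(self):
        self.hints = []  # pending (target replica, key, value) entries

    def write(self, replicas, key, value):
        """Write to every replica; store a hint for each one that is down."""
        acks = 0
        for replica in replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Retry hinted writes; keep the hints whose target is still down."""
        remaining = []
        for replica, key, value in self.hints:
            try:
                replica.write(key, value)
            except ConnectionError:
                remaining.append((replica, key, value))
        self.hints = remaining


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    coordinator = Coordinator()
    b.up = False
    print("acks:", coordinator.write([a, b], "k", "v"))  # acks: 1
    b.up = True
    coordinator.replay_hints()
    print("b now has:", b.data)  # b now has: {'k': 'v'}
```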

Links, more info

Thank you! Questions?

  • Suggest edits!
  • Fork my slides
  • Twitter
