Presentation contents
- Why replicate? (0.5m)
- Important keywords (3m)
- Replication strategies (5m)
- Vnodes (0.5m)
- Hinted handoff (0.5m)
- Links, more info (0.5m)
Why replicate?
- Fault tolerance
- Application locality (lower latency to the end user)
- Running analytics on live data without degrading transactional workloads
Important keywords
- Data centers and racks
- Nodes and replicas
- RF: replication factor
- CL: consistency level
- Tokens and the Ring
Racks
Logical grouping of physically related nodes.
Motivated by locality of failures: nodes that share a rack tend to fail together.

Data Centers
Logical grouping of physically related racks.
						
						
cassandra-rackdc.properties

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=dc1
rack=rack1
						
					
The Ring!
(diagram stolen from the DataStax course)
- Locating data always requires knowing the full partition key (PK)
- hash(PK) -> token
- A consistent hashing function calculates the token (Murmur3Partitioner by default)
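The hash(PK) -> token step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Cassandra's code: MD5 stands in for the real partitioner's hash, the ring is shrunk to 0..99, and the node names and token assignments are made up.

```python
import bisect
import hashlib

RING_SIZE = 100  # toy ring size; an assumption to keep the numbers readable

# Hypothetical cluster: each node owns the token range that ends at its token.
TOKENS = {0: "node-a", 25: "node-b", 50: "node-c", 75: "node-d"}
SORTED_TOKENS = sorted(TOKENS)


def token_for(partition_key):
    """Hash the partition key onto the toy ring (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of(token):
    """Return the node holding the first token >= the given one, wrapping around."""
    index = bisect.bisect_left(SORTED_TOKENS, token)
    return TOKENS[SORTED_TOKENS[index % len(SORTED_TOKENS)]]
```

With this ring, owner_of(10) is "node-b" and owner_of(80) wraps around to "node-a". The same key always hashes to the same token, so any node can work out where a row lives without a lookup table.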
							
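-- Single data center: 3 replicas placed around the ring
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

-- Multiple data centers: 3 replicas in dc1, 2 in dc2 (names must match the dc names reported by the snitch)
CREATE KEYSPACE Excalibur WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };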
							
						
Replication strategies
SimpleStrategy
(diagram stolen from the DataStax course)
- All nodes are peers (no master/replica roles)
- RF=2 means each piece of data is stored on 2 nodes
- The client talks to a "coordinator node", which forwards the request to the replicas
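A minimal sketch of how SimpleStrategy picks replicas: start at the node that owns the token and keep walking the ring clockwise until RF distinct nodes are collected. The ring below is hypothetical (made-up node names and tokens), and the function name is an illustration, not a Cassandra API.

```python
import bisect

# Hypothetical 4-node ring: token -> node name.
TOKENS = {0: "node-a", 25: "node-b", 50: "node-c", 75: "node-d"}
SORTED_TOKENS = sorted(TOKENS)


def simple_strategy_replicas(token, replication_factor):
    """Walk clockwise from the token's owner, collecting RF distinct nodes."""
    start = bisect.bisect_left(SORTED_TOKENS, token)
    replicas = []
    for step in range(len(SORTED_TOKENS)):
        node = TOKENS[SORTED_TOKENS[(start + step) % len(SORTED_TOKENS)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas
```

For example, simple_strategy_replicas(30, 2) returns ["node-c", "node-d"]: the owner of token 30 plus its clockwise neighbour. Racks and data centers play no part in the choice, which is why this strategy suits single data center setups.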
Replication strategies
NetworkTopologyStrategy
(diagram stolen from the DataStax course)
- Network topology is communicated via the gossip protocol
- A node joining the cluster needs "seed nodes": 1+ node per DC
- Seeds are also used to learn the topology of the ring
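Once the topology is known, NetworkTopologyStrategy places replicas per data center. The sketch below is a simplified illustration, not the real implementation: for each DC it walks the ring clockwise, prefers nodes on racks it has not used yet, and falls back to same-rack nodes only when distinct racks run out. The ring, node names, and function name are all hypothetical.

```python
import bisect

# Hypothetical ring: token -> (node, data center, rack).
RING = {
    0: ("n1", "dc1", "rack1"),
    10: ("n2", "dc2", "rack1"),
    20: ("n3", "dc1", "rack1"),
    30: ("n4", "dc2", "rack2"),
    40: ("n5", "dc1", "rack2"),
    50: ("n6", "dc1", "rack3"),
}
SORTED_TOKENS = sorted(RING)


def network_topology_replicas(token, rf_per_dc):
    """Pick replicas per data center, walking clockwise and preferring new racks."""
    start = bisect.bisect_left(SORTED_TOKENS, token)
    walk = [RING[SORTED_TOKENS[(start + i) % len(SORTED_TOKENS)]]
            for i in range(len(SORTED_TOKENS))]
    result = {}
    for dc, rf in rf_per_dc.items():
        chosen, skipped, racks_seen = [], [], set()
        for node, node_dc, rack in walk:
            if node_dc != dc:
                continue
            if rack in racks_seen:
                skipped.append(node)  # same rack as an earlier replica
            else:
                racks_seen.add(rack)
                chosen.append(node)
        # Same-rack nodes are used only if distinct racks ran out.
        result[dc] = (chosen + skipped)[:rf]
    return result
```

With rf_per_dc = {"dc1": 3, "dc2": 2} and token 5, the result is {"dc1": ["n3", "n5", "n6"], "dc2": ["n2", "n4"]}: three dc1 replicas on three different racks, two dc2 replicas on two different racks.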
Vnodes
- Without vnodes: 1 token per node
- With vnodes: n tokens per node
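The difference can be sketched by handing out tokens on a toy ring. This is an illustration under assumptions: tokens are drawn at random from a ring of 1000 positions with a fixed seed, and the helper names are made up.

```python
import random


def assign_tokens(nodes, tokens_per_node, ring_size=1000, seed=7):
    """Give every node `tokens_per_node` distinct random tokens on a toy ring."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    tokens = rng.sample(range(ring_size), len(nodes) * tokens_per_node)
    ring = {}
    for index, token in enumerate(tokens):
        ring[token] = nodes[index % len(nodes)]  # deal tokens out round-robin
    return ring


def ranges_per_node(ring):
    """Count how many token ranges each node owns."""
    counts = {}
    for node in ring.values():
        counts[node] = counts.get(node, 0) + 1
    return counts
```

Calling assign_tokens(["a", "b", "c"], 1) gives each node a single range, while assign_tokens(["a", "b", "c"], 8) gives each node 8 small ranges scattered around the ring. Many small ranges mean a new or recovering node can pull data from many peers at once instead of from one neighbour.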
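Hinted handoff
- When a replica is down, the coordinator node is responsible for storing the missed write as a hint
- The system.hints table holds the hints (Cassandra 3.0 and later keep them in files on disk instead)
- The write is replayed when the node comes back online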
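The three steps above can be sketched with a toy coordinator. This is a minimal in-memory illustration, not how Cassandra stores or replays hints: the class, its method names, and the list used as a hint store are all hypothetical.

```python
class Coordinator:
    """Toy coordinator that keeps hints for replicas that are down."""

    def __init__(self, replicas):
        self.storage = {name: {} for name in replicas}  # replica -> its data
        self.up = {name: True for name in replicas}
        self.hints = []  # stand-in for the hint store

    def write(self, key, value):
        for name in self.storage:
            if self.up[name]:
                self.storage[name][key] = value
            else:
                # Replica is down: remember the write it missed.
                self.hints.append((name, key, value))

    def node_came_online(self, name):
        """Mark the node up and replay the hints addressed to it, in order."""
        self.up[name] = True
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.storage[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining
```

Usage: create Coordinator(["n1", "n2", "n3"]), set up["n3"] = False, and call write("k", "v"). The write lands on n1 and n2, a hint is kept for n3, and node_came_online("n3") delivers it and clears the hint.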
 
		
Replication in Cassandra
(10-minute presentation aimed at beginners)
by Mariano Gappa / @MarianoGappa