An Infinispan Tale – How split clusters can get back together happily



An Infinispan Tale – How split clusters can get back together happily

0 1


infinispan-presentation-splitbrain

Infinispan Split-Brain Presentation

On Github tristantarrant / infinispan-presentation-splitbrain

An Infinispan Tale

How split clusters can get back together happily

Tristan Tarrant / @tristantarrant

About me

Infinispan Project Lead

Open Source hacker since 1993

@ Red Hat since 2011

Agenda

  • What is Infinispan ?
  • How does Infinispan approach clustering ?
  • How does Infinispan handle cluster partitions ?

What is Infinispan ?

  • In-Memory Java Key/Value store (with Scala bits)
  • Local + Clustered
  • Elastic + Highly scalable
  • Transactional + Strongly consistent
  • Distributed under the ASL 2.0
  • Supported product: Red Hat JBoss Data Grid

What can Infinispan do ?

  • Optional persistence:File + LevelDB + JDBC + JPA + REST
  • Embedded + Client/Server
  • Remote Protocols: HotRod + REST + Memcached + WebSocket
  • Distributed execution framework
  • Indexed and Indexless queries
  • Events: local + clustered + remote
  • Security: ACL, Encryption, Audit

Does Infinispan scale ?

Yes.

Largest cluster in production:

320 nodes, 3000 caches, 20TB RAM

Largest cluster formed:

1000 nodes

Infinispan clustering

Transport

  • Based on JGroups
  • Multiprotocol: UDP + TCP
  • Discovery: Multicast + Unicast + S3 + Google + etc
  • Security
  • Geographical replication

Infinispan clustering

Cluster Topology

  • JGroup views
  • Represents group membership
  • Each node has a unique address
  • Ordered, first is coordinator
  • View changes when nodes join + leave
  • View Id

Infinispan clustering

Consistent Hashing

  • Key hashed using MurmurHash3 algorithm
  • Hash space divided into segments
  • Segments assigned to nodes
  • Key > Segment > Owners
  • Primary and Backup owners

Consistent Hashing Visualization

Step 1: Empty cluster

Consistent Hashing Visualization

Step 2: Adding one entry

Consistent Hashing Visualization

Step 3: Primary and Backup

Consistent Hashing Visualization

Step 4: Another one

Consistent Hashing Visualization

Step 5: You get the idea

Consistent Hashing Visualization

Step 6: Here's one I made earlier

Infinispan clustering

Controlling Consistent Hashing

  • Physical topology aware (site/rack/machine)
  • Grouping
  • Key Affinity service
  • Capacity Factor

Infinispan clustering

State Transfer

  • Elasticity
  • Nodes added / removed
  • We still have at least one owner
  • Rebalancing: moving segments to satisfy distribution

Rebalancing Visualization

Step 1: My cluster full of data

Rebalancing Visualization

Step 2: He's dead, Jim

Rebalancing Visualization

Step 3: A healed cluster

Disaster:

What if multiple nodes fail at once ?

CAP: The Theorem

  • Formulated by Eric Brewer in 1998
  • Consistency
  • Availability
  • Partitioning
  • Can only satisfy two properties at once

CAP: The Combinations

  • Consistency + Availability: The "Ideal World"
  • Partitioning + Availability: "I bend so I don't break"
  • Partitioning + Consistency: "Don't corrupt my data"

Partition handling strategies

Prefer availability

<distributed-cache>
    <partition-handling enabled="false"/>
</distributed-cache>
ConfigurationBuilder dcc =new ConfigurationBuilder();
dcc.clustering().partitionHandling().enabled(false);

Prefer consistency

<distributed-cache>
    <partition-handling enabled="true"/>
</distributed-cache>
ConfigurationBuilder dcc =new ConfigurationBuilder();
dcc.clustering().partitionHandling().enabled(true);

Split detection

  • Lost >= number of owners
  • Ensure a stable topology
  • Check segment ownership
  • Mark partition as available / degraded / unavailable
  • Send PartitionStatusChangedEvent to listeners

Cluster partitioning

Case 1: No data loss

Cluster partitioning

Case 2: Lost data

Merging split clusters

  • Split clusters see each other again
  • Ensure a stable topology
  • Automatic: new state based on partition state
    • One unavailable partition > UNAVAILABLE
    • One available partition > attempt merge
    • All partitions are degraded > attempt merge
  • Manual
    • Data was lost
    • Custom listener on Merge event
    • It's up to YOU

Future work

  • Automatic state reconciliation
  • Eventual consistency !!!

Thanks / Q&A

http://infinispan.org

@infinispan + @tristantarrant

https://github.com/infinispan

https://github.com/tristantarrant/infinispan-presentation-splitbrain