Distributed Systems 101
@lvh
AutoScale
- Distributed system
- Manages distributed systems
- Running on distributed systems
- Interacting with distributed systems
Why this talk?
Parallels with Crypto 101
- Field considered exclusive domain of experts
- Abstinence-only education doesn't work
- Lots of material, but not organized for self-teaching
- We can't afford not to care
Disconnect
- Many theoreticians ignore practice
- Many practitioners ignore theory
Distributed Systems 101
"Just enough" distributed systems
- to whet your appetite
- to shoot yourself in the foot
What is a distributed system?
[…] when a machine I've never heard of can cause my program to fail.
– Leslie Lamport
Paradox
- Why do we use them? Reliability!
- Experts' primary concern? Failure!
Fundamental constraints
Information travels at c
Components fail
Fallacies
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
Examples of distributed systems
- Basically everything (e.g., your laptop)
- Speed of light isn't infinite
- RAM is all the way over there
- Typically:
- Any system with > 1 machine
- Connected via network
Bad news
Theory and consequences
Pick any two:
- Consistency
- Availability
- Partition tolerance
What does that even mean?
- C: linearizability (~ local behavior)
- A: all active nodes answer every query
- P: failure resistance
Can't sacrifice partition tolerance
- Partition tolerance is failure tolerance
- Networks, nodes fail all the time
- Latency happens; indistinguishable from failure
- P(at least one failure) = 1 - P(one node works)^N: grows fast with N (see the sketch below)
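A back-of-the-envelope sketch in plain Python (numbers are illustrative, not measurements) of why failure tolerance isn't optional: even with very reliable individual nodes, the probability that something in the cluster is broken grows quickly with N.

```python
# Chance that at least one of n nodes is currently failed,
# assuming independent failures (illustrative numbers only).
def p_any_failure(p_node_works, n):
    return 1 - p_node_works ** n

for n in (5, 50, 500):
    print(n, round(p_any_failure(0.999, n), 3))
# 5   -> 0.005
# 50  -> 0.049
# 500 -> 0.394
```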
Example: Zookeeper
- Zookeeper is CP
- Consistent ops that sometimes fail
Examples: Cassandra
- Cassandra is AP
- Inconsistent ops that (usually) succeed
Informally
Let's look at a 5 node cluster
What happens when stuff fails?
Idea: (N+1)/2 quorum
Failures are indistinguishable
Too many failures, no quorum
Simple quorum isn't enough
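For the 5-node cluster above, the quorum arithmetic fits in a few lines of Python (helper names are mine): a majority of 3 survives two failures, but once a third node is down or unreachable there is no quorum, and "down" and "slow" look exactly the same.

```python
def quorum(n):
    """Majority quorum: more than half of the n nodes, i.e. (n + 1) // 2 for odd n."""
    return n // 2 + 1

def has_quorum(reachable, n):
    return reachable >= quorum(n)

n = 5
print(quorum(n))         # 3
print(has_quorum(3, n))  # True: two nodes down, the rest can still make progress
print(has_quorum(2, n))  # False: three nodes down (or just unreachable), no quorum
```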
Back to reality
- CAP's C is linearizability
- CAP's A is ops on any node
- These are very strong guarantees!
Reality
- Many consistency models (real & theoretical)
- Many levels of availability (0-100%)
- No reason to sacrifice either outside of failures
- Systems can choose what to sacrifice (A vs C)
Example of creative sacrifice: etcd
- Normally: consistency all the way
- Option of doing inconsistent reads
- Maybe get some stale data
- … but still works if the cluster is on fire!
What lives in those gray areas?
Trade-offs
Availability vs consistency
Performance vs ease of reasoning
Availability vs transactionality
Scalability vs guarantees
High availability can be OK…
Stats, aggregates…
- Counting Twitter followers: can be stale
- Average e-mail size: OK if you miss a few
Doesn't need to be strongly consistent
One process, one register
Two processes, one register
This is how we expect stuff to work
(We are a spoiled bunch)
Concurrency
Overlapping operations
Serializability
- ∃ serial execution with the same result
- Some serial execution: fairly weak
- No restrictions on which serial execution
- No restrictions on when they execute
Example: serializability being weak
Precondition: x = 0
x ← 0
x ← 1
x ← 2
Any serial order is valid, so x can legitimately end up as 0, 1, or 2
Example: serializability being strong
Precondition: x = y = 0
y ← 2, assuming y = 1
x ← 1, assuming x = 0
y ← x, assuming x = 1
The assumptions leave only one valid order: x ← 1, then y ← x, then y ← 2
Compare and set
Important primitive
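A single-process toy version of compare-and-set in plain Python (class name is mine): the write succeeds only if the register still holds the value you assumed, which is exactly the "x ← 1, assuming x = 0" pattern from the previous example.

```python
import threading

class CASRegister:
    """Toy register with an atomic compare-and-set (single process only)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, assumed, new):
        # Atomically write `new`, but only if the current value is `assumed`.
        with self._lock:
            if self._value == assumed:
                self._value = new
                return True
            return False

reg = CASRegister(0)
print(reg.compare_and_set(0, 1))  # True: x was 0, so x <- 1 happens
print(reg.compare_and_set(0, 2))  # False: x is already 1, the write is refused
```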
Linearizability
All operations appear to happen instantly
Strong serializability
- Linearizable & serializable
- There is a serial execution …
- … and that execution is unique & matches wallclock time (???)
Your computer is distributed
Models also exist in "centralized" systems
- SQL databases
- Clojure's reference types
Twisted vs threads
- Twisted: strongly serializable
- Event loop with 1 reactor thread
- Serializable: reactor finds the ordering
- Linearizable: each callback runs by itself, start to finish
- Threads: usually more like read uncommitted
- Many Python opcodes are atomic
- Correct use of locks? (see the sketch below)
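A small illustration (plain CPython, how often it bites depends on the interpreter) of why threads land closer to read uncommitted: `counter += 1` is several opcodes, so concurrent increments can interleave and be lost unless a lock makes the read-modify-write atomic.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(times):
    global counter
    for _ in range(times):
        counter += 1          # load, add, store: three steps, not atomic

def safe_increment(times):
    global counter
    for _ in range(times):
        with lock:            # the lock makes the read-modify-write atomic
            counter += 1

# Race four threads on the unsafe version; swap in safe_increment to get exactly 400000.
threads = [threading.Thread(target=unsafe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # may be less than 400000: some increments were lost
```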
Global clock
- Everyone accesses the same clock instantly
- Can compare timestamps from anywhere
- Mental model: a single shared wall clock
Local clocks
- Each clock is kinda reliable
- Can't compare with other timestamps
- Mental model: stopwatch
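In Python terms (standard library only): time.time() is the wall-clock model, time.monotonic() is the stopwatch. The stopwatch is great for measuring local durations, but its readings mean nothing on another machine.

```python
import time

# Wall clock: tempting to compare across machines, but it drifts,
# gets adjusted by NTP, and can even jump backwards.
wall = time.time()

# Stopwatch: only ever compare it against itself, on the same machine.
start = time.monotonic()
time.sleep(0.1)
print(time.monotonic() - start)  # a trustworthy local duration
```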
Google Spanner
GPS & atomic clocks
Atomic clocks […] drift significantly due to frequency error.
uncertainty […] generally less than 10ms
Time is a proxy
What you usually want is a total order of events, not the actual timestamps
Good news
Stuff you can rely on
Examples
- ZAB (Zookeeper)
- Paxos* (Chubby)
- Raft (etcd)
Recipes
On top of consensus protocols:
- Set partitioning
- Locks
- Barriers
- …
Set partitioning
{1, 2, 3, 4, 5}
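A deliberately naive sketch of the idea in plain Python (no coordination at all, so not the real recipe): split {1, 2, 3, 4, 5} so that every element is owned by exactly one worker. The actual recipe does the claiming through a consensus store like Zookeeper so the assignment survives failures.

```python
def partition(items, workers):
    """Give every item to exactly one worker (naive round-robin)."""
    assignment = {w: set() for w in workers}
    for i, item in enumerate(sorted(items)):
        assignment[workers[i % len(workers)]].add(item)
    return assignment

print(partition({1, 2, 3, 4, 5}, ["a", "b"]))
# {'a': {1, 3, 5}, 'b': {2, 4}}
```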
CRDTs
- Conflict-free
- Replicated
- Data
- Type
Problem
- Read, compute, write back
- Concurrency: multiple results
- Conflicts!
Solutions?
- Last write wins? Some write loses :-(
- Coordination? Expensive! :-(
I want highly available data stores…
.. but I don't want nonsense data
Idea!
- Describe what you want
- Describe conflict resolution
Specializations
The C in CRDT can mean:
- Commutative (CmRDT)
- Convergent (CvRDT)
Commutative RDTs
- Broadcast operations
- Merge operation:
- Commutative: f(x, y) = f(y, x)
- Not idempotent: f(x, y) != f(f(x, y), y)
Example: integers
- +1, -2, +3, +5, -4: +3
- Always get same answer:
- As long as I see all ops once
- Duplicate an op, get wrong answer
- Order doesn't matter, though (see the sketch below)
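The integer example above in executable form (plain Python): every ordering of the operations sums to +3, but delivering one of them twice gives a different answer, which is why op-based CRDTs need exactly-once delivery.

```python
import itertools

ops = [+1, -2, +3, +5, -4]

# Any order gives the same result ...
assert all(sum(p) == 3 for p in itertools.permutations(ops))

# ... but duplicating an operation does not (not idempotent).
print(sum(ops))         # 3
print(sum(ops + [+5]))  # 8: the duplicated +5 is counted twice
```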
Convergent RDTs
- Broadcast states
- Merge operation has many properties:
- Commutative: f(x, y) = f(y, x)
- Idempotent: f(x, y) = f(f(x, y), y)
- Associative: f(f(x, y), z) = f(x, f(y, z))
- Informally: apply it as often as you like until done (see the sketch below)
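A minimal state-based (CvRDT) grow-only counter, sketched in plain Python (the class is mine, not a library API): each replica only bumps its own slot, and merge takes the element-wise max, which is commutative, associative, and idempotent, so replicas can exchange states in any order, any number of times, and still converge.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merge = element-wise max."""

    def __init__(self, replica_id, counts=None):
        self.replica_id = replica_id
        self.counts = dict(counts or {})

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        merged = dict(self.counts)
        for rid, n in other.counts.items():
            merged[rid] = max(merged.get(rid, 0), n)
        return GCounter(self.replica_id, merged)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
print(a.merge(b).value())           # 5
print(a.merge(b).merge(b).value())  # still 5: seeing b's state twice is harmless
```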
Simple CvRDT conflict resolution
Complex CvRDT conflict resolution
It's okay if you see writes more than once!
CRDTs in practice: usually CvRDT
Solve a hard local problem once
vs
Solve a hard distributed problem constantly
Examples
- Counters (G, PN)
- Sets (G, 2P, LWW, PN, OR)
- Maps (sets of (k, v) tuples)
- Graphs (using multiple sets)
- Registers (LWW, MV)
- Sequences (continuous, RGA)
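As one concrete instance from that list, a last-write-wins (LWW) register merge fits in a couple of lines (plain Python, illustrative only): each write carries a timestamp and the later one wins, which is also exactly where unreliable clocks can silently drop writes.

```python
def lww_merge(a, b):
    """Each value is a (timestamp, payload) pair; the later write wins.
    A real implementation needs a deterministic tie-break for equal timestamps."""
    return a if a[0] >= b[0] else b

print(lww_merge((10, "hello"), (12, "world")))  # (12, 'world')
```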
Using CRDTs
- Designing them is tricky
- Using them is fairly easy
Riak <3
Flags, registers, counters, sets, maps
Yay, distributed systems!
- More resilient
- More performant
- Make problems tractable
Argh, distributed systems!
- Incredibly hard to reason about
- Huge state space, no repeat scenarios
- Expensive to operate
Why distributed systems?
Because you're out of options.
Distributed Systems 101
@lvh
_@lvh.io