Distributed Systems 101 – @lvh – _@lvh.io





On GitHub: lvh / DistributedSystems101

Distributed Systems 101

@lvh

_@lvh.io

Introduction

Who am I?

Rackspace

AutoScale

AutoScale

  • Distributed system
  • Manages distributed systems
  • Running on distributed systems
  • Interacting with distributed systems

Why this talk?

Parallels with Crypto 101

  • Field considered exclusive domain of experts
  • Abstinence-only education doesn't work
  • Lots of material, but not organized for self-teaching
  • We can't afford not to care

Disconnect

  • Many theoreticians ignore practice
  • Many practitioners ignore theory

Distributed Systems 101

"Just enough" distributed systems

  • to whet your appetite
  • to shoot yourself in the foot

Distributed systems?

What is a distributed system?

[…] when a machine I've never heard of can cause my program to fail.

– Leslie Lamport

Paradox

  • Why do we use them? Reliability!
  • Experts' primary concern? Failure!

Fundamental constraints

Information travels at c

Components fail

Fallacies

  • The network is reliable.
  • Latency is zero.
  • Bandwidth is infinite.
  • The network is secure.
  • Topology doesn't change.
  • There is one administrator.
  • Transport cost is zero.
  • The network is homogeneous.

Examples of distributed systems

  • Basically everything (e.g., your laptop)
    • Speed of light isn't infinite
    • RAM is all the way over there
  • Typically:
    • Any system with > 1 machine
    • Connected via network

Bad news

Theory and consequences

CAP theorem

Pick any two:

  • Consistency
  • Availability
  • Partition tolerance

What does that even mean?

  • C: linearizability (~ local behavior)
  • A: all active nodes answer every query
  • P: failure resistance

Pick any two

Pick any two

Can't sacrifice partition tolerance

  • Partition tolerance is failure tolerance
  • Networks, nodes fail all the time
  • Latency happens; indistinguishable
  • P(at least one failure) = 1 - P(one node works)^N
    • Worse in practice: failures cascade and correlate (Hurst exponent)
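
The point above can be sketched in a few lines, assuming independent node failures (a generous assumption, since real failures cascade and correlate):

```python
# Sketch: the probability that *no* node has failed shrinks fast as the
# cluster grows, assuming independent failures.
def p_all_healthy(p_node_ok: float, n: int) -> float:
    """P(no failures) = P(one node works) ** N."""
    return p_node_ok ** n

# A node that is up 99.9% of the time still leaves a 100-node cluster
# with at least one node down roughly 10% of the time.
for n in (1, 10, 100, 1000):
    print(n, 1 - p_all_healthy(0.999, n))
```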

Pick any two: AP or CP

Example: Zookeeper

  • Zookeeper is CP
  • Consistent ops that sometimes fail

Examples: Cassandra

  • Cassandra is AP
  • inconsistent ops that (usually) succeed

Informally

Let's look at a 5-node cluster

5-node connected cluster

What happens when stuff fails?

Idea: (N+1)/2 quorum
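
A minimal sketch of the majority-quorum idea: any two quorums of size (N+1)/2 must share at least one node, which is what lets quorum operations agree.

```python
# Sketch: majority quorum size for an N-node cluster.
def quorum(n: int) -> int:
    """Smallest majority: equivalent to (N + 1) // 2 for odd N."""
    return n // 2 + 1

# Two quorums always overlap: their combined size exceeds the cluster.
for n in (3, 4, 5):
    print(n, quorum(n), 2 * quorum(n) > n)
```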

Some failures

Failures are indistinguishable

Too many failures, no quorum

Simple quorum isn't enough

Back to reality

  • CAP's C is linearizability
  • CAP's A is ops on any node
  • These are very strong guarantees!

Gradations

Reality

  • Many consistency models (real & theoretical)
  • Many levels of availability (0-100%)
  • No reason to sacrifice either outside of failures
  • Systems can choose what to sacrifice (A vs C)

Example of creative sacrifice: etcd

  • Normally: consistency all the way
  • Option of doing inconsistent reads
  • Maybe get some stale data
  • … but still works if the cluster is on fire!

What lives in those gray areas?

Trade-offs

  • Availability vs Consistency
  • Performance vs Ease of reasoning
  • Availability vs Transactionality
  • Scalability vs Guarantees

High availability can be OK…

Stats, aggregates…

  • Counting Twitter followers: can be stale
  • Average e-mail size: OK if you miss a few

Doesn't need to be strongly consistent

Consistency models

One process, one register

Two processes, one register

This is how we expect stuff to work

(We are a spoiled bunch)

Information travels at c

Slow stuff can overlap

Concurrency

Overlapping operations

Consistency models

http://www.bailis.org/papers/hat-vldb2014.pdf
http://www.ics.forth.gr/tech-reports/2013/2013.TR439_Survey_on_Consistency_Conditions.pdf
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Serializability

  • ∃ serial execution with the same result
  • Some serial execution: fairly weak
    • No restrictions on which serial execution
    • No restrictions on when they execute

Example: serializability being weak

Precondition: x = 0

x ← 0
x ← 1
x ← 2

Example: serializability being strong

Precondition: x = y = 0

y ← 2, assuming y = 1
x ← 1, assuming x = 0
y ← x, assuming x = 1

Compare and set

Important primitive
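
A minimal sketch of the primitive, using a hypothetical `Register` class (not any particular store's API): the write only happens if the current value matches what you expect.

```python
import threading

# Sketch: compare-and-set on a single register. A local stand-in for
# the primitive coordination services expose.
class Register:
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new) -> bool:
        """Atomically write `new` only if the current value is `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

r = Register(0)
print(r.compare_and_set(0, 1))   # succeeds: register was 0
print(r.compare_and_set(0, 2))   # fails: register is now 1
```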

Linearizability

All operations appear to happen instantly

Strong serializability

  • Linearizable & serializable
  • There is a serial execution …
  • … and that execution is unique & matches wallclock time (???)

Your computer is distributed

Models also exist in "centralized" systems

  • SQL databases
  • Clojure's reference types

Twisted vs threads

  • Twisted: strongly serializable
    • Event loop with 1 reactor thread
    • Serializable: reactor finds the ordering
    • Linearizable: callbacks run by themselves
  • Threads: usually more like read uncommitted
    • Many Python opcodes are atomic
    • Correct use of locks?

Time

What do we do with time?

Global clock model

Global clock

  • Everyone accesses the same clock instantly
  • Can compare
  • Mental model: wall clock

Local clock model

Local clocks

  • Each clock is kinda reliable
  • Can't compare with other timestamps
  • Mental model: stopwatch

No clock model

Google Spanner

GPS & atomic clocks

Atomic clocks […] drift significantly due to frequency error.

uncertainty […] generally less than 10ms

Time is a proxy

Actual timestamp vs total order

Vector clocks
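
A minimal sketch of the vector-clock comparison: one counter per process, giving a partial order in which some event pairs are simply concurrent, something a single timestamp can't express.

```python
# Sketch: comparing vector clocks (dicts of per-process counters).
def happened_before(a: dict, b: dict) -> bool:
    """True iff every counter in `a` <= its counterpart in `b`,
    and at least one is strictly smaller."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

v1 = {"p1": 2, "p2": 0}
v2 = {"p1": 2, "p2": 1}
v3 = {"p1": 1, "p2": 3}
print(happened_before(v1, v2))                            # v1 precedes v2
print(happened_before(v1, v3), happened_before(v3, v1))   # concurrent: neither
```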

Good news

Stuff you can rely on

Queues

Consensus protocols

Examples

  • ZAB (Zookeeper)
  • Paxos* (Chubby)
  • Raft (etcd)

Recipes

On top of consensus protocols:

  • Set partitioning
  • Locks
  • Barriers

Set partitioning

{1, 2, 3, 4, 5}

Failures

Recovery

CRDTs

  • Conflict-free
  • Replicated
  • Data
  • Type

Problem

  • Read, compute, write back
  • Concurrency: multiple results
  • Conflicts!

Solutions?

  • Last write wins? Some write loses :-(
  • Coordination? Expensive! :-(

I want highly available data stores…

.. but I don't want nonsense data

Idea!

  • Describe what you want
  • Describe conflict resolution

Specializations

The C in CRDT can mean:

  • Commutative (CmRDT)
  • Convergent (CvRDT)

Commutative RDTs

  • Broadcast operations
  • Merge operation:
    • Commutative: f(x, y) = f(y, x)
    • Not idempotent: f(x, y) != f(f(x, y), y)

Example: integers

  • +1, -2, +3, +5, -4: +3
  • Always get same answer:
    • As long as I see all ops once
    • Duplicate an op, get wrong answer
    • Order doesn't matter, though
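
A quick sketch of the integer example above: the merge (addition) is commutative, so delivery order is free, but it is not idempotent, so exactly-once delivery is not.

```python
# Commutative but not idempotent: any order works,
# but replaying an op gives a wrong answer.
ops = [+1, -2, +3, +5, -4]

print(sum(ops))             # the expected total: +3
print(sum(reversed(ops)))   # same answer in any order
print(sum(ops + [+1]))      # duplicate one op: wrong answer
```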

Convergent RDTs

  • Broadcast states
  • Merge operation has many properties:
    • Commutative: f(x, y) = f(y, x)
    • Idempotent: f(x, y) = f(f(x, y), y)
    • Associative: f(f(x, y), z) = f(x, f(y, z))
    • Informally: apply lots until done
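
An illustrative sketch, not tied to any particular store: `max()` over a grow-only high-water mark is a CvRDT merge with all three properties, so replicas can exchange states in any order, any number of times, and still converge.

```python
# Sketch: max() as a CvRDT merge function.
merge = max
x, y, z = 3, 7, 5

print(merge(x, y) == merge(y, x))                       # commutative
print(merge(merge(x, y), y) == merge(x, y))             # idempotent
print(merge(merge(x, y), z) == merge(x, merge(y, z)))   # associative
```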

Simple CvRDT conflict resolution

Complex CvRDT conflict resolution

It's okay if you see writes more than once!

CRDTs in practice: usually CvRDT

Solve a hard local problem once

vs

Solve a hard distributed problem constantly

Examples

  • Counters (G, PN)
  • Sets (G, 2P, LWW, PN, OR)
  • Maps (sets of (k, v) tuples)
  • Graphs (using multiple sets)
  • Registers (LWW, MV)
  • Sequences (continuous, RGA)
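
As a sketch of the simplest entry in the list above, a G-Counter (grow-only counter): one slot per replica, each replica increments only its own slot, and merge is an element-wise max, so repeated or reordered merges are harmless.

```python
# Sketch: a G-Counter, a convergent (CvRDT) grow-only counter.
class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.slots = {}          # replica id -> that replica's count

    def increment(self, amount: int = 1):
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter"):
        # Element-wise max: commutative, idempotent, associative.
        for k, v in other.slots.items():
            self.slots[k] = max(self.slots.get(k, 0), v)

    @property
    def value(self) -> int:
        return sum(self.slots.values())

a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value, b.value)   # both replicas converge on the same total
```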

Using CRDTs

  • Designing them is tricky
  • Using them is fairly easy

Riak <3

Flags, registers, counters, sets, maps

Wrap-up

Yay, distributed systems!

  • More resilient
  • More performant
  • Make problems tractable

Argh, distributed systems!

  • Incredibly hard to reason about
  • Huge state space, no repeat scenarios
  • Expensive to operate

Why distributed systems?

Because you're out of options.

Thank you!
