Simple math leads to powerful programming



Simple math leads to powerful programming

0 0


monoid-slides

Some slides explaining monoids and semilattices, using reveal.js

On Github rntz / monoid-slides

Simple math leads to powerful programming

Michael Arntzenius

http://www.rntz.net

Motto: Simple concepts in abstract mathematics lead to powerful programming techniques

I will show this by example: monoids.

Explain monoids from the ground up & why they're useful.

What's a monoid, anyway?

  • A binary operator

  • that's associative

  • with an identity element.

Binary operator?

a · b

  • Integer addition, +

  • Set union, ∪

  • String concatenation, ⧺

  • Integer maximum

  • Division, /

Here is a list of binary operators

Associative?

(a · b) · c = a · (b · c)

  • Integer addition, +

  • Set union, ∪

  • String concatenation, ⧺

  • Integer maximum

  • Division, /  (a/b)/c ≠ a/(b/c)

It doesn't matter where you put the parentheses!

Identity element?

a · id = id · a = a

  • Integer addition, +, with 0

  • Set union, ∪, with {}

  • String concatenation, ⧺, with ""

  • Integer maximum   no least integer

  • Division, /  (a/b)/c ≠ a/(b/c)

Monoids:

Associative binary operators with identity.

Monoids are so simple they show up everywhere.

Which I'll demonstrate by giving some more examples.

More monoids

  • Matrix multiplication

  • Merging sorted lists (merge step in mergesort!)

  • Unix commands under |, with cat

  • Composing functions of type (A → A), with (λx.x)

  • Regular expressions under choice ("|" in (a|b))

  • CFGs under choice ("|" in A ::= B | C)

What are they good for?

Functions for free

Parallelization / MapReduce

Starting point for exploring algebra

Functions for free

Write this:
instance Monoid String where
    mappend = (++)       -- ++ is the operator
    mempty = ""          -- "" is the identity
Get this for free!
mconcat ["monoids", " are ", "great"]
-- Evaluates to "monoids are great"

Idea: any monoid gives you an n-ary operator for free. Associative: apply to N things w/o worry parentheses If N=0, return the identity.

This is exactly what mconcat does.

mconcat can be parallelized

  • Split the list in half;

  • send each half to a different processor;

  • recurse!

  • Once both halves return, mappend their results together.

(Do whiteboard diagram.)

Theoretical speedup: O(n) to O(log n).

MapReduce

The reduce step is a distributed, parallel mconcat

MapReduce is a framework (originally from Google) for doing parallel computation on massive datasets.

The "map" part is a parallel map.

Parallelizing mconcat is the core of the "reduce" part of MapReduce.

It's not quite that simple (MapReduce makes assumptions about monoid, sorts for you.)

Monoids:

  • Associative binary operators with identity

  • Turn up everywhere

  • Make the "reduce" in map-reduce go

Semilattices

What's a semilattice, anyway?

  • A binary operator

  • that's associative

  • and commutative

  • and idempotent

  • with an identity element.

Semilattice is overloaded terminology.

Technically: join-semilattices with least elements; I'll just say "semilattice".

What's a semilattice, anyway?

  • A monoid

  • that's commutative

  • and idempotent.

Some monoids

  • Integer addition, +, with 0

  • Set union, ∪, with {}

  • String concatenation, ⧺, with ""

  • Integer maximum   no least integer

  • Division, /  (a/b)/c ≠ a/(b/c)

Commutative?

a · b = b · a

  • Integer addition, +, with 0

  • Set union, ∪, with {}

  • String concatenation, ⧺, with ""

  • Integer maximum   no least integer

  • Division, /  (a/b)/c ≠ a/(b/c)

Idempotent?

a · a = a

  • Integer addition, +, with 0   1 + 1 = 2

  • Set union, ∪, with {}

  • String concatenation, ⧺, with ""

  • Integer maximum   no least integer

  • Division, /  (a/b)/c ≠ a/(b/c)

More semilattices

  • Natural numbers under maximum

  • Booleans under logical "or"

  • N-bit integers under bitwise "or"

  • Bloom filters under union

What are they good for?

Functions for free

Distributed programming (where they are called CvRDTs)

CvRDT?

Convergent replicated datatype

Used to implement eventually-consistent systems

Core idea: Allow changes in any order, possibly multiple times, yet guarantee the same result

That's all folks