HaskellPure & Simple – Purpose of this Talk – Possible side effects



HaskellPure & Simple – Purpose of this Talk – Possible side effects

1 4


talk-haskell-pure-and-simple

Slides for a talk that walks through the haskell-hubbub codebase (https://github.com/benkolera/haskell-hubbub)

On Github benkolera / talk-haskell-pure-and-simple

HaskellPure & Simple

Brisbane Functional Programming Group - 2014-01-28

Purpose of this Talk

Show off some cool haskell code! :)

Show that haskell can do lots of super useful things.

Even in something that has lots of side effects.

And that a degree in Category Theory is not required.

Possible side effects

Feel comfortable with the basics of the haskell toolbox.

Less afraid of haskell than what you are now.

Have a glimpse of the beauty & merit of the haskell way.

Inspired to experiment and learn!

Whirlwind Tour of

  • Haskell Basics
  • Scotty
  • Tasty & HUnit
  • SqLite Simple
  • Aeson
  • Cabal Projects
  • Http Conduit
  • STM & TQueues
  • Acid State
  • Websockets

Talk Rules

We've got the rest of the meetup to talk. Plenty of time!

Ask lots of questions if something isn't clear.

Code is available on github so you can go through it later.

We'll stop for a break whenever the pizza arrives.

Haskell Basics

Haskell Modules

module FizzBuzz (fizzbuzz) where

import Control.Monad (mapM_)

data Result = Number Integer | Fizz | Buzz | FizzBuzz deriving (Show,Eq)

fizzbuzz :: IO ()
fizzbuzz = mapM_ (print . result) [1..100]

result :: Integer -> Result
result n | n `mod` 15 == 0 = FizzBuzz
         | n `mod` 5  == 0 = Fizz
         | n `mod` 3  == 0 = Buzz
         | otherwise       = Number n

Imports and Exports

Can be explicit or implict with both:

module FizzBuzz (fizzbuzz) where
import Control.Monad (mapM_)
vs
module FizzBuzz where
import Control.Monad

I prefer being explicit, so my code is less terse than some haskell code.

Datatypes & Constructors

Constructors aren't types themselves.

data Result = Number Integer | Fizz | Buzz | FizzBuzz deriving (Show,Eq)

Not possible:

printBuzz :: Buzz -> String

Can take a Result and match each constructor:

printResult :: Result -> String
printResult Fizz       = "Fizz"
printResult Buzz       = "Buzz"
printResult FizzBuzz   = "FizzBuzz"
printResult (Number n) = show n

Special Type Declarations

Alias (can be used interchangibly)

type FilePath = String

Newtype (cannot be used interchangibly)

newtype FilePath = FilePath String

Records (generates getters and setters

data FilePath = FilePath { dir::String , filename::String } 

Typeclasses

Polymorphism over disparate types.

print :: Show a => IO ()
print a = putStrLn (show a)

And to define one for Result:

instance Show Result where
  show Fizz       = "Fizz"
  show Buzz       = "Buzz"
  show FizzBuzz   = "FizzBuzz"
  show (Number n) = show n

Function Application

ghci> :t map 
map :: (a -> b) -> [a] -> [b]

Is done with a space. Returns rest of the function:

ghci> :t map (+1)
map (+1) :: Num b => [b] -> [b]

Which can then be applied to get the result:

ghci> map (+1) [1,2,3]
[2,3,4]

Functions & Precedence

Function application has highest precedence (10).

Functions that don't start with a letter are infix.

Can make a function infix with bacticks (5 `div` 2).

Infix operators each have a precedence level.

Composition (.) has 9 and ($) has 0.

Safest way is to use parens where you feel like it.

Hlint will tell you which parens are unnecessary.

Lots of ways to make functions!

As top level functions / in let or where blocks.

As Lambdas / anonymous functions:

ghci> :t (\a -> a + 1 )
(\a -> a + 1 ) :: Num a => a -> a

As sections (better style than lambdas):

ghci> :t (+ 1)
(+ 1) :: Num a => a -> a

Combining Functions

Composition:

ghci> :t show . (+1)
show . (+1) :: (Num b, Show b) => b -> String
ghci> show . (+1) $ (5::Int)
"6"

Applicative:

ghci> import Control.Applicative
ghci> import Data.Char
ghci> :t liftA2 (||) isDigit isAlpha
liftA2 (||) isDigit isAlpha :: Char -> Bool
ghci> filter (liftA2 (||) isDigit isAlpha) "abcd34efg!.,'  "
"abcd34efg"

And taking this further:

import Data.Arrow 

ghci> :t (***)
(***) :: Arrow a => a b c -> a b' c' -> a (b, b') (c, c')

ghci> :t (++"bar") *** (+9000)
(++"bar") *** (+9000) :: Num a => ([Char], a) -> ([Char], a)

ghci> (++"bar") *** (+9000) $ ("foo",0)
("foobar",9000)

ghci> :t join (***) (+1)
join (***) (+1) :: Num a => (a, a) -> (a, a)

ghci> join (***) (+1) $ (1,2)
(2,3)

(This is as weird as it gets, I promise!)

Whitespace - Offside rule

Lines that start at the same level of indentation are different expressions.

To continue the same expression on a new line, indent at least one space:

do
  x <- getXFromDatabase
  y <- getYFromDatabase
  return $ thisFunctionHasAReallyReallyLongNameSoINeedToPutTheParamsOnTheNextLine
    x
    y

IO

ghci> :t putStrLn
putStrLn :: String -> IO ()

ghci> :t getLine
getLine :: IO String

ghci> :t map putStrLn ["foo","bar"] --List of IO that are never run!
map putStrLn ["foo","bar"] :: [IO ()]

Are actions that are run by the RTS only if the action is sequenced to the main function.

IO actions are values. Can do some useful things with them.

There is (almost) no way to get a value out of IO. Can only sequence with other IO.

IO - Example

main = do 
  fromHandle <- getAndOpenFile "in.txt" ReadMode
  toHandle   <- getAndOpenFile "ALLCAPS.txt" WriteMode 
  contents   <- hGetContents fromHandle
  hPutStr toHandle $ map toUpper contents
  hClose toHandle
  putStr "DONE!!!"

Hint: Separating the execution of the IO action from the program code that generates it turns out to be super duper useful!

Monads

Lots and lots of things require some kind of sequencing bits of stuff.

IO is one such thing, but there are lots of others and we'll see few.

Do notation and monad combinators is just a DSL for sequencing and manipulating these bits.

Not a burrito.

Strings vs Text vs ByteString

String is a list of Characters (Obsolete)

Text is a lazy or strict unicode data structure. Fast and correct.

ByteString is lazy or strict binary data structure. Must choose a decoding method yourself (ASCII,utf8,etc) to get to String or Text.

Overloaded strings extension means that string literals can either be String,Text or ByteString.

Going from lazy to strict involves reading everything into memory.

PubSubHubbub

A simple, open, server-to-server webhook-based pubsub (publish/subscribe) protocol for any web accessible resources.

Parties (servers) speaking the PubSubHubbub protocol can get near-instant notifications (via webhook callbacks) when a topic (resource URL) they're interested in is updated.

Protocol

Subscriber program requests a atom/rss resource from the publisher. Publisher includes URL of hub that the subscriber can use to subscribe. Subscriber registers a callback URL with the hub. Upon updating the resource, the publisher lets the hub know and the hub distributes the new feed to subscribers by posting the content to each callback URL.

Subscription: Finer Deets

Subscripion takes an optional secret.

Subscriptions must be 202 accepted and then verified with the subscriber.

Verification involves calling GET on the callback url with the topic, and a challenge. Subscriber must return a 200 OK with the challenge in the body.

Subscription also optionally includes a secret string shared to the hub. The hub must hmac sign the http body of any content distribution if it was given.

Why is this useful?

Low latency and publisher friendly feed consumers (read: news readers)!

At work, our internal services publish event streams via atom feeds.

Nice solution as PubSubHubbub allows consumers to be push or pull based as required.

Demo

Pieces of the Puzzle

Publisher.hs: Scotty web server that allows post creation from a browser.

Subscriber.hs: Scotty web server that subscribes to hub, listen for callbacks and pushes the updated post list to the browser via a websocket.

Main.hs: The actual PubSubHubbub scotty web server.

That sounds like a lot of state!

Lots of asynchronicity and state needed. Very few pure bits.

Can haskell even do this in a nice way?!?

Haskell-hubbub was my experimentation to help answer this.

TL;DR: The answer is 'definitely yes'. :)

Cabal

Cabal is a build tool

Use it for any non-throwaway program.

Akin to sbt/lein/mvn/gradle/gem/pip/etc.

Package can product one or more libraries and/or binaries.

Can also have testsuites and benchmarks that aren't compiled into the final artifacts.

Pulls dependencies from hackage.haskell.org.

Builds c code, links external libraries

And lots more!

haskell-hubbub Build Configuration

Library code in src/, tests in test/ and binary entry points at top level. (Main.hs,Publisher.hs,Subscriber.hs)

Three different build options:

  • Hubbub Library and hubbub binary.
  • Library plus hubbub and mock binaries.
  • Library and test cases only

Build steps

Initalize sandbox and install every dependency into the sandbox:

cabal sandbox init
cabal install --enable-tests --flag=mocks --only-dependencies

Configure and run tests:

cabal configure --enable-tests --flag=tests #Only needs to be done once.
cabal test

Build everything including mock binaries:

cabal configure --flag=mock #Only needs to be done once.
cabal build
dist/build/hubbub/hubbub 
dist/build/hubbub-mock-publisher/hubbub-mock-publisher
dist/build/hubbub-mock-subscriber/hubbub-mock-subscriber

Note! This is just my Setup

Cabal is extraordinarily flexible. Can set it up however you like.

This just works best for me. Other packages set things up differently.

Helps to see this as it is quite confusing at first.

Publisher.hs

  • Scotty
  • Acid State
  • Http-Conduit
  • Blaze-Html

Acid State Code Dive

Declare the datatypes with standard haskell.

Use safecopy for versioned serialisation.

Make transactions in Query and Update contexts.

Can build bigger Queries/Updates out of others.

Make acidic build the top level transactions that are written to the logs.

Scotty

Two different contexts: ScottyM and ActionM.

ScottyM is for configuring scotty (routes, middleware).

ActionM is for parsing the request and producing a response.

These contexts, like IO have a Monad interface so we reuse do notation with them.

Able to lift IO actions into either context with liftIO.

Http Conduit

Lots more to conduit that we see here.

Implements a streaming interface so that you can stream to disk / other sockets.

Works well for simple things too!

Blaze Html

Yet another monad: sequences HTML elements together. Reuse of do notation, forM, etc is very handy.

Everything is guaranteed to be escaped.

Bit verbose, though. Hamlet is a quasiquoter built on top that can help.

Subscriber.hs

  • STM.TVars
  • Websockets

TVars

Container that has a value. Value can be replaced.

Part of STM (Software Transactional Memory).

Build up STM actions and run the action with atomically to execute it.

Anything that has the var can atomically read/write its value.

We use this for sharing an in memory database of websocket connections.

Websockets

Set an intercept handler for new websocket connections in wai.

We create an IO action that handles all comms for that socket.

IO action runs forever until the connection disconnects.

Upon disconnect, the IO action will remove the connection from the shared state.

Distribute pushes payload down each connection.

Hubbub Library Structure

Library

src/Network/Hubbub.hs                       # Top API (client talks to this).
src/Network/Hubbub/Http.hs                  # Http calls to the Pubs and Subs.
src/Network/Hubbub/Hmac.hs                  # Hmac signs bits of text
src/Network/Hubbub/Queue.hs                 # Handles queuing of events.
src/Network/Hubbub/Internal.hs              # Actions events.
src/Network/Hubbub/SubscriptionDb.hs        # Generics Api to persist Subs.
src/Network/Hubbub/SubscriptionDb/Acid.hs   # AcidState impl of DbApi
src/Network/Hubbub/SubscriptionDb/SqLite.hs # SQLite impl of DbApi.

Test Structure

test/Test.hs                                      # Main for tests
test/Network/Hubbub/TestHelpers.hs                # Various helper funcs
test/Network/Hubbub/Test.hs                       # Test for Hubbub.hs
test/Network/Hubbub/Queue/Test.hs                 # Test for Queue.hs
test/Network/Hubbub/Http/Test.hs                  # etc.
test/Network/Hubbub/Hmac/Test.hs
test/Network/Hubbub/SubscriptionDb/Test.hs
test/Network/Hubbub/SubscriptionDb/Acid/Test.hs
test/Network/Hubbub/SubscriptionDb/SqLite/Test.hs
test/Network/Hubbub/Internal/Test.hs

SubscriptionDb.hs

  • Hubbub DB Interface

Datatypes

Holds everything about a subscription that we need to distribute content to it and expire subscriptions.

Newtypes so that we cannot mix things up.

Having a type for HttpResource is a bit pedantic. Saves having a string (which may not be a URL) or having a Network.URL (which may be relative or not for HTTP) causing issues later.

Interface

Just defines all of the functions we need for the API.

Just a datatype with record accessors. Zero magic.

EitherT can be an error or a good result. Encourages catching exceptions in the impl.

SubscriptionDb/SqLite.hs

  • SQLite Simple

To/From Row/Field Instances

toRow goes from a thing to a list of fields.

fromRow and fromField return RowParsers.

A row parser is a sequential thing that steps through the columns of a result set and returns whether it found what it was looking for or not.

RowParser is a Monad and an Applicative.

Use applicative <$> to lift constructor into RowParser and apply to many RowParsers.

DbApi Implementation

Currying works awesomely for filling in the SqLite specific bit.

There is an IsString instance for Query, thus string literal are Querys.

fmap (fmap func) is to fmap IO then fmap the list inside.

ToRow is used to convert params into columns. Only is to take a single thing that has a toField and make a toRow out of it.

SubscriptionDb/SqLite/Test.hs

  • Tasty
  • HUnit

Tasty

Tasty provides a way to group tests into trees.

Allows multiple ways to test (HUnit,Quickcheck,Smallcheck)

Test tree from this file is imported by SubscriptionDb/Test.hs.

Test tree from this SubscriptionDb/Test.hs is imported by test/Test.hs.

HUnit

Deals with making assertions. Throws exceptions if not met.

Assertion is just IO () so can do anything in tests.

assertFail and @?= are enough for my use.

SubscriptionDb/Acid.hs

  • Final note on AcidState

AcidState Implementation

Combining multiple other transactions in expireSubscriptions, using do notition and mapM_ monad machinery to help.

DbApi implementation just takes advantage of currying once again.

Queue.hs

  • Definition of Hubbub events
  • STM.TQueue
  • Automatic Retries

Events & Retryable Class

Datatypes for Subscription,Publication and Distribution Events

Retryable just gives a polymorphic way of extracting/incrementing the attempt count.

Also allows retry delay to be implemented differently depending on the type.

Allows the queue to work on all Retryable events without sub typing.

Queue Loop

Takes an event actioner, error handler and a queue of retryables.

Atomically dequeues (or blocks waiting for an event) from the queue.

Requeues failed events based on the retry delay.

IO Action never returns. This is actually not a problem!

Queue Tests

Because all the IO is lazy, we can test a lot here without too much boilerplate.

forkIO is a way to run any IO action in a different green thread.

Mutable vars let us collect the state from the other threads.

And this is important, because any code that needs a sleep 100ms in a test just to pass needs a lot of testing! ;)

Http.hs

  • Get remote resource of a PublishEvent
  • Distribute content of a DistributionEvent
  • Verify a subscription with the subscriber

Get Resource

Gets the content type and body of the topic of a publication event.

httpResourceToRequest . fromTopic . publicationTopic $ ev
-- is the same as
httpResourceToRequest (fromTopic  (publicationTopic ev))

&&& ("fanout") is from Arrow that applies both functions to the response and returns a tuple.

contentType = fmap snd . find ((== hContentType) . fst) . responseHeaders
-- is the same as
contentType r = fmap snd $ find ((== hContentType) . fst) $ responseHeaders r

Distribute

Just assembles the post request from the event. Nothing fancy.

Verify Subscription

Pattern matching on constructor to get mode.

Takes a random number generator as a param.

Needs to keep query params from original callback, so filter out hub ones from input.

Generates random challenge and checks response.

handleT from the Errors library to recover from NotFound.

Tests

There are tests. Using scotty as a test server for the http calls.

Lots of details for scotty stuff we've seen before.

We'll skip them in the talk but they are on github to peek at.

Internal.hs

  • Making the event actioners

Largely boring. Just gluing http calls together into event handlers.

We'll skip them unless we have time / interest.

Hubbub.hs

  • Where it all comes together

Config & Environment

Config for client to configure us.

Env for all of the state our code needs to run (TQueues, DbApi,etc)

Constructor or accessors of HubbubEnv are not exported, so clients can't fiddle with it or inspect it.

Public Functions

Initializer for either AcidState or SQLite

subscribe / publish just push onto the queues.

List just reads from the DbApi

All the real work happens asynchronously by the worker threads.

Initialize Hubbub

Where the IO actions really become awesome.

Can just fork as many of the loops as we like.

They do their stuff until the main thread exits.

This part is ridiculously easy.

Main.hs

  • Scotty app that calls the hubbub api.

The App

Wonderfully boring code.

None of the asynch bleeds into here.

Just really about parsing requests and giving outputs, as it should be.

Wrap up

  • Clearing up the crass lie in the title.
  • Haskell IO is actually awesome!
  • Good Resources for the Curious
  • Questions

Totally lied in the title

This code was pretty much all impure.

Was that actually a big deal?

Except for the type sigs, did you even notice?

Title should be: Haskell can do real world real stuff just as well as your favourite language!

Haskell IO is actually awesome

I'd argue that we're actually better off in haskell.

We just did asynch, non blocking code and it didn't even hurt (or involve Node.js).

Laziness is a virtue (forever $ doStuff)

Furthermore the fact that IO is a value gives rise to lots of cool combinators (mapM,replicateM,etc) that are reusable for other sequencey type things.

Core message

Don't listen to the FUD: Haskell can do real world things very well.

Just look at how warp beats the pants off of node/python/etc http://www.yesodweb.com/blog/2011/03/preliminary-warp-cross-language-benchmarks.

Resources for the curious

This code: https://github.com/benkolera/haskell-hubbub/

Search packages on hackage: http://hackage.haskell.org/

Read LYAH: http://learnyouahaskell.com/

Hoogle: http://www.haskell.org/hoogle/

GHCi: run "cabal repl" in your project to get a repl. :t and :i do awesome things.

Questions?

  • pureandsimple.benkolera.com
  • @benkolera
  • bkolera@irc.freenode.net#bpfg
  • ben.kolera@gmail.com