realtime-restful

Slides for O'Reilly Architecture Conference 2015 talk

On GitHub: dgouldin/realtime-restful

Realtime RESTful

Have your REST and push it too

by David Gouldin / @dgouldin

realtime-restful.herokuapp.com

origins

  • Before we dig in, let me provide some context around the problem we were trying to solve at Heroku that led to this talk.

heroku connect

  • data sync product with a dashboard web UI
  • gigantic state machine
  • lots of state changes happen async on the backend
  • customer trust of the product relies on communication/transparency

polling wasn't cutting it

  • constant feedback from customers: "I can't tell what's happening"
  • inefficient API usage (lots of no-op calls)
  • scaling problems (needed to scale up an order of magnitude)

defining the problem

  • we had a REST API
  • we needed to add a realtime channel
  • we didn't want to drastically increase client complexity

let's talk about

the web

  • Now that you have some understanding of the problem we were trying to solve, let's pull back a bit and talk about the web and the direction it's going.

the web is

service oriented

we've moved from this

  • Remember the days when you could describe your stack with an acronym?
  • 1 data store, usually relational
  • request/response cycle encapsulated all business logic
  • server-side views responsible for all content rendering

to this

er, this

  • A single service is potentially composed of many processes
  • Services (mostly) talk HTTP to each other
  • Pubsub (usually via redis) allows efficient 1:many communication between services
  • MOST IMPORTANTLY: the web client is now just another service, a consumer of public services published by the server

the shift was gradual

  • First we published APIs for others ("platform play")
  • Then we began to consume those same APIs in native clients on other platforms (the rise of mobile)
  • Finally, we figured out how to effectively turn our web clients into service consumers as well (JS templating, client-side MVC, web components)
  • TL;DR: We've had plenty of time to understand and develop best practices around the way machines talk http to each other

best practices

  • This is part of the table of contents for Heroku's HTTP API design document.
  • There's all kinds of stuff in here to help you build a great API.

platform play is table stakes

  • When we think about our own web applications as a network of services, the previously novel idea of "platform" becomes a given.
  • A public REST API service is no longer a bonus. It's now a core part of our application's architecture.

the web is

realtime

responsiveness is an expectation

  • If your site looks like an application, users expect it to act like one
  • Fallacies of distributed computing mean nothing to users.

it's not as hard as it used to be

http://caniuse.com/#feat=websockets
  • In the past, the only way to push reliably was by supporting multiple transports (long-polling, flash sockets, websockets, etc)
  • Now, support for proper websockets is good enough to rely on by itself (minimal sketch below).
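
A minimal sketch of leaning on the native browser API alone, with no fallback transports (the stream URL is a placeholder):

```typescript
// Native browser WebSocket: no fallback transports required in modern browsers.
const socket = new WebSocket("wss://example.com/stream"); // placeholder URL

socket.onopen = () => {
  console.log("connected");
};

socket.onmessage = (event: MessageEvent) => {
  console.log("server pushed:", event.data);
};

socket.onclose = () => {
  // Production code would reconnect with backoff here.
  console.log("disconnected");
};
```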

just because it's easy, doesn't mean it's easy to do right

  • Think of a product you've built. Now think how a realtime streaming transport design might look for that product.
  • If you're like me, you immediately begin to think in terms of events and data payloads associated with event types.
  • Then you begin to think about the process of reacting to events in the interface. How event publishing would propagate through client views and state.
  • Head hurting yet? Mine is.

realtime transport design is not a solved problem

  • I'd consider a technology "mature" when its usage patterns are well established and understood. The realtime web just isn't there yet.

the web is

… complicated

platform 🆚 realtime

  • You should probably be doing both
  • Commonly solved orthogonally, meaning:
  • they easily go out of sync
  • they have human resource contention
  • 2x surface area for errors

common realtime solution

  • Server side opt-in per event
  • Client side opt-in per event per component
  • That's a lot of code for each supported event!

reframing the problem

  • We've outlined a bit of the mess we're in with regard to web app architecture. Before we dig into solutions, let's take a moment to reframe the problem.

events are a proxy for state change

  • "X happened" is actually shorthand for "A, B, and C objects have been mutated".
  • If instead of events, we simply enumerated all state mutations (including Nth degrees like aggregates), the event itself would be useless.
  • The reason we tend to think in terms of events is because it makes more sense to our causal-driven brains. Computers care about data flow, not causality.
  • This creates an implicit contract based on derivation of cause-effect relationships in our code. That's why it's so difficult to understand, maintain, and untangle!
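
To make that concrete, a hedged sketch (message shapes are illustrative, not from the talk) contrasting an event message with the state mutations it abbreviates:

```typescript
// Event-style message: "X happened". Every client must know all side effects.
type SyncCompletedEvent = {
  type: "sync_completed";
  syncId: number;
};

// State-mutation messages: enumerate exactly what changed; nothing is implied.
type StateMutation = {
  path: string;     // REST endpoint whose representation changed
  payload: unknown; // its new representation
};

// The single event above expands to explicit mutations, aggregates included:
const mutations: StateMutation[] = [
  { path: "/syncs/42", payload: { id: 42, status: "complete" } },
  { path: "/syncs", payload: [{ id: 42, status: "complete" }] },
];
```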

isolating data mutation is key

  • If we agree that state change is what we're targeting, all we need is a way to know about all operations which change state.
  • Once we isolate mutation of data, we have all the hooks necessary to enumerate mutations explicitly.
  • Our streaming contract can then become both comprehensible and maintainable.
  • No more spooky action at a distance where the name of an event type means any number of side-effects to our client interface.

REST endpoints fully describe application state

  • If this wasn't true, you couldn't have built your app with just the API.
  • This API is an existing and complete contract that already has a client consumer.
  • Both of these make it the perfect choice for a realtime streaming contract as well.

a better way

  • You're probably able to guess at what all this is leading to. There's a better way to solve our problem of adding a realtime transport without making our client app too complex.

let's review

  • pubsub channel per user
  • user authenticates to stream producer with a private key which is passed on to an identity service for verification
  • stream producer subscribes to the user-specific pubsub channel
  • events are published to that channel and flow down to the client
  • the client has logic to deal with each event type (sketched below)
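
For contrast, a sketch of that per-event client logic (event names and view objects are hypothetical):

```typescript
declare const socket: WebSocket;
declare const itemList: { add(item: unknown): void; rename(id: number, name: string): void };
declare const itemDetail: { setTitle(name: string): void };

// One handler per event type, per interested component.
socket.onmessage = (event: MessageEvent) => {
  const message = JSON.parse(event.data);
  switch (message.type) {
    case "item_created":
      itemList.add(message.item);
      break;
    case "item_renamed":
      itemList.rename(message.itemId, message.name);
      itemDetail.setTitle(message.name); // every interested view repeats this dance
      break;
    // ...one case per supported event, forever
  }
};
```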

realtime producer as REST consumer

  • pubsub channel per endpoint
  • client asks stream producer to subscribe to REST endpoints on its behalf
  • On a state change event, all relevant endpoint channels are published to
  • those REST payloads are tunneled to the client via the stream producer
  • the client deals with them the same way it would had it requested the payload via AJAX (see the protocol sketch below)
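
A sketch of what the wire protocol might look like (the message shapes are my own; the talk doesn't prescribe them):

```typescript
declare const socket: WebSocket;
declare function handleApiResponse(method: string, path: string, payload: unknown): void;

// Client -> producer: subscribe to REST endpoints instead of event types.
socket.send(JSON.stringify({ subscribe: ["/items", "/items/42"] }));

// Producer -> client: tunneled REST payloads tagged with method and path.
// The client routes them through the same code path as an AJAX response.
socket.onmessage = (event: MessageEvent) => {
  const { method, path, payload } = JSON.parse(event.data);
  handleApiResponse(method, path, payload);
};
```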

challenges

  • No solution is a panacea, and no analogy is perfect.
  • Let's look at some of the challenges we faced when implementing this technique.

isolating data mutation

  • This is simple if you have 1 source of truth (rdbms) and 1 API service which writes to it:
  • Most ORMs have hooks after saving a model instance to the database. Register a global post-save hook and you've isolated all data mutation (sketched below).
  • Otherwise, you need a data service which all other services interact with to mutate data.
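
The talk's server uses Django's post_save signal for this; as a language-neutral sketch (the `orm.onAfterSave` hook and helper below are hypothetical), the shape is:

```typescript
// Hypothetical global ORM hook, analogous to a Django post_save signal
// registered for all senders: it fires after ANY model instance is saved.
declare const orm: {
  onAfterSave(cb: (model: string, instance: { id: number }) => void): void;
};
declare function publishChangedEndpoints(model: string, instance: { id: number }): void;

// Every state mutation now flows through this single choke point.
orm.onAfterSave((model, instance) => {
  publishChangedEndpoints(model, instance);
});
```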

authentication

  • If we no longer have a channel per user, how do we verify that the endpoint the client is asking to subscribe to is allowed?
  • Our REST API already has auth built in, so let it solve the problem!
  • 2 common client auth mechanisms: session cookie or access token
  • client passes auth along to stream producer, producer issues a HEAD request to the REST API
  • a 200 response means the subscribe request is valid (see the sketch below)
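
A sketch of the producer-side check (Node 18+ global fetch; the API origin is a placeholder):

```typescript
const API_ORIGIN = "https://api.example.com"; // placeholder

// Replay the client's credentials against the REST API with a HEAD request.
// The API's existing auth decides; the producer just checks the status code.
async function authorizeSubscription(path: string, authHeader: string): Promise<boolean> {
  const response = await fetch(API_ORIGIN + path, {
    method: "HEAD",
    headers: { Authorization: authHeader }, // a session cookie works the same way
  });
  return response.status === 200;
}
```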

what API endpoints changed?

  • The hardest part: requires active participation on the part of the REST API
  • Requires a registry of model → endpoints, describing how to get from a model instance to the endpoint URLs it affects
  • Once that registry has been created, you simply iterate through the associated endpoint generators for a model type and render the REST paths. You end up with a full list of modified API endpoints (sketched below).
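
A sketch of such a registry (model and endpoint names are hypothetical):

```typescript
// A saved Item instance, as the registry's generators see it.
type Item = { id: number; listId: number };

// Registry: model name -> generators mapping an instance to every endpoint
// path its mutation affects (detail, collection, Nth-degree aggregates, ...).
type EndpointGenerator = (instance: Item) => string;

const registry: Record<string, EndpointGenerator[]> = {
  Item: [
    (item) => `/items/${item.id}`,     // detail endpoint
    () => "/items",                    // collection endpoint
    (item) => `/lists/${item.listId}`, // parent aggregate also changed
  ],
};

function changedEndpoints(model: string, instance: Item): string[] {
  return (registry[model] ?? []).map((generate) => generate(instance));
}
```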

performance

  • One model instance can affect many endpoints. Obviously this becomes expensive to compute if the data changes frequently.
  • Computing just the endpoint paths is cheap. Then check to ensure an endpoint has a subscriber before rendering its payload.
  • Push the endpoint rendering off to a work queue if it's too expensive to do synchronously (ordering sketched below).
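
A sketch of that ordering, with the pubsub and queue clients abstracted behind hypothetical helpers:

```typescript
declare function changedEndpoints(model: string, instance: { id: number }): string[];
declare function hasSubscribers(channel: string): Promise<boolean>; // e.g. PUBSUB NUMSUB
declare function enqueueRenderJob(path: string): Promise<void>;     // work queue hand-off

async function publishChanges(model: string, instance: { id: number }): Promise<void> {
  for (const path of changedEndpoints(model, instance)) {
    // Cheap: we only computed paths so far. Skip silent channels entirely.
    if (!(await hasSubscribers(path))) continue;
    // Potentially expensive: defer payload rendering to a worker.
    await enqueueRenderJob(path);
  }
}
```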

server implementation

let's look at some code

  • Django ORM hooks
  • URL Registry
  • Node.js socket server (skeleton sketched below)
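
For orientation before diving into the repo, a skeleton of what such a socket server can look like, using the `ws` and `redis` npm packages (details differ from the talk's actual implementation):

```typescript
import { WebSocketServer } from "ws";
import { createClient } from "redis";

const wss = new WebSocketServer({ port: 8080 });
const subscriber = createClient();
await subscriber.connect();

declare function authorizeSubscription(path: string, auth: string): Promise<boolean>;

wss.on("connection", (ws) => {
  ws.on("message", async (raw) => {
    const { subscribe, auth } = JSON.parse(raw.toString());
    for (const path of subscribe as string[]) {
      if (!(await authorizeSubscription(path, auth))) continue; // HEAD check from earlier
      // One pubsub channel per endpoint: relay published REST payloads as-is.
      await subscriber.subscribe(path, (payload) => ws.send(payload));
    }
  });
  // Production code also needs to unsubscribe on close.
});
```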

the client

react.js

a digression

why react?

  • this technique can be used with any client framework
  • in fact, one of its strengths is its ability to plug into existing infrastructure
  • I believe react.js (along with flux and immutable js) is the ideal choice of client tech for this purpose

built for data mutation over time

  • react handles propagation of state down through hierarchies of components extremely well
  • because react effectively re-renders your entire component on every state change, you know your UI always accurately reflects state

the virtual dom

  • react's super-power
  • makes it not only feasible but FAST to re-render components every single time

server-side rendering

  • not directly applicable here, but an obvious win for any API-consuming client
  • drastically reduces time to usable page load
  • requires very little code to implement (provided the vast majority of your markup is in react components)

uni-directional data flow

  • this is the big one for our purposes
  • think of the complexities 2-way data binding would introduce to an app whose state is being pushed to it
  • uni-directional data flow guarantees that all data coming into your app, no matter the source, will be accurately represented in your UI. It's just that simple.

flux architecture

  • react is fairly common, so I didn't want to spend a lot of time going over the basics
  • flux is less generally understood, and it's more important to my demonstration, so let's review its basics

view

  • essentially react components, but in flux architectures, views are commonly rendered hierarchically
  • a parent view receives events from a store (explanation forthcoming) and calls its own setState to re-render itself
  • the entire state is then passed down to child views so that ui changes naturally flow uni-directionally down the dependency chain (view sketch below)
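
A view sketch (the `ItemStore` module is the hypothetical store sketched in the next section):

```tsx
import * as React from "react";
import { ItemStore } from "./item-store"; // hypothetical store module

type State = { items: Array<{ id: number; name: string }> };

export class ItemListView extends React.Component<{}, State> {
  state: State = { items: ItemStore.getAll() };

  componentDidMount() {
    ItemStore.on("change", this.handleChange);
  }

  componentWillUnmount() {
    ItemStore.removeListener("change", this.handleChange);
  }

  // Re-render on every store change; children receive state as props.
  handleChange = () => this.setState({ items: ItemStore.getAll() });

  render() {
    return <ul>{this.state.items.map((item) => <li key={item.id}>{item.name}</li>)}</ul>;
  }
}
```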

store

  • a data store, similar to a model, but think of a store as a table, not a row
  • stores register themselves with dispatchers (explanation forthcoming) so that when data changes, they are notified
  • once the store is updated, it emits an event that views (components) commonly use to re-render themselves; see the store sketch below
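
A store sketch (names hypothetical; the shared `dispatcher` is sketched in the next section):

```typescript
import { EventEmitter } from "events";
import { dispatcher } from "./dispatcher"; // hypothetical shared Dispatcher instance

export type Item = { id: number; name: string; updated_at: string };

// A table, not a row: the store holds every Item the client knows about.
class ItemStoreClass extends EventEmitter {
  private items = new Map<number, Item>();

  getAll(): Item[] {
    return [...this.items.values()];
  }

  receive(item: Item): void {
    this.items.set(item.id, item);
    this.emit("change"); // views re-render off this event
  }
}

export const ItemStore = new ItemStoreClass();

// Register with the dispatcher so data changes reach the store.
dispatcher.register((action) => {
  if (action.type === "ITEM_RECEIVED") {
    ItemStore.receive(action.item as Item);
  }
});
```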

dispatcher

  • As the name suggests, the dispatcher is responsible for invoking callbacks registered by other components.
  • When an action (explanation forthcoming) receives new data, it calls the dispatch method
  • Since stores commonly register themselves with the dispatcher, their callbacks are invoked when the dispatch method is called (dispatcher sketch below)
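
With facebook's flux package, the dispatcher itself is nearly free (a sketch; the Action shape is my own):

```typescript
import { Dispatcher } from "flux";

export type Action = { type: string; [key: string]: unknown };

// One app-wide dispatcher; every store registers a callback with it.
export const dispatcher = new Dispatcher<Action>();

// register() returns a token; stores can pass tokens to waitFor() to order
// their updates relative to other stores.
dispatcher.register((action) => {
  console.log("dispatching", action.type, "to all registered callbacks");
});

// Actions call dispatch(); the dispatcher invokes every registered callback.
dispatcher.dispatch({ type: "ITEM_RECEIVED", item: { id: 1, name: "hello" } });
```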

actions

  • in generic terms, an action's job is to provide data to the dispatcher
  • practically speaking, you can think of it as your REST API client
  • this is part of what makes flux such a natural fit for our realtime channel: all we need to do is get data to our actions
  • Actions may be called in response to a view's event handler. This is how flux avoids 2-way data binding without sacrificing user interactivity (actions sketch below).
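
An actions sketch; note that the same dispatch path serves an AJAX fetch and a pushed socket payload equally well:

```typescript
import { dispatcher } from "./dispatcher"; // hypothetical module from the sketch above

export const ItemActions = {
  // Called from a view's event handler (user interactivity)...
  async fetchItem(id: number): Promise<void> {
    const response = await fetch(`/items/${id}`);
    dispatcher.dispatch({ type: "ITEM_RECEIVED", item: await response.json() });
  },

  // ...or fed directly by the realtime channel with an identical payload.
  receiveItem(item: unknown): void {
    dispatcher.dispatch({ type: "ITEM_RECEIVED", item });
  },
};
```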

client implementation

more code diving

  • items store -> crud store -> base store
  • items actions -> crud actions -> base actions
  • items views
  • api subscription service (sketched below)
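
Once actions exist, the api subscription service reduces to very little (a sketch; URL and routing are illustrative):

```typescript
import { ItemActions } from "./item-actions"; // hypothetical module from the sketch above

const socket = new WebSocket("wss://example.com/stream"); // placeholder URL
const subscriptions = new Set<string>();

export function subscribe(path: string): void {
  subscriptions.add(path);
  socket.send(JSON.stringify({ subscribe: [path] }));
}

// Pushed payloads enter the app through the same actions AJAX uses;
// nothing downstream knows or cares that the data arrived over a socket.
socket.onmessage = (event: MessageEvent) => {
  const { path, payload } = JSON.parse(event.data);
  if (path.startsWith("/items")) ItemActions.receiveItem(payload);
};
```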

live demo time

  • page initial render
  • curl API PUT, POST, DELETE

drawbacks

http method is a stretched analogy

  • it works (mostly), but the methods are meant for requests, not responses
  • you can begin to see it break down with PUT vs POST, and it further degenerates when you begin to think about PATCH
  • I've presented a purist view of this technique, but there are some practical things that can be done to shore up the analogy.

clients must maintain a list of subscriptions

  • this list can get fairly large, and without sticky sessions, any socket disconnect means resubscribing to all channels (resubscribe bookkeeping sketched below)
  • it's somewhat difficult to answer the question "how do I know when there's something new I should subscribe to?". Ideally, list endpoints would act like a PATCH for new members (again stretching the http method analogy).
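
A sketch of the resubscribe bookkeeping:

```typescript
const subscriptions = new Set<string>();
let socket = connect();

function connect(): WebSocket {
  const ws = new WebSocket("wss://example.com/stream"); // placeholder URL
  // On every (re)connect, replay the full subscription list.
  ws.onopen = () => ws.send(JSON.stringify({ subscribe: [...subscriptions] }));
  ws.onclose = () => { socket = connect(); }; // production: add backoff
  return ws;
}

export function subscribe(path: string): void {
  subscriptions.add(path);
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({ subscribe: [path] }));
  }
}
```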

beware the race conditions

  • always use a "last modified" timestamp in the data store
  • always inspect the client timestamp against the one in the socket payload to avoid overwriting with older state (guard sketched below)
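
The guard is a single comparison (field name hypothetical):

```typescript
type Versioned = { updated_at: string }; // ISO-8601 "last modified" from the data store

// Apply a pushed payload only if it is newer than what the client already has.
function shouldApply(current: Versioned | undefined, incoming: Versioned): boolean {
  if (!current) return true;
  return Date.parse(incoming.updated_at) > Date.parse(current.updated_at);
}

// e.g. inside a store: if (shouldApply(items.get(id), item)) items.set(id, item);
```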

tl;dl

realtime API mirrors REST API

client consumer doesn't care about push vs pull

data service triggers the publishing of REST endpoints

Questions?