realtime-restful

Slides for O'Reilly Architecture Conference 2015 talk

On GitHub: dgouldin/realtime-restful

Realtime RESTful

Have your REST and push it too

by David Gouldin / @dgouldin

realtime-restful.herokuapp.com

origins

  • Before we dig in, let me provide some context around the problem we were trying to solve at Heroku that led to this talk.

heroku connect

  • data sync product with a dashboard web UI
  • gigantic state machine
  • lots of state changes happen async on the backend
  • customer trust of the product relies on communication/transparency

polling wasn't cutting it

  • constant feedback from customers: "I can't tell what's happening"
  • inefficient API usage (lots of no-op calls)
  • scaling problems (needed to scale up an order of magnitude)

defining the problem

  • we had a REST API
  • we needed to add a realtime channel
  • we didn't want to drastically increase client complexity

let's talk about

the web

  • Now that you have some understanding of the problem we were trying to solve, let's pull back a bit and talk about the web and the direction it's going.

the web is

service oriented

we've moved from this

  • Remember the days when you could describe your stack with an acronym?
  • 1 data store, usually relational
  • request/response cycle encapsulated all business logic
  • server-side views responsible for all content rendering

to this

er, this

  • A single service is potentially composed of many processes
  • Services (mostly) talk HTTP to each other
  • Pubsub (usually via redis) allows efficient 1:many communication between services
  • MOST IMPORTANTLY: the web client is now just another service, a consumer of public services published by the server

the shift was gradual

  • First we published APIs for others ("platform play")
  • Then we began to consume those same APIs in native clients on other platforms (the rise of mobile)
  • Finally, we figured out how to effectively turn our web clients into service consumers as well (JS templating, client-side MVC, web components)
  • TL;DR: We've had plenty of time to understand and develop best practices around the way machines talk http to each other

best practices

  • This is part of the table of contents for Heroku's HTTP API design document.
  • There's all kinds of stuff in here to help you build a great API.

platform play is table stakes

  • When we think about our own web applications as a network of services, the previously novel idea of "platform" becomes a given.
  • A public REST API service is no longer a bonus. It's now a core part of our application's architecture.

the web is

realtime

responsiveness is an expectation

  • If your site looks like an application, users expect it to act like one
  • Fallacies of distributed computing mean nothing to users.

it's not as hard as it used to be

http://caniuse.com/#feat=websockets
  • In the past, the only way to push reliably was by supporting multiple transports (long-polling, flash sockets, websockets, etc)
  • Now, support for proper websockets is good enough to rely on by itself (minimal sketch below).
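
A minimal sketch of leaning on the native browser API alone, with no fallback transports (the stream URL is a placeholder):

```typescript
// Native browser WebSocket: no fallback transports required in modern browsers.
const socket = new WebSocket("wss://example.com/stream"); // placeholder URL

socket.onopen = () => {
  console.log("connected");
};

socket.onmessage = (event: MessageEvent) => {
  console.log("server pushed:", event.data);
};

socket.onclose = () => {
  // Production code would reconnect with backoff here.
  console.log("disconnected");
};
```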

just because it's easy, doesn't mean it's easy to do right

  • Think of a product you've built. Now think how a realtime streaming transport design might look for that product.
  • If you're like me, you immediately begin to think in terms of events and data payloads associated with event types.
  • Then you begin to think about the process of reacting to events in the interface. How event publishing would propagate through client views and state.
  • Head hurting yet? Mine is.

realtime transport design is not a solved problem

  • I'd consider a technology "mature" when its usage patterns are well established and understood. The realtime web just isn't there yet.

the web is

… complicated

platform 🆚 realtime

  • You should probably be doing both
  • Commonly solved orthogonally, meaning:
  • they easily go out of sync
  • they have human resource contention
  • 2x surface area for errors

common realtime solution

  • Server side opt-in per event
  • Client side opt-in per event per component
  • That's a lot of code for each supported event!

reframing the problem

  • We've outlined a bit of the mess we're in with regard to web app architecture. Before we dig into solutions, let's take a moment to reframe the problem.

events are a proxy for state change

  • "X happened" is actually shorthand for "A, B, and C objects have been mutated".
  • If instead of events, we simply enumerated all state mutations (including Nth degrees like aggregates), the event itself would be useless.
  • The reason we tend to think in terms of events is because it makes more sense to our causal-driven brains. Computers care about data flow, not causality.
  • This creates an implicit contract based on derivation of cause-effect relationships in our code. That's why it's so difficult to understand, maintain, and untangle!
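
To make that concrete, a hedged sketch (message shapes are illustrative, not from the talk) contrasting an event message with the state mutations it abbreviates:

```typescript
// Event-style message: "X happened". Every client must know all side effects.
type SyncCompletedEvent = {
  type: "sync_completed";
  syncId: number;
};

// State-mutation messages: enumerate exactly what changed; nothing is implied.
type StateMutation = {
  path: string;     // REST endpoint whose representation changed
  payload: unknown; // its new representation
};

// The single event above expands to explicit mutations, aggregates included:
const mutations: StateMutation[] = [
  { path: "/syncs/42", payload: { id: 42, status: "complete" } },
  { path: "/syncs", payload: [{ id: 42, status: "complete" }] },
];
```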

isolating data mutation is key

  • If we agree that state change is what we're targeting, all we need is a way to know about all operations which change state.
  • Once we isolate mutation of data, we have all the hooks necessary to enumerate mutations explicitly.
  • Our streaming contract can then become both comprehensible and maintainable.
  • No more spooky action at a distance where the name of an event type means any number of side-effects to our client interface.

REST endpoints fully describe application state

  • If this wasn't true, you couldn't have built your app with just the API.
  • This API is an existing and complete contract that already has a client consumer.
  • Both of these make it the perfect choice for a realtime streaming contract as well.

a better way

  • You're probably able to guess at what all this is leading to. There's a better way to solve our problem of adding a realtime transport without making our client app too complex.

let's review

  • pubsub channel per user
  • user authenticates to stream producer with a private key which is passed on to an identity service for verification
  • stream producer subscribes to the user-specific pubsub channel
  • events are published to that channel and flow down to the client
  • the client has logic to deal with each event type (sketched below)
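
For contrast, a sketch of that per-event client logic (event names and view objects are hypothetical):

```typescript
declare const socket: WebSocket;
declare const itemList: { add(item: unknown): void; rename(id: number, name: string): void };
declare const itemDetail: { setTitle(name: string): void };

// One handler per event type, per interested component.
socket.onmessage = (event: MessageEvent) => {
  const message = JSON.parse(event.data);
  switch (message.type) {
    case "item_created":
      itemList.add(message.item);
      break;
    case "item_renamed":
      itemList.rename(message.itemId, message.name);
      itemDetail.setTitle(message.name); // every interested view repeats this dance
      break;
    // ...one case per supported event, forever
  }
};
```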

realtime producer as REST consumer

  • pubsub channel per endpoint
  • client asks stream producer to subscribe to REST endpoints on its behalf
  • On a state change event, all relevant endpoint channels are published to
  • those REST payloads are tunneled to the client via the stream producer
  • the client deals with them the same way it would had it requested the payload via AJAX (see the protocol sketch below)
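
A sketch of what the wire protocol might look like (the message shapes are my own; the talk doesn't prescribe them):

```typescript
declare const socket: WebSocket;
declare function handleApiResponse(method: string, path: string, payload: unknown): void;

// Client -> producer: subscribe to REST endpoints instead of event types.
socket.send(JSON.stringify({ subscribe: ["/items", "/items/42"] }));

// Producer -> client: tunneled REST payloads tagged with method and path.
// The client routes them through the same code path as an AJAX response.
socket.onmessage = (event: MessageEvent) => {
  const { method, path, payload } = JSON.parse(event.data);
  handleApiResponse(method, path, payload);
};
```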

challenges

  • No solution is a panacea, and no analogy is perfect.
  • Let's look at some of the challenges we faced when implementing this technique.

isolating data mutation

  • This is simple if you have 1 source of truth (rdbms) and 1 API service which writes to it:
  • Most ORMs have hooks after saving a model instance to the database. Register a global post-save hook and you've isolated all data mutation (sketched below).
  • Otherwise, you need a data service which all other services interact with to mutate data.
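
The talk's server uses Django's post_save signal for this; as a language-neutral sketch (the `orm.onAfterSave` hook and helper below are hypothetical), the shape is:

```typescript
// Hypothetical global ORM hook, analogous to a Django post_save signal
// registered for all senders: it fires after ANY model instance is saved.
declare const orm: {
  onAfterSave(cb: (model: string, instance: { id: number }) => void): void;
};
declare function publishChangedEndpoints(model: string, instance: { id: number }): void;

// Every state mutation now flows through this single choke point.
orm.onAfterSave((model, instance) => {
  publishChangedEndpoints(model, instance);
});
```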

authentication

  • If we no longer have a channel per user, how do we verify that the endpoint the client is asking to subscribe to is allowed?
  • Our REST API already has auth built in, so let it solve the problem!
  • 2 common client auth mechanisms: session cookie or access token
  • client passes auth along to stream producer, producer issues a HEAD request to the REST API
  • a 200 response means the subscribe request is valid (see the sketch below)
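
A sketch of the producer-side check (Node 18+ global fetch; the API origin is a placeholder):

```typescript
const API_ORIGIN = "https://api.example.com"; // placeholder

// Replay the client's credentials against the REST API with a HEAD request.
// The API's existing auth decides; the producer just checks the status code.
async function authorizeSubscription(path: string, authHeader: string): Promise<boolean> {
  const response = await fetch(API_ORIGIN + path, {
    method: "HEAD",
    headers: { Authorization: authHeader }, // a session cookie works the same way
  });
  return response.status === 200;
}
```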

what API endpoints changed?

  • The hardest part: requires active participation on the part of the REST API
  • Requires a registry of model → endpoints, describing how to get from a model instance to the endpoint URLs it affects
  • Once that registry has been created, you simply iterate through the associated endpoint generators for a model type and render the REST paths. You end up with a full list of modified API endpoints (sketched below).
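
A sketch of such a registry (model and endpoint names are hypothetical):

```typescript
// A saved Item instance, as the registry's generators see it.
type Item = { id: number; listId: number };

// Registry: model name -> generators mapping an instance to every endpoint
// path its mutation affects (detail, collection, Nth-degree aggregates, ...).
type EndpointGenerator = (instance: Item) => string;

const registry: Record<string, EndpointGenerator[]> = {
  Item: [
    (item) => `/items/${item.id}`,     // detail endpoint
    () => "/items",                    // collection endpoint
    (item) => `/lists/${item.listId}`, // parent aggregate also changed
  ],
};

function changedEndpoints(model: string, instance: Item): string[] {
  return (registry[model] ?? []).map((generate) => generate(instance));
}
```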

performance

  • One model instance can affect many endpoints. Obviously this becomes expensive to compute if the data changes frequently.
  • Computing just the endpoint paths is cheap. Then check to ensure an endpoint has a subscriber before rendering its payload.
  • Push the endpoint rendering off to a work queue if it's too expensive to do synchronously (ordering sketched below).
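
A sketch of that ordering, with the pubsub and queue clients abstracted behind hypothetical helpers:

```typescript
declare function changedEndpoints(model: string, instance: { id: number }): string[];
declare function hasSubscribers(channel: string): Promise<boolean>; // e.g. PUBSUB NUMSUB
declare function enqueueRenderJob(path: string): Promise<void>;     // work queue hand-off

async function publishChanges(model: string, instance: { id: number }): Promise<void> {
  for (const path of changedEndpoints(model, instance)) {
    // Cheap: we only computed paths so far. Skip silent channels entirely.
    if (!(await hasSubscribers(path))) continue;
    // Potentially expensive: defer payload rendering to a worker.
    await enqueueRenderJob(path);
  }
}
```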

server implementation

let's look at some code

  • Django ORM hooks
  • URL Registry
  • Node.js socket server (skeleton sketched below)
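
For orientation before diving into the repo, a skeleton of what such a socket server can look like, using the `ws` and `redis` npm packages (details differ from the talk's actual implementation):

```typescript
import { WebSocketServer } from "ws";
import { createClient } from "redis";

const wss = new WebSocketServer({ port: 8080 });
const subscriber = createClient();
await subscriber.connect();

declare function authorizeSubscription(path: string, auth: string): Promise<boolean>;

wss.on("connection", (ws) => {
  ws.on("message", async (raw) => {
    const { subscribe, auth } = JSON.parse(raw.toString());
    for (const path of subscribe as string[]) {
      if (!(await authorizeSubscription(path, auth))) continue; // HEAD check from earlier
      // One pubsub channel per endpoint: relay published REST payloads as-is.
      await subscriber.subscribe(path, (payload) => ws.send(payload));
    }
  });
  // Production code also needs to unsubscribe on close.
});
```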

the client

react.js

a digression

why react?

  • this technique can be used with any client framework
  • in fact, one of its strengths is its ability to plug into existing infrastructure
  • I believe react.js (along with flux and immutable js) is the ideal choice of client tech for this purpose

built for data mutation over time

  • react handles propagation of state down through hierarchies of components extremely well
  • because react effectively re-renders your entire component on every state change, you know your UI always accurately reflects state

the virtual dom

  • react's super-power
  • makes it not only feasible but FAST to re-render components every single time

server-side rendering

  • not directly applicable here, but an obvious win for any API-consuming client
  • drastically reduces time to usable page load
  • requires very little code to implement (provided the vast majority of your markup is in react components)

uni-directional data flow

  • this is the big one for our purposes
  • think of the complexities 2-way data binding would introduce to an app whose state is being pushed to it
  • uni-directional data flow guarantees that all data coming into your app, no matter the source, will be accurately represented in your UI. It's just that simple.

flux architecture

  • react is fairly common, so I didn't want to spend a lot of time going over the basics
  • flux is less generally understood, and it's more important to my demonstration, so let's review its basics

view

  • essentially react components, but in flux architectures, views are commonly rendered hierarchically
  • a parent view receives events from a store (explanation forthcoming) and calls its own setState to re-render itself
  • the entire state is then passed down to child views so that ui changes naturally flow uni-directionally down the dependency chain (view sketch below)
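
A view sketch (the `ItemStore` module is the hypothetical store sketched in the next section):

```tsx
import * as React from "react";
import { ItemStore } from "./item-store"; // hypothetical store module

type State = { items: Array<{ id: number; name: string }> };

export class ItemListView extends React.Component<{}, State> {
  state: State = { items: ItemStore.getAll() };

  componentDidMount() {
    ItemStore.on("change", this.handleChange);
  }

  componentWillUnmount() {
    ItemStore.removeListener("change", this.handleChange);
  }

  // Re-render on every store change; children receive state as props.
  handleChange = () => this.setState({ items: ItemStore.getAll() });

  render() {
    return <ul>{this.state.items.map((item) => <li key={item.id}>{item.name}</li>)}</ul>;
  }
}
```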

store

  • a data store, similar to a model, but think of a store as a table, not a row
  • stores register themselves with dispatchers (explanation forthcoming) so that when data changes, they are notified
  • once the store is updated, it emits an event that views (components) commonly use to re-render themselves; see the store sketch below
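
A store sketch (names hypothetical; the shared `dispatcher` is sketched in the next section):

```typescript
import { EventEmitter } from "events";
import { dispatcher } from "./dispatcher"; // hypothetical shared Dispatcher instance

export type Item = { id: number; name: string; updated_at: string };

// A table, not a row: the store holds every Item the client knows about.
class ItemStoreClass extends EventEmitter {
  private items = new Map<number, Item>();

  getAll(): Item[] {
    return [...this.items.values()];
  }

  receive(item: Item): void {
    this.items.set(item.id, item);
    this.emit("change"); // views re-render off this event
  }
}

export const ItemStore = new ItemStoreClass();

// Register with the dispatcher so data changes reach the store.
dispatcher.register((action) => {
  if (action.type === "ITEM_RECEIVED") {
    ItemStore.receive(action.item as Item);
  }
});
```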

dispatcher

  • As the name suggests, the dispatcher is responsible for invoking callbacks registered by other components.
  • When an action (explanation forthcoming) receives new data, it calls the dispatch method
  • Since stores commonly register themselves with the dispatcher, their callbacks are invoked when the dispatch method is called (dispatcher sketch below)
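
With facebook's flux package, the dispatcher itself is nearly free (a sketch; the Action shape is my own):

```typescript
import { Dispatcher } from "flux";

export type Action = { type: string; [key: string]: unknown };

// One app-wide dispatcher; every store registers a callback with it.
export const dispatcher = new Dispatcher<Action>();

// register() returns a token; stores can pass tokens to waitFor() to order
// their updates relative to other stores.
dispatcher.register((action) => {
  console.log("dispatching", action.type, "to all registered callbacks");
});

// Actions call dispatch(); the dispatcher invokes every registered callback.
dispatcher.dispatch({ type: "ITEM_RECEIVED", item: { id: 1, name: "hello" } });
```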

actions

  • in generic terms, an action's job is to provide data to the dispatcher
  • practically speaking, you can think of it as your REST API client
  • this is part of what makes flux such a natural fit for our realtime channel: all we need to do is get data to our actions
  • Actions may be called in response to a view's event handler. This is how flux avoids 2-way data binding without sacrificing user interactivity (actions sketch below).
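
An actions sketch; note that the same dispatch path serves an AJAX fetch and a pushed socket payload equally well:

```typescript
import { dispatcher } from "./dispatcher"; // hypothetical module from the sketch above

export const ItemActions = {
  // Called from a view's event handler (user interactivity)...
  async fetchItem(id: number): Promise<void> {
    const response = await fetch(`/items/${id}`);
    dispatcher.dispatch({ type: "ITEM_RECEIVED", item: await response.json() });
  },

  // ...or fed directly by the realtime channel with an identical payload.
  receiveItem(item: unknown): void {
    dispatcher.dispatch({ type: "ITEM_RECEIVED", item });
  },
};
```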

client implementation

more code diving

  • items store -> crud store -> base store
  • items actions -> crud actions -> base actions
  • items views
  • api subscription service (sketched below)
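
Once actions exist, the api subscription service reduces to very little (a sketch; URL and routing are illustrative):

```typescript
import { ItemActions } from "./item-actions"; // hypothetical module from the sketch above

const socket = new WebSocket("wss://example.com/stream"); // placeholder URL
const subscriptions = new Set<string>();

export function subscribe(path: string): void {
  subscriptions.add(path);
  socket.send(JSON.stringify({ subscribe: [path] }));
}

// Pushed payloads enter the app through the same actions AJAX uses;
// nothing downstream knows or cares that the data arrived over a socket.
socket.onmessage = (event: MessageEvent) => {
  const { path, payload } = JSON.parse(event.data);
  if (path.startsWith("/items")) ItemActions.receiveItem(payload);
};
```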

live demo time

  • page initial render
  • curl API PUT, POST, DELETE

drawbacks

http method is a stretched analogy

  • it works (mostly), but the methods are meant for requests, not responses
  • you can begin to see it break down with PUT vs POST, and it further degenerates when you begin to think about PATCH
  • I've presented a purist view of this technique, but there are some practical things that can be done to shore up the analogy.

clients must maintain a list of subscriptions

  • this list can get fairly large, and without sticky sessions, any socket disconnect means resubscribing to all channels (resubscribe bookkeeping sketched below)
  • it's somewhat difficult to answer the question "how do I know when there's something new I should subscribe to?". Ideally, list endpoints would act like a PATCH for new members (again stretching the http method analogy).
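
A sketch of the resubscribe bookkeeping:

```typescript
const subscriptions = new Set<string>();
let socket = connect();

function connect(): WebSocket {
  const ws = new WebSocket("wss://example.com/stream"); // placeholder URL
  // On every (re)connect, replay the full subscription list.
  ws.onopen = () => ws.send(JSON.stringify({ subscribe: [...subscriptions] }));
  ws.onclose = () => { socket = connect(); }; // production: add backoff
  return ws;
}

export function subscribe(path: string): void {
  subscriptions.add(path);
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({ subscribe: [path] }));
  }
}
```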

beware the race conditions

  • always use a "last modified" timestamp in the data store
  • always inspect the client timestamp against the one in the socket payload to avoid overwriting with older state (guard sketched below)
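
The guard is a single comparison (field name hypothetical):

```typescript
type Versioned = { updated_at: string }; // ISO-8601 "last modified" from the data store

// Apply a pushed payload only if it is newer than what the client already has.
function shouldApply(current: Versioned | undefined, incoming: Versioned): boolean {
  if (!current) return true;
  return Date.parse(incoming.updated_at) > Date.parse(current.updated_at);
}

// e.g. inside a store: if (shouldApply(items.get(id), item)) items.set(id, item);
```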

tl;dl

realtime API mirrors REST API

client consumer doesn't care about push vs pull

data service triggers the publishing of REST endpoints

Questions?