Reactive Systems – Overview




Reactive-Programming

'Reactive Systems - Overview' presentation slides

On GitHub: gpolito / Reactive-Programming

Reactive Systems

Overview

Guillermo Polito

@gpolito / slides at github

Why Reactive?

internetlivestats.com

Why Reactive?

Quick summary. In 10 years:

                  2005                        2015
Internet users    1,000 million               3,000 million
YouTube           launched in February 2005   more than 1,000 million users
Facebook          5.5 million active users    1,380 million active users
Twitter           not launched yet            288 million active users

(Presenter note: show the first column first, then the second.)

Why Reactive?

Some things have changed during these years...

                        Almost yesterday   Today
Server nodes            10s                1000s
RAM                     expensive          cheap
Network                 slow               fast
Data volume             GBs                TBs -> PBs
Response times          seconds            milliseconds
Maintenance downtimes   hours              none

Two things changed: the hardware and the user requirements.

Why Reactive?

Today's demands are simply not met by yesterday’s software architectures

A new coherent approach to Systems Architecture is needed...

What we have seen so far is a snapshot of today's web numbers, where traffic is enormous. We saw that hardware has become better and cheaper, and that user requirements are very demanding. With the architectures we know today, especially MVC ones that rely on a thread-based concurrency model with shared memory, it is very hard or nearly impossible to meet these requirements. Usually, very complicated mechanisms are used to patch over the situation.

Reactive Systems

Traits

reactivemanifesto.org

Organisations working in disparate domains are independently discovering patterns for building software that look the same. These systems are more robust, more resilient, more flexible and better positioned to meet modern demands. These changes are happening because application requirements have changed dramatically in recent years. It is time to apply these design principles consciously from the start instead of rediscovering them each time.

Responsive

“A responsive system is quick to react to all users — under blue skies and grey skies — in order to ensure a consistently positive user experience.”

Responsive Systems

offer:

  • rapid and consistent response times
  • reliable upper bounds to deliver a consistent QoS

should:

  • detect problems quickly and deal with them effectively

Reactive Systems

Traits

reactivemanifesto.org

Resilient

“The ability of a substance or object to spring back into shape.”
“The capacity to recover quickly from difficulties.”

Merriam Webster

Merriam-Webster Inc., of Springfield, Massachusetts, is an American publisher of reference books, mainly dictionaries, which trace their origin to Noah Webster's An American Dictionary of the English Language, published in 1828.

Resilient

The system should be responsive in the face of failure.

Failure != Error

Examples of failures:

  • program defects causing corrupted internal state
  • hardware malfunction
  • network failures
  • troubles with external services
A failure is an unexpected event within a service that prevents it from continuing to function normally. A failure will generally prevent responses to the current, and possibly all following, client requests. This is in contrast with an error, which is an expected and coded-for condition—for example an error discovered during input validation, that will be communicated to the client as part of the normal processing of the message.
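
A minimal sketch of the distinction above, in plain Scala. All names here (`FailureVsError`, `handle`, `brokenDependency`, `runGuarded`) are hypothetical and only illustrate the idea: a validation error is an ordinary reply value, while a failure is an unexpected exception that somebody other than the client has to deal with.

```scala
import scala.util.Try

object FailureVsError {
  // An error is an expected, coded-for outcome: it is part of the reply type.
  sealed trait Reply
  final case class Accepted(quantity: Int) extends Reply
  final case class ValidationError(reason: String) extends Reply

  // Hypothetical request handler: bad input yields an error reply, not an exception.
  def handle(rawQuantity: String): Reply =
    rawQuantity.toIntOption match {
      case Some(q) if q > 0 => Accepted(q)
      case Some(_)          => ValidationError("quantity must be positive")
      case None             => ValidationError("quantity must be a number")
    }

  // A failure is unexpected: this simulated dependency simply blows up.
  def brokenDependency(): Int =
    throw new IllegalStateException("corrupted internal state")

  // The failure is captured outside the normal reply path.
  def runGuarded[A](body: => A): Try[A] = Try(body)
}
```

The client only ever sees `Reply` values; what happens after `runGuarded` returns a failed `Try` is a separate decision, which is where supervision comes in.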

Resilient - how?

Failure Recovery in OOP

  • single thread of control
  • if the thread blows up => you are screwed
  • no global organization of error handling
  • defensive programming tangled with business logic, scattered around the code

Resilience is by design

ok... but how?

Resilience is by design - How?

containment delegation isolation

let's review these concepts...

How are we going to achieve resilience? By having compartments that contain failures and isolate components from one another, so that parts of the system can fail and recover without compromising the whole system. The recovery of each component is delegated to another, external component. The client is not responsible for handling failures. High availability is ensured by replicating components where necessary.

Resilience is by design - How?

containment => bulkheads, circuit breaker
delegation => supervision

Resilience is by design - How?

isolation (decoupling, both in time and space)

  • in time: sender and receiver don't need to be present at the same time to communicate => message-driven architecture
  • in space (defined as Location Transparency): the sender and receiver don't have to run in the same process, and this might change during the application's lifetime => message-driven architecture

Bulkheads

+

Supervision

Now that we know which properties the design needs in order to achieve resilience, let's review each of them.

Bulkheads help us to:

  • isolate the failure
  • compartmentalize
  • manage failure locally
  • avoid cascading failures

this concept should be used together with supervision!
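
One common way to build a bulkhead on the JVM is to give each dependency its own small thread pool, so that a slow or broken dependency can only exhaust its own compartment. The sketch below assumes two hypothetical dependencies, `payments` and `search`; the pool sizes and the two-second delay are arbitrary illustration values.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{ExecutionContext, Future}

object BulkheadSketch {
  // Daemon threads so that the sketch never keeps the JVM alive.
  private val daemonThreads: ThreadFactory = (r: Runnable) => {
    val t = new Thread(r)
    t.setDaemon(true)
    t
  }

  private def pool(size: Int): ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(size, daemonThreads))

  // One compartment per dependency: they share no threads.
  val paymentsPool: ExecutionContext = pool(2)
  val searchPool: ExecutionContext   = pool(2)

  // Hypothetical slow dependency: every call blocks one payments thread for 2 seconds.
  def slowPayment(): Future[String] =
    Future { Thread.sleep(2000); "paid" }(paymentsPool)

  // Hypothetical healthy dependency, running in its own compartment.
  def search(query: String): Future[String] =
    Future(s"results for $query")(searchPool)
}
```

Even when every payments thread is busy, `search` still answers, because the stress stays inside the payments compartment.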

Supervision

Core Concept

Supervision means that normal requests and responses (including negative ones such as validation errors) flow separately from failures: while the former are exchanged between the user and the service, the latter travel from the service to its supervisor.
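
The two flows described above can be sketched without any actor library. Everything in this block (`Supervisor`, `Divider`, the directive names) is a hypothetical illustration, not Akka's API: validation errors are returned to the caller, while unexpected exceptions are handed to the supervisor.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.NonFatal

object SupervisionSketch {
  sealed trait Directive
  case object Resume extends Directive
  case object Restart extends Directive
  case object Stop extends Directive

  // Hypothetical supervisor: receives failures and decides what to do with the child.
  final class Supervisor {
    val seen: ListBuffer[Throwable] = ListBuffer.empty
    def onFailure(cause: Throwable): Directive = {
      seen += cause
      cause match {
        case _: ArithmeticException   => Resume
        case _: IllegalStateException => Restart
        case _                        => Stop
      }
    }
  }

  // Hypothetical service: the client gets Some(reply), or None when the request failed.
  final class Divider(supervisor: Supervisor) {
    def handle(a: Int, b: Int): Option[Either[String, Int]] =
      try {
        if (a < 0 || b < 0) Some(Left("negative input")) // error: goes back to the client
        else Some(Right(a / b))                          // normal response
      } catch {
        case NonFatal(e) =>
          supervisor.onFailure(e)                        // failure: goes to the supervisor
          None
      }
  }
}
```

Dividing by zero here never reaches the client as an exception; the supervisor sees the `ArithmeticException` and picks a directive.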

Supervisor hierarchies with Actors

Configuration in Akka

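import akka.actor.OneForOneStrategy
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._

// Declared inside an Actor: decides what happens to a failing child.
// Allows up to 10 restarts of a child within one minute.
override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
    case _: ArithmeticException      => Resume   // keep the child's state and carry on
    case _: NullPointerException     => Restart  // discard the state, start a fresh instance
    case _: IllegalArgumentException => Stop     // terminate the child permanently
    case _: Exception                => Escalate // let this supervisor's own parent decide
  }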

Circuit Breaker

PROBLEM - Example Situation

  • web app interacting with a remote WS
  • remote WS is overloaded, and its DB takes a long time to respond with a failure
  • => WS calls fail after a long period of time
  • => web users notice that form submissions take a long time to complete
  • => web users start to click the refresh button, adding more requests!
  • => web app fails due to resource exhaustion, affecting all users across the site

failures in external dependencies shouldn't bring down an entire app

Circuit Breaker

Solution Proposal

  • monitor response times
  • if time consistently rises above a threshold, either:
      - fail fast approach, or...
      - route following requests to an alternative service
  • monitor the original service
      - if the service is back in shape, restore to the original state
      - if not, continue with the same approach

Circuit Breaker

Akka Implementation

  • implements the fail-fast approach
  • provides stability
  • prevents cascading failures in distributed systems

Circuit Breaker

Akka Implementation

configuration

  • max. number of failures
  • call timeout: response time threshold
  • reset timeout: we'll see this later

Circuit Breaker

Akka Implementation

Closed state (normal operation)

  • Exceptions or calls exceeding the configured callTimeout increment a failure counter
  • Successes reset the failure count to 0 (zero)
  • When the failure counter reaches a maxFailures count, the breaker is tripped into Open State

Circuit Breaker

Akka Implementation

Open State

  • All calls fail-fast with a CircuitBreakerOpenException
  • After the configured resetTimeout, the circuit breaker enters a Half-Open State

Circuit Breaker

Akka Implementation

Half-Open State

  • The first call attempted is allowed through without failing fast
  • All other calls fail-fast with an exception just as in Open state
  • If the first call succeeds, the breaker is reset back to Closed state
  • If the first call fails, the breaker is tripped again into the Open state for another full resetTimeout
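
The three states can be summarised as a small state machine. This is a simplified, single-threaded sketch and not Akka's implementation: it has no callTimeout, so only exceptions count as failures, and the clock is passed in as a hypothetical `now` function so that the transitions are easy to follow.

```scala
import scala.util.control.NonFatal

object BreakerSketch {
  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  final class OpenException extends RuntimeException("circuit breaker is open")

  // Not thread-safe: a sketch of the state transitions only.
  final class Breaker(maxFailures: Int, resetTimeoutMs: Long, now: () => Long) {
    private var state: State = Closed
    private var failures = 0
    private var openedAt = 0L

    // Open turns into Half-Open once resetTimeout has elapsed.
    def currentState: State = {
      if (state == Open && now() - openedAt >= resetTimeoutMs) state = HalfOpen
      state
    }

    def call[A](body: => A): A = currentState match {
      case Open => throw new OpenException // fail fast, the body is never run
      case Closed | HalfOpen =>
        try {
          val result = body
          failures = 0   // a success resets the counter...
          state = Closed // ...and closes a Half-Open breaker
          result
        } catch {
          case NonFatal(e) =>
            failures += 1
            // A failed trial call, or too many failures, trips the breaker.
            if (state == HalfOpen || failures >= maxFailures) {
              state = Open
              openedAt = now()
            }
            throw e
        }
    }
  }
}
```

Because the sketch is synchronous, the Half-Open rule "only the first call is let through" is trivially satisfied; a concurrent implementation has to guard that trial call explicitly.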

Circuit Breaker

Akka Implementation

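import akka.pattern.CircuitBreaker
import scala.concurrent.Future
import scala.concurrent.duration._

// Assumes an ActorSystem named `system` is in scope.
import system.dispatcher // ExecutionContext used by Future(...)

def dangerousCall: String = "This really isn't that dangerous"

val breaker =
  CircuitBreaker(system.scheduler,
    maxFailures = 5,          // failures in a row before the breaker opens
    callTimeout = 10.seconds, // slower calls count as failures
    resetTimeout = 1.minute)  // time spent Open before trying Half-Open

// Fails fast with CircuitBreakerOpenException while the breaker is open.
def dangerous: Future[String] =
  breaker.withCircuitBreaker(Future(dangerousCall))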

Back Pressure

Problem Description

speed(publisher) > speed(subscriber)

when a component is under stress it might fail catastrophically or drop messages in an uncontrolled fashion. To be more resilient, the system as a whole needs to react to avoid these situations

bounded buffer => drop messages + require re-sending

unbounded buffer => memory keeps growing until the process runs out of it
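
The bounded-buffer case is easy to reproduce with the JDK's `ArrayBlockingQueue`, whose `offer` returns `false` instead of blocking when the queue is full. The helper `publishBurst` is a hypothetical name; it publishes a burst into a buffer that nobody drains and counts what gets dropped.

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedBufferSketch {
  // Returns (accepted, dropped) for a burst sent to a subscriber that is not consuming.
  def publishBurst(capacity: Int, burst: Int): (Int, Int) = {
    val buffer = new ArrayBlockingQueue[Int](capacity)
    // offer() is non-blocking: false means the message was dropped.
    val accepted = (1 to burst).count(i => buffer.offer(i))
    (accepted, burst - accepted)
  }
}
```

With a capacity of 4 and a burst of 10, six messages are lost and would have to be re-sent by the publisher.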

Back Pressure - Solution

speed(publisher) <= speed(subscriber)

Back pressure: the component should communicate that it is under stress to upstream components, so they can start reducing the load

Back Pressure

this mechanism is included in the Reactive Streams specification

http://reactive-streams.org/
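
Reactive Streams expresses back pressure as demand: the subscriber tells the publisher how many elements it is ready for, and the publisher never sends more than that. The sketch below imitates that idea in plain Scala; `Publisher`, `SlowSubscriber` and `pull` are hypothetical names and deliberately not the interfaces of the specification.

```scala
import scala.collection.mutable.ArrayBuffer

object BackPressureSketch {
  // Hypothetical publisher: emits only as many elements as have been requested.
  final class Publisher(source: Iterator[Int]) {
    private var demand = 0L
    var emitted = 0

    def request(n: Long, onNext: Int => Unit): Unit = {
      demand += n
      while (demand > 0 && source.hasNext) {
        demand -= 1
        emitted += 1
        onNext(source.next())
      }
    }
  }

  // Hypothetical subscriber: asks for a small batch whenever it is ready for more.
  final class SlowSubscriber(batch: Int) {
    val received: ArrayBuffer[Int] = ArrayBuffer.empty
    def pull(publisher: Publisher): Unit =
      publisher.request(batch.toLong, (x: Int) => { received += x; () })
  }
}
```

Even with an infinite source, the publisher stays idle between requests, so no buffer can fill up on the subscriber's side.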

Reactive Streams

Stream processing on the JVM:

  • Asynchronous
  • Back-pressured
  • Standardized

Available implementations:

  • Akka Streams
  • Reactor Composable (DSL for Groovy, Clojure)
  • RxJava
  • Ratpack

Reactive Streams

Akka Streams Example

// Early (experimental) Akka Streams API; `text` is assumed to be a String in scope.
implicit val system = ActorSystem("Sys")
val mat = FlowMaterializer(...)

Flow(text.split(" ").toVector).
    map(word => word.toUpperCase).
    foreach(transformed => println(transformed)).
    onComplete(mat) {
        case Success(_) => ...; system.shutdown();
        case Failure(e) => ...; system.shutdown();
    }

Reactive Systems

Traits

reactivemanifesto.org

Elasticity

Scalability

+

automatic resource management

Scalability - What does it mean? - (1)

system performance

is proportional

to the allocated resources

Scalability - What does it mean? - (2)

Scale OUT & IN (horizontal)

+

Scale UP & DOWN (vertical)

Scalability

Why is it necessary?

  • for example, in the eCommerce domain:

- the highest traffic peaks happen when you are selling well

- during a traffic peak people basically want to give you their money :)

How do we achieve it?

  • Never Block -> Go Async
  • Share Nothing -> Use Immutable Data
  • Location Transparency
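
The first two points can be illustrated with the Scala standard library alone. In this sketch `Cart`, `priceOf` and `total` are hypothetical names: the cart is an immutable value, so it can be shared freely between threads, and prices are looked up asynchronously and combined without blocking a thread on each lookup.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AsyncImmutableSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Share nothing: adding an item returns a new Cart, the old one never changes.
  final case class Cart(items: Vector[String]) {
    def add(item: String): Cart = copy(items = items :+ item)
  }

  // Hypothetical asynchronous price lookup (here: 10 per character of the name).
  def priceOf(item: String): Future[Int] = Future(item.length * 10)

  // Never block: the lookups are composed into one Future instead of being awaited.
  def total(cart: Cart): Future[Int] =
    Future.traverse(cart.items)(priceOf).map(_.sum)
}
```

Only the outermost caller, if anyone, ever needs to wait for the result of `total`.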

Never Block -> Go Async

Share Nothing -> Use Immutable Data

Location Transparency

Scale OUT = Scale UP

- location transparency enables decoupling in space
- two components can communicate without knowing where each other is located
- it allows the system topology to adapt to usage
- it is a key concept for scaling on demand
- since we are doing distributed computing, there is no conceptual difference between communicating with a component on the same node or on another node of the cluster

Elasticity

requires a

Message-driven Architecture
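
A rough sketch of location transparency: the sender depends only on a reference it can send messages to, and whether the receiver lives in the same process is an implementation detail of that reference. `Ref`, `LocalRef` and `RemoteRef` are hypothetical names, and the "network" is just a function that accepts bytes.

```scala
import java.nio.charset.StandardCharsets

object LocationTransparencySketch {
  // The only thing a sender knows about a receiver.
  trait Ref {
    def tell(msg: String): Unit
  }

  // Receiver in the same process: the message is handed over directly.
  final class LocalRef(handler: String => Unit) extends Ref {
    def tell(msg: String): Unit = handler(msg)
  }

  // Receiver elsewhere: the message is serialized and given to some transport.
  final class RemoteRef(send: Array[Byte] => Unit) extends Ref {
    def tell(msg: String): Unit = send(msg.getBytes(StandardCharsets.UTF_8))
  }

  // Sender code is identical for both cases.
  def greet(ref: Ref): Unit = ref.tell("hello")
}
```

Swapping a `LocalRef` for a `RemoteRef` changes the topology without touching `greet`.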

Reactive Systems

Traits

reactivemanifesto.org

Message Driven Architecture

Message Driven

Why is it important?

asynchronous communication based on message passing provides/enables:

  • loose coupling
  • isolation
  • location transparency
  • maintainability and flexibility to evolve - interfaces focused on the content of the communication
  • lower latency & higher throughput
  • scale up & out
  • load management and flow control - by monitoring message queues and applying back-pressure
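
The last point can be made concrete with a bounded mailbox. This is a single-threaded sketch with hypothetical names (`Mailbox`, `offer`, `processOne`): senders only enqueue messages, the receiver handles them one at a time, and the queue length is an observable signal that can drive flow control.

```scala
import scala.collection.mutable

object MailboxSketch {
  final class Mailbox[M](capacity: Int, handler: M => Unit) {
    private val queue = mutable.Queue.empty[M]

    // Queue length: the value a monitor would watch.
    def size: Int = queue.size

    // Returns false when full, which tells the sender to slow down.
    def offer(msg: M): Boolean =
      if (queue.size < capacity) { queue.enqueue(msg); true } else false

    // Handles at most one message; returns false when there was nothing to do.
    def processOne(): Boolean =
      if (queue.nonEmpty) { handler(queue.dequeue()); true } else false
  }
}
```

Sender and receiver never call each other directly, which is what gives the loose coupling and isolation listed above.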

Reactive Systems

Traits

reactivemanifesto.org

Which technologies could meet all these requirements?

Feature                                       Akka
containment (bulkheads)                       yes (actors)
asynchronous & non-blocking message-passing   yes
fault tolerance                               yes (supervision)
back-pressure                                 yes (by using Akka Streams)
circuit breaker                               yes
location transparency                         yes (ActorRef concept)

References

reactivemanifesto.org

Reactive Design Patterns - Roland Kuhn and Jamie Allen

Kevin Webber - What is Reactive Programming?

Jonas Boner - Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems

internetlivestats.com

Reactive Stream Processing with Akka Streams

reactivestreams.org