Why Reactive?
Quick summary. In 10 years:
                 2005                        2015
Internet users   1,000 million               3,000 million
YouTube          founded in February 2005    more than 1,000 million users
Facebook         5.5 million active users    1,380 million active users
Twitter          not yet founded             288 million active users
- first column first, then the second
Why Reactive?
Some things have changed during these years...
                        Almost yesterday   Today
Server nodes            10s                1000s
RAM                     expensive          cheap
Network                 slow               fast
Data volume             GBs                TBs -> PBs
Response times          seconds            milliseconds
Maintenance downtimes   hours              none
- hardware
- user requirements
Why Reactive?
Today's demands are simply not met by yesterday’s software architectures
A new coherent approach to Systems Architecture is needed...
What we have seen so far is a snapshot of today's web numbers, where traffic is staggering. We saw that hardware has improved and become cheaper. User requirements are very demanding.
With the architectures we know today, especially MVC ones that use a concurrency model based on threads with shared memory, it is very hard or almost impossible to meet these requirements.
In general, very complicated mechanisms are used to patch over this situation.
Reactive Systems
Traits
reactivemanifesto.org
Organisations working in disparate domains are independently discovering patterns for building software that look the same. These systems are more robust, more resilient, more flexible and better positioned to meet modern demands.
These changes are happening because application requirements have changed dramatically in recent years.
It is time to apply these design principles consciously from the start instead of rediscovering them each time.
Responsive
“A responsive system is quick to react to all users — under blue skies and grey skies — in order to ensure a consistently positive user experience.”
Responsive Systems
offer:
- rapid and consistent response times
- reliable upper bounds to deliver a consistent QoS
should:
- detect problems quickly and deal with them effectively
Resilient
“The ability of a substance or object to spring back into shape.”
“The capacity to recover quickly from difficulties.”
Merriam-Webster
Merriam-Webster Inc., of Springfield, Massachusetts, is an American publisher of reference books, mainly dictionaries, which trace their origin to Noah Webster's An American Dictionary of the English Language, published in 1828.
Resilient
The system should be responsive in the face of failure.
Failure != Error
Examples of failures:
- program defects causing corrupted internal state
- hardware malfunction
- network failures
- troubles with external services
A failure is an unexpected event within a service that prevents it from continuing to function normally. A failure will generally prevent responses to the current, and possibly all following, client requests. This is in contrast with an error, which is an expected and coded-for condition—for example an error discovered during input validation, that will be communicated to the client as part of the normal processing of the message.
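The distinction can be sketched in plain Scala. This is only an illustration: the `Order` type, the validation rule and the simulated storage outage are hypothetical names and conditions, not part of the talk.

```scala
import scala.util.Try

object FailureVsErrorSketch {
  // Hypothetical request type, for illustration only.
  final case class Order(quantity: Int)

  // An error: expected, coded for, and returned to the client as a normal reply.
  def validate(order: Order): Either[String, Order] =
    if (order.quantity > 0) Right(order) else Left("quantity must be positive")

  // A failure: an unexpected event (here a simulated outage) that prevents
  // the service from processing the request at all.
  def store(order: Order, storageUp: Boolean): Try[Order] =
    Try {
      if (!storageUp) throw new IllegalStateException("storage unavailable")
      order
    }
}
```

A rejected order comes back as a `Left` and the service keeps working, while the outage surfaces as a `scala.util.Failure` that somebody other than the client has to deal with.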
Resilient - how?
Failure Recovery in OOP
- single thread of control
- if the thread blows up => you are screwed
- no global organization of error handling
- defensive programming tangled with business logic, scattered around the code
Resilience is by design
ok... but how?
Resilience is by design - How?
containment
delegation
isolation
let's review these concepts...
How are we going to achieve resilience? By having compartments that contain failures and isolate components from each other, so that parts of the system can fail and recover without compromising the whole system.
The recovery of each component is delegated to another, external component. The client is not responsible for handling failures.
High availability is ensured by replicating components where necessary.
Resilience is by design - How?
containment
=> bulkheads
=> circuit breaker
delegation
=> supervision
Resilience is by design - How?
isolation (decoupling, both in time and space)
- in time:
sender and receiver don't need to be present at the same time to communicate
=> message-driven architecture
- in space (defined as Location Transparency):
the sender and receiver don't have to run in the same process, and this might change during the application's lifetime
=> message-driven architecture
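Decoupling in time can be pictured with a minimal mailbox: the sender deposits messages and moves on, and the receiver handles them whenever it gets around to it. The `Mailbox` class below is a hypothetical, single-threaded sketch, not an Akka API.

```scala
import scala.collection.mutable

object MailboxSketch {
  // Hypothetical mailbox: not thread-safe, for illustration only.
  final class Mailbox[A] {
    private val queue = mutable.Queue.empty[A]

    // The sender returns immediately; no receiver needs to be active.
    def send(message: A): Unit = { queue += message; () }

    // The receiver processes everything queued so far, in arrival order,
    // and reports how many messages it handled.
    def drain(handler: A => Unit): Int = {
      var handled = 0
      while (queue.nonEmpty) {
        handler(queue.dequeue())
        handled += 1
      }
      handled
    }
  }
}
```

Because the only contact point is the queue, the receiver could equally live in another process, which is the "in space" half of the same idea.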
Bulkheads
+
Supervision
Now that we know the properties the design needs in order to achieve resilience, let's review each of them.
Bulkheads help us to:
isolate the failure
compartmentalize
manage failure locally
avoid cascading failures
this concept should be used together with supervision!
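One common way to build a bulkhead on the JVM is to give each dependency its own small thread pool, so that a stalled dependency can only exhaust its own compartment. The sketch below assumes two hypothetical compartments ("payments" and "search") and simulates a slow dependency with `Thread.sleep`; pool sizes and names are illustrative.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BulkheadSketch {
  // One small, dedicated pool per compartment (names and sizes are hypothetical).
  private val paymentsExecutor = Executors.newFixedThreadPool(2)
  private val searchExecutor = Executors.newFixedThreadPool(2)
  private val paymentsPool = ExecutionContext.fromExecutor(paymentsExecutor)
  private val searchPool = ExecutionContext.fromExecutor(searchExecutor)

  // Simulated slow dependency: occupies every thread of the payments pool.
  def stallPayments(): Unit =
    (1 to 2).foreach(_ => Future(Thread.sleep(500))(paymentsPool))

  // Search runs in its own compartment and is unaffected by the stall.
  def search(term: String): Future[String] =
    Future(s"results for $term")(searchPool)

  // Saturates payments, then shows that search still answers quickly.
  // Await is used here only to observe the result in a demo.
  def demo(): String = {
    stallPayments()
    try Await.result(search("akka"), 1.second)
    finally { paymentsExecutor.shutdown(); searchExecutor.shutdown() }
  }
}
```

With a single shared pool the search call would queue behind the sleeping payment tasks; with separate pools the failure stays local.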
Supervision
Core Concept
Supervision means that normal requests and responses (including negative ones such as validation errors) flow separately from failures: while the former are exchanged between the user and the service, the latter travel from the service to its supervisor.
Supervisor hierarchies with Actors
Configuration in Akka
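import akka.actor.OneForOneStrategy
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._

// Declared inside the supervising Actor; applies to its child actors
override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
    case _: ArithmeticException      => Resume   // keep the child's state and carry on
    case _: NullPointerException     => Restart  // replace the child with a fresh instance
    case _: IllegalArgumentException => Stop     // terminate the child permanently
    case _: Exception                => Escalate // hand the failure to this actor's own supervisor
  }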
Circuit Breaker
PROBLEM - Example Situation
- web app interacting with a remote web service (WS)
- remote WS is overloaded, and its DB takes a long time to respond with a failure
- => WS calls fail after a long period of time
- => web users notice that form submissions take a long time to complete
- => web users start clicking the refresh button, adding more requests!
- => web app fails due to resource exhaustion, affecting all users across the site
failures in external dependencies shouldn't bring down an entire app
Circuit Breaker
Solution Proposal
- monitor response times
- if time consistently rises above a threshold, either:
  - fail fast approach, or...
  - route subsequent requests to an alternative service
- monitor the original service
  - if the service is back in shape, restore to the original state
  - if not, continue with the same approach
Circuit Breaker
Akka Implementation
- implements the fail-fast approach
- provides stability
- prevents cascading failures in distributed systems
Circuit Breaker
Akka Implementation
configuration
- max. number of failures
- call timeout: response time threshold
- reset timeout: we'll see this later
Circuit Breaker
Akka Implementation
Closed state (normal operation)
- Exceptions or calls exceeding the configured callTimeout increment a failure counter
- Successes reset the failure count to 0 (zero)
- When the failure counter reaches a maxFailures count, the breaker is tripped into Open State
Circuit Breaker
Akka Implementation
Open State
- All calls fail-fast with a CircuitBreakerOpenException
- After the configured resetTimeout, the circuit breaker enters a Half-Open State
Circuit Breaker
Akka Implementation
Half-Open State
- The first call attempted is allowed through without failing fast
- All other calls fail-fast with an exception just as in Open state
- If the first call succeeds, the breaker is reset back to Closed state
- If the first call fails, the breaker is tripped again into the Open state for another full resetTimeout
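The three states described on the previous slides can be condensed into a small state machine. This is a teaching sketch in plain Scala, not Akka's implementation: the `Breaker` class, its `now` clock parameter and `OpenException` are hypothetical, it is not thread-safe, and it leaves out the `callTimeout` handling.

```scala
object BreakerSketch {
  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  final class OpenException extends RuntimeException("circuit breaker is open")

  // `now` is an injectable clock (milliseconds) so the timing can be tested.
  final class Breaker(maxFailures: Int, resetTimeoutMs: Long, now: () => Long) {
    private var state: State = Closed
    private var failures = 0
    private var openedAt = 0L

    def currentState: State = {
      // After resetTimeout, an Open breaker allows one trial call through.
      if (state == Open && now() - openedAt >= resetTimeoutMs) state = HalfOpen
      state
    }

    def call[A](body: => A): A = currentState match {
      case Open => throw new OpenException // fail fast, the body never runs
      case Closed | HalfOpen =>
        try {
          val result = body
          failures = 0   // a success resets the failure count...
          state = Closed // ...and closes a half-open breaker
          result
        } catch {
          case e: Exception =>
            failures += 1
            // A failed trial call, or too many failures, trips the breaker.
            if (state == HalfOpen || failures >= maxFailures) {
              state = Open
              openedAt = now()
            }
            throw e
        }
    }
  }
}
```

Since the sketch is single-threaded, the rule that only the first call passes in the Half-Open state holds trivially; a concurrent version would need atomic state transitions.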
Circuit Breaker
Akka Implementation
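import akka.actor.ActorSystem
import akka.pattern.CircuitBreaker
import scala.concurrent.Future
import scala.concurrent.duration._

// Assumes an ActorSystem is in scope; the name "Sys" is illustrative
implicit val system = ActorSystem("Sys")
import system.dispatcher // ExecutionContext used by Future(...)

def dangerousCall: String = "This really isn't that dangerous"
val breaker =
  CircuitBreaker(system.scheduler,
    maxFailures = 5,
    callTimeout = 10.seconds,
    resetTimeout = 1.minute)
def dangerous: Future[String] =
  breaker.withCircuitBreaker(Future(dangerousCall))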
Back Pressure - Problem Description
speed(publisher) > speed(subscriber)
When a component is under stress it might fail catastrophically or drop messages in an uncontrolled fashion. To be more resilient, the system as a whole needs to react to avoid this kind of situation.
bounded buffer => drop messages + require re-sending
unbounded buffer => unbounded memory growth (eventually out of memory)
Back Pressure - Solution
speed(publisher) <= speed(subscriber)
Back pressure: the component should communicate to upstream components that it is under stress, so they can start reducing the load
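A minimal way to picture back pressure is demand signalling: the subscriber tells the publisher how many elements it can take, and the publisher never sends more than that. The `Publisher` and `Subscriber` classes below are hypothetical, synchronous stand-ins written for this illustration; they are not the Reactive Streams interfaces.

```scala
import scala.collection.mutable

object BackPressureSketch {
  // A publisher that only emits what the subscriber has asked for.
  final class Publisher[A](source: Iterator[A]) {
    // Emits at most `demand` elements and returns how many were actually sent.
    def request(demand: Int)(onNext: A => Unit): Int = {
      var sent = 0
      while (sent < demand && source.hasNext) {
        onNext(source.next())
        sent += 1
      }
      sent
    }
  }

  // A slow subscriber with a small bounded buffer.
  final class Subscriber[A](capacity: Int) {
    private val buffer = mutable.Queue.empty[A]

    def buffered: Int = buffer.size

    // Signals demand equal to the free space, so the buffer can never overflow.
    def pull(publisher: Publisher[A]): Int =
      publisher.request(capacity - buffer.size)(a => { buffer += a; () })

    // Processing one element frees one slot for future demand.
    def process(): Option[A] =
      if (buffer.nonEmpty) Some(buffer.dequeue()) else None
  }
}
```

When the buffer is full the subscriber's demand drops to zero and the publisher simply stops emitting, so nothing is dropped and nothing grows without bound.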
Reactive Streams
Stream processing on the JVM:
- Asynchronous
- Back-pressured
- Standardized
Available implementations:
- Akka Streams
- Reactor Composable (DSL for Groovy, Clojure)
- RxJava
- Ratpack
Reactive Streams
Akka Streams Example
import akka.actor.ActorSystem
import scala.util.{Failure, Success}

// Early (pre-1.0) Akka Streams API; `text` is the input String
implicit val system = ActorSystem("Sys")
val mat = FlowMaterializer(...)

Flow(text.split(" ").toVector).
  map(word => word.toUpperCase).
  foreach(transformed => println(transformed)).
  onComplete(mat) {
    case Success(_) => ...; system.shutdown()
    case Failure(e) => ...; system.shutdown()
  }
Elasticity
Scalability
+
automatic resource management
Scalability - What does it mean? - (1)
system performance
is proportional
to the allocated resources
Scalability - What does it mean? - (2)
Scale OUT & IN (horizontal)
+
Scale UP & DOWN (vertical)
Scalability
Why is it necessary?
- for example, in the eCommerce domain:
- the highest traffic peaks happen when you are selling well
- during a traffic peak people basically want to give you their money :)
How do we achieve it?
- Never Block -> Go Async
- Share Nothing -> Use Immutable Data
- Location Transparency
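The first two points can be sketched with standard-library Futures and an immutable case class. Everything here is hypothetical illustration: `Cart`, `priceOf` and `inStock` are made-up names, and the lookups return canned values where a real system would call remote services.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncImmutableSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Immutable data: an "update" builds a new value, so nothing shared is mutated.
  final case class Cart(items: Vector[String], total: BigDecimal) {
    def add(item: String, price: BigDecimal): Cart =
      copy(items = items :+ item, total = total + price)
  }

  // Hypothetical asynchronous lookups with canned results.
  def priceOf(item: String): Future[BigDecimal] = Future(BigDecimal(item.length))
  def inStock(item: String): Future[Boolean] = Future(true)

  // Both lookups start right away and are combined without blocking a thread.
  def addIfAvailable(cart: Cart, item: String): Future[Cart] = {
    val price = priceOf(item)
    val stock = inStock(item)
    for (p <- price; ok <- stock) yield if (ok) cart.add(item, p) else cart
  }

  // Await is used only to observe the result in this demo.
  def demo(): Cart =
    Await.result(addIfAvailable(Cart(Vector.empty, BigDecimal(0)), "book"), 2.seconds)
}
```

Because `Cart` is never modified in place, the two concurrent lookups and the final combination need no locks.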
Share Nothing -> Use Immutable Data
Location Transparency
Scale OUT = Scale UP
- location transparency enables decoupling in space
- two components can communicate without knowing where they are located
- it allows the system topology to be adapted depending on usage
- it is a key concept for scaling on demand
- since we are doing distributed computing, there is no conceptual difference between communicating with a component on the same node or on another node of the cluster
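The idea can be sketched as a reference type whose callers cannot tell where the receiver lives. The `Ref`, `LocalRef` and `RemoteRef` names below are hypothetical and only mimic the role that an ActorRef plays in Akka; the "remote" variant just serializes the message and hands the bytes to a caller-supplied transport function.

```scala
object LocationTransparencySketch {
  // The caller only sees this interface; the receiver's location is hidden.
  trait Ref[A] { def tell(message: A): Unit }

  // Same-process delivery: hand the message straight to a handler.
  final class LocalRef[A](handler: A => Unit) extends Ref[A] {
    def tell(message: A): Unit = handler(message)
  }

  // Stand-in for remote delivery: serialize and pass the bytes to a transport.
  final class RemoteRef(send: Array[Byte] => Unit) extends Ref[String] {
    def tell(message: String): Unit = send(message.getBytes("UTF-8"))
  }

  // Client code is identical for both deployments.
  def greet(ref: Ref[String]): Unit = ref.tell("hello")
}
```

Swapping a `LocalRef` for a `RemoteRef` changes the topology without touching `greet`, which is what makes scaling out look the same as scaling up from the caller's side.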
Elasticity
requires a
Message-driven Architecture
Message Driven Architecture
Message Driven
Why is it important?
asynchronous communication based on message passing provides:
- loose coupling
- isolation
- location transparency
- maintainability and flexibility to evolve: interfaces focused on the content of the communication
- lower latency & higher throughput
- scale up & out
- load management and flow control: by monitoring message queues and applying back-pressure
Which technologies meet all these requirements?
Feature                                        Akka
containment (bulkheads)                        yes (actors)
asynchronous & non-blocking message-passing    yes
fault tolerance                                yes (supervision)
back-pressure                                  yes (by using Akka Streams)
circuit breaker                                yes
location transparency                          yes (ActorRef concept)