On GitHub: skuenzli/building-reliable-websites
breaking systems for fun and profit since 2000
What percentage of your customers do you care about?
50%
95%
99%
?
In reality, a page requires multiple requests, so we have a conditional probability to account for. Good designs will attempt to make services as independent and parallelized as possible.

# total number of GETs to /myservice for a given day
grep -c 'GET /myservice' logs/app*/access.log.2012-11-16

# estimate peak hour for service from sample
grep 'GET /myservice' logs/app??5/access.log.2012-11-16 | \
  perl -nle 'print m|/201\d:(\d\d):|' | sort -n | uniq -c

# total number of GETs to /myservice at peak hour
grep -c '2012-11-16 17:.*GET /myservice' \
  logs/app*/access.log.2012-11-16
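To make the conditional probability concrete — a sketch with assumed numbers: if a page fans out to 5 independent requests and each succeeds 99% of the time, the page as a whole succeeds noticeably less often.

```shell
# assumed example: 5 independent requests, each with a 99% success rate;
# the page only renders fully if all 5 succeed
awk 'BEGIN { printf "page success rate: %.3f\n", 0.99 ^ 5 }'
# -> page success rate: 0.951
```

This is why independence and parallelism matter: every serial dependency multiplies another failure probability into the page.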
# processing times recorded by server in access log
grep "GET /myservice" logs/app*/access.log.2012-11-16 | \
  cut -d\" -f7 | sort -n > service.access_times.2012-11-16
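With one processing time per line in that sorted file, the 50th/95th/99th percentiles fall out with a little awk. A sketch using nearest-rank percentiles — the generated `times.txt` stands in for the real log extract:

```shell
# stand-in for service.access_times.2012-11-16: 100 sorted times in ms
seq 1 100 > times.txt

# nearest-rank percentiles over a sorted file of one value per line
awk '{ t[NR] = $1 }
     END {
       printf "p50=%s p95=%s p99=%s\n",
              t[int(NR * 0.50)], t[int(NR * 0.95)], t[int(NR * 0.99)]
     }' times.txt
# -> p50=50 p95=95 p99=99
```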
what about network latency and bandwidth?
does request fit in the client's resource budget? 50/95/99%?
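A sketch of checking one request against the client's budget at p95 — every number here is an assumed example, not a measurement:

```shell
awk 'BEGIN {
  p95_server = 120   # ms, server processing time from the access log (example)
  latency    = 60    # ms, estimated network round-trip for clients (example)
  budget     = 200   # ms, what the page design allots this request (example)
  total = p95_server + latency
  printf "p95 total: %d ms (%s)\n", total,
         (total <= budget) ? "within budget" : "over budget"
}'
# -> p95 total: 180 ms (within budget)
```

The same check should be repeated at p50 and p99 — a request that fits the median client's budget can still blow the tail's.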
all models are wrong; some models are useful
model +/- 20%
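A sketch of applying that tolerance when sizing capacity: take the measured peak and multiply in growth, seasonal loading, and the ±20% margin of error taken on the high side. All of the multipliers below are assumed example values:

```shell
awk 'BEGIN {
  peak     = 120    # measured peak req/s (example)
  growth   = 1.30   # expected growth before next capacity review (assumed)
  seasonal = 1.50   # seasonal traffic multiplier, e.g. holidays (assumed)
  margin   = 1.20   # the +/- 20% model error, taken on the high side
  printf "plan for %.0f req/s\n", peak * growth * seasonal * margin
}'
# -> plan for 281 req/s
```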
count, compute, judge

adjust for:
growth
seasonal loading
margin of error

Gatling is an Open Source Stress Tool:
gatling-tool.org
detect changes in trend with control charts
is process changing?
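A minimal sketch of the control-chart idea: establish control limits (mean ± 3σ) from a baseline window, then flag later samples that land outside them. The data, the window size, and the metric (daily p95 response times) are all assumptions:

```shell
# daily p95 response times in ms; the last day jumps
printf '%s\n' 100 102 99 101 98 100 103 97 101 180 |
awk 'NR <= 9 { n++; sum += $1; sumsq += $1 * $1; next }  # baseline window
     {
       mean = sum / n
       sd   = sqrt(sumsq / n - mean * mean)
       if ($1 > mean + 3 * sd || $1 < mean - 3 * sd)
         printf "day %d: %s ms is outside the control limits\n", NR, $1
     }'
# -> day 10: 180 ms is outside the control limits
```

Computing the limits from a baseline window rather than the whole series matters: an outlier included in its own baseline inflates σ and can hide itself.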
the needs of the many outweigh the needs of the few or the one
a dead site is no good to anyone
know the site's limits and stay within them
implement a series of circuit breakers that can be tripped to reduce load in a managed way
Example services ranked by criticality
send marketing email off-line / off-hours
update customer's dashboard
-- breaker #1 --
upload images
-- breaker #2 --
render images
-- breaker #3 --
sign-in
checkout
save customer's work

there's usually a trade-off available
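A sketch of how tripping those breakers might look in code — the thresholds, the load metric, and the breaker actions are all assumptions, and a real implementation would flip feature flags rather than echo:

```shell
# trip breakers in reverse order of criticality as load rises
shed() {
  load=$1   # e.g. percent of the site's known capacity currently in use
  [ "$load" -gt 80 ] && echo "breaker #1: defer marketing email and dashboard updates"
  [ "$load" -gt 90 ] && echo "breaker #2: reject image uploads"
  [ "$load" -gt 95 ] && echo "breaker #3: pause image rendering"
  return 0  # sign-in, checkout, and saving work are never shed
}

shed 92
# -> breaker #1: defer marketing email and dashboard updates
# -> breaker #2: reject image uploads
```

The point is the ordering, not the mechanism: each breaker sacrifices a less critical service so the critical ones keep working.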
Of course, the definition of critical and non-critical totally depends on the business. However, if you try hard enough, you can always rank services in order of criticality.

Resources