summary – architecture – Load Balancers





mozilla-scaling-patterns

Simple Patterns for Scaling Websites: Some lessons learned at Mozilla

On GitHub: cturra / mozilla-scaling-patterns

Scaling Websites

Some Lessons Learned at Mozilla

Who are you guys?

we both work in web operations @ mozilla

Not just firefox?

  • mozilla.org (Firefox downloads)
  • input.mozilla.org (happy/sad face)
  • crash-stats.mozilla.org (crash reporting)
  • support.mozilla.org (user support community)
  • ... and hundreds more

summary

  • architecture
  • load balancers
  • databases
  • async jobs
  • caching
  • self service
  • paas
  • cloud

architecture

  • clusters
  • admin node
  • web nodes
  • database nodes
  • some clusters shared

Load Balancers

Zeus (now Stingray): a software solution

Platform

  • RHEL6
  • HP DL360
  • Myricom Myri-10G

Details

in front of nearly everything

  • apache, mysql, elasticsearch, hadoop
  • 200k packets per second for SSL

Databases

/dev/null is web scale.

MySQL Multi-Master

Only ONE is active for writes in the Load Balancer.

Read Slaves

Write to master, but only read from slaves.

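DATABASES = {
    # ... (other entries elided in the original slide)

    # Read-only alias; the host name suggests a Zeus read-only virtual IP
    'slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mozillians_org',
        'USER': 'mozillians',
        'PASSWORD': 'YoUt#!nk+h1$izR3@l?',
        'HOST': 'generic-ro-zeus',
        'PORT': '3306',
    },
}

# Aliases that may be used for reads
SLAVE_DATABASES = ['slave']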
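
The "write to master, read from slaves" rule is the job of a database router. A minimal sketch of that rule, assuming a master alias named `default` (the slide elides the other entries) and a hypothetical class name `MasterSlaveRouter`. The method names follow Django's database-router interface, but the class is plain Python and is not taken from Mozilla's code:

```python
import itertools


class MasterSlaveRouter:
    """Send writes to the master and spread reads across the slaves.

    Hypothetical sketch: the alias names are assumptions.
    """

    def __init__(self, master="default", slaves=("slave",)):
        self.master = master
        # Round-robin over the read-only aliases (the SLAVE_DATABASES list).
        self._slaves = itertools.cycle(slaves)

    def db_for_read(self, model, **hints):
        # Reads go to the next slave in the rotation.
        return next(self._slaves)

    def db_for_write(self, model, **hints):
        # Writes always go to the single active master.
        return self.master

    def allow_relation(self, obj1, obj2, **hints):
        # Master and slaves hold the same data, so relations are allowed.
        return True


router = MasterSlaveRouter(slaves=["slave"])
read_alias = router.db_for_read(model=None)
write_alias = router.db_for_write(model=None)
```

With a single `slave` alias pointing at a load-balanced host, as in the settings above, the rotation is trivial and the load balancer would do the actual spreading across read hosts.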
						

Hardware

No virtualization in production.

  • HP blades
  • Fusion-IO
  • HP and Kingston SSDs

DBAs

AWESOME DBAs are AWESOME! Plus: query optimization, done like code reviews.

A'SYNC Jobs

webscale boy band

Celery

  • don't block the web app
  • written in python & we use django
  • supervisord for celeryd
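
The "don't block the web app" idea can be sketched with the standard library alone. This is not Celery's API: `send_welcome_email`, `handle_request`, and the in-process `queue.Queue` are hypothetical stand-ins for a task, a view, and the message broker.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the message broker
results = []           # records completed work, for demonstration only


def send_welcome_email(user):
    # Placeholder for slow work (sending mail, resizing images, reindexing).
    results.append(f"emailed {user}")


def worker():
    # Stand-in for a worker process: pull jobs off the queue and run them.
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


def handle_request(user):
    # The "view" only enqueues the job and returns immediately.
    jobs.put((send_welcome_email, (user,)))
    return "202 Accepted"


threading.Thread(target=worker, daemon=True).start()

status = handle_request("alice")
jobs.join()  # only so the demo can inspect results; a web app would not wait
```

The request handler never runs the slow function itself, so its response time does not depend on how long the job takes.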

RabbitMQ

  • message queue between web app & celery service
  • cluster per datacenter
  • puppet module to horizontally scale

Cache

rules everything around me

Memcache

  • we use the vanilla memcached

memcache::data

  • ephemeral data (sessions/rss feeds/etc)
  • short lived and can be lost without impact

memcache::databases

  • django-cachemachine
  • object manager, looks in cache first for data

Local HTTP caching

We use Zeus. You can also use Varnish or Squid.

Global HTTP Caching: CDN

  • ~450 million Firefox users (6 wk updates)
  • vendors: Akamai/EdgeCast (65%/35%)
  • balance traffic with DynECT based on response

Akamai::FF18 HPS

Jan 10, 2013 -> Jan 13, 2013 inclusive.

  • Total hits: 5.5 billion
  • Peak HPS: 58,379.7 hits/sec

Akamai::FF18 Bandwidth

Jan 10, 2013 -> Jan 13, 2013 inclusive.

  • Total volume: 2.1PB
  • Peak traffic: 163.177 GBit/sec
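
The peak figures are easier to read next to the averages implied by the totals. A rough calculation, assuming the window is four full days (Jan 10 through Jan 13) and that PB means 10^15 bytes. Neither assumption is stated in the slides.

```python
SECONDS = 4 * 24 * 60 * 60   # Jan 10-13 inclusive, assumed to be four full days

total_hits = 5.5e9
peak_hps = 58_379.7
avg_hps = total_hits / SECONDS                 # average hits per second

total_bytes = 2.1e15                           # 2.1 PB, assuming decimal petabytes
peak_gbit = 163.177
avg_gbit = total_bytes * 8 / SECONDS / 1e9     # average Gbit per second

print(f"average hits/sec: {avg_hps:,.0f} (peak is {peak_hps / avg_hps:.1f}x)")
print(f"average Gbit/sec: {avg_gbit:.1f} (peak is {peak_gbit / avg_gbit:.1f}x)")
```

Under those assumptions the average works out to roughly 15,900 hits/sec and 48.6 Gbit/sec, so the peaks are between three and four times the average load.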

Scale Out

or you fail out

Config Management

We chose Puppet, but there are other great options, such as Chef and CFEngine.

Disposable Web Heads

  • nothing is shared
  • Seamicro Xeon
  • common files (uploads/css/js) in NetApp NFS
  • S3 to replace NFS for upload storage (amo/marketplace)

AMD Seamicro

  • deployed for increased compute efficiency
  • saves up to 75% in space/power
  • enables 192 vs. 64 hosts per 45U rack

The Future

where we're going, we don't need roads

DevOps culture

  • blameless postmortems
  • all invested in the same mission
  • continuous improvement (always try to make the process better)
  • hire the best f$*!ing people

Self Service::Goal

to become platform engineers!

Self Service::Continuous Deployment

  • django-waffle
  • dark launching / feature flags
  • sumo, amo, input, mdn
  • flag_is_active checks: cookies, superuser, group, "dice roll"
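
The checks in the last bullet can be pictured as a chain of tests that ends in a percentage "dice roll". A plain-Python sketch of that idea follows. The `Flag` and `User` classes, the cookie name, and the order of the checks are illustrative assumptions, not django-waffle's implementation (its real entry point is `waffle.flag_is_active(request, 'flag_name')`).

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    name: str
    is_superuser: bool = False
    groups: set = field(default_factory=set)


@dataclass
class Flag:
    everyone: Optional[bool] = None   # True/False forces the flag for all users
    superusers: bool = True           # superusers always see the feature
    groups: set = field(default_factory=set)
    percent: float = 0.0              # share of remaining users to dark launch to


def flag_is_active(flag, user, cookies, cookie_name, rng=random):
    if flag.everyone is not None:
        return flag.everyone
    if flag.superusers and user.is_superuser:
        return True
    if flag.groups & user.groups:
        return True
    if cookie_name in cookies:
        # Sticky decision from an earlier dice roll.
        return cookies[cookie_name] == "True"
    # Dice roll: enable for about `percent` percent of users, then remember it.
    active = rng.uniform(0, 100) < flag.percent
    cookies[cookie_name] = str(active)
    return active


cookies = {}
beta = Flag(groups={"beta-testers"}, percent=10.0)
tester = User("alice", groups={"beta-testers"})
tester_sees_feature = flag_is_active(beta, tester, cookies, "flag_new_ui")
```

Storing the roll in a cookie keeps a given visitor on the same side of the flag between requests, which is what makes a partial dark launch usable.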

Self Service::Chief

  • 90% of site pushes to prod by end of 2013Q1

Self Service::Jenkins

  • socorro - tarballs
  • stage autodeploys

Self Service::Graphite

everyone has access to the graphs, real time.

Self Service::Logstash

With Kibana, everyone has access to the logs. Yup, real time.

Self Service::Sentry

everyone has access to exception tracking. you guessed it, real time!

PaaS

  • we chose Stackato by ActiveState (built on CloudFoundry)
  • evaluated CloudFoundry, OpenShift & various hosted
  • chose the most product-focused option

Cloud

  • dynamically scale in cloud, base footprint in datacenter
  • PaaS -> add DEA instances for scaling extra capacity.

keep on rockin' the free web