Once upon a time – a Redis::TimeoutError – Redis as much as we cache can






nosql-redis-slides


On GitHub jp / nosql-redis-slides

Once upon a time

a Redis::TimeoutError

by @julienpellet

Redis as much as we cache can

  • Redis for caching processed information
  • Redis as the Rails cache for views and partials
  • Redis for background workers

A few figures from our Redis server

  • 19 billion calls in the last 200 days
  • that's an average of 1070 calls per second
  • up to 4 GB in memory
  • 1.5 million keys currently used for availabilities

RTFM

The first round of timeout errors

The Redis doc

The use of Redis persistence with EC2 EBS volumes is discouraged since EBS performance is usually poor. Use ephemeral storage to persist and then move your persistence files to EBS when possible.

The Redis doc - 2

Even if you have persistence disabled, Redis will need to perform RDB saves if you use replication.

Removing Redis from EBS

Second round of timeout errors

Performance degradation

Splitting the Rails cache

The Rails cache now has its own server.

# cat config/initializers/redis.rb
$redis = Redis.new host: 'ec2-xxx...', port: 6379, driver: :hiredis
$redis_rails_cache = Redis.new host: 'ec2-yyy...', port: 6379, driver: :hiredis

App response time over 3 days.

Response time and throughput over 3 days.

CPU load over 3 days.

A few weeks later

the return of the Redis::TimeoutError

The Rails cache server's CPU is overloaded

The Redis MONITOR command shows very slow requests that expire keys using a wildcard pattern.
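Spotting that pattern by eye in a MONITOR stream is tedious; a small script can tally commands from a short capture instead. This is a minimal sketch, not the tooling used here: `tally_commands` and the sample lines are hypothetical, and it assumes the line format shown in the Redis MONITOR documentation (a timestamp, `[db client]`, then the quoted command and its arguments). MONITOR itself costs throughput, so captures are best kept short.

```ruby
# Hypothetical helper: count commands in captured MONITOR output.
# Assumes each line looks like:
#   1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
def tally_commands(lines)
  counts = Hash.new(0)
  lines.each do |line|
    # The command name is the first double-quoted token after the "]".
    match = line.match(/\]\s+"([^"]+)"/)
    counts[match[1].downcase] += 1 if match
  end
  counts
end

# Hypothetical sample lines, not real captured traffic.
sample = [
  '1339518083.107412 [0 127.0.0.1:60866] "get" "views/home"',
  '1339518083.107500 [0 127.0.0.1:60866] "keys" "views/availabilities*"',
  '1339518083.109200 [0 127.0.0.1:60867] "del" "views/availabilities/1"'
]

p tally_commands(sample)
```

A non-zero `keys` count in a capture taken under load is the signal to look for.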

The Redis doc - 3

KEYS (...) should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases.
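Why this bites: KEYS is documented as O(N) in the number of keys in the database, so a wildcard lookup walks the whole keyspace no matter how few keys match, and Redis executes commands on a single thread, so every other client waits meanwhile. The sketch below only illustrates that cost model: a Ruby Hash stands in for the keyspace, `File.fnmatch` approximates Redis glob patterns, and `keys_matching` and the key names are hypothetical.

```ruby
# Illustration only: a Hash plays the Redis keyspace and File.fnmatch
# approximates Redis's glob-style KEYS patterns.
def keys_matching(keyspace, pattern)
  inspected = 0
  matches = keyspace.each_key.select do |key|
    inspected += 1 # every key is examined, whether it matches or not
    File.fnmatch(pattern, key)
  end
  [matches, inspected]
end

keyspace = {}
# Hypothetical key names: many unrelated keys, only a few to expire.
10_000.times { |i| keyspace["availability:#{i}"] = i }
3.times { |i| keyspace["views/home/#{i}"] = "html" }

matches, inspected = keys_matching(keyspace, "views/home/*")
puts "#{matches.size} matches, #{inspected} keys inspected"
# → 3 matches, 10003 keys inspected
```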

The fix

Instead of expiring keys with a wildcard, we now increment the namespace of the cached values.

Data will expire 14 days after creation.
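The namespace trick can be sketched as follows. This is a minimal sketch, not the code behind these slides: it assumes a per-namespace version counter stored in the cache itself and keys laid out as `<name>:<version>:<key>`. `NamespacedCache`, `FakeStore`, `ManualClock` and that key layout are hypothetical; the in-memory store stands in for Redis, where the counter would be bumped with INCR and entries written with an expiry (for example SET with EX).

```ruby
# Controllable clock for the demo (Redis tracks time itself).
class ManualClock
  attr_accessor :now

  def initialize(now = 0)
    @now = now
  end

  def call
    @now
  end
end

# In-memory stand-in for Redis: get, set with a TTL in seconds, incr.
class FakeStore
  def initialize(clock)
    @clock = clock
    @data = {} # key => [value, expires_at or nil]
  end

  def get(key)
    value, expires_at = @data[key]
    return nil if expires_at && expires_at <= @clock.call
    value
  end

  def set(key, value, ex: nil)
    @data[key] = [value, ex && @clock.call + ex]
  end

  def incr(key)
    value = get(key).to_i + 1 # a missing key counts as 0, like INCR
    @data[key] = [value, nil] # the counter itself never expires
    value
  end
end

class NamespacedCache
  TTL = 14 * 24 * 60 * 60 # 14 days in seconds (1_209_600)

  def initialize(store, name)
    @store = store
    @name = name
  end

  # Current namespace version; 0 until the first invalidation.
  def version
    @store.get("#{@name}:version").to_i
  end

  def read(key)
    @store.get("#{@name}:#{version}:#{key}")
  end

  def write(key, value)
    @store.set("#{@name}:#{version}:#{key}", value, ex: TTL)
  end

  # Replaces the wildcard delete: old entries become unreachable
  # and are left to expire on their own once their TTL elapses.
  def invalidate!
    @store.incr("#{@name}:version")
  end
end

store = FakeStore.new(ManualClock.new)
cache = NamespacedCache.new(store, "views/availabilities")
cache.write("hotel-42", "<li>...</li>")
cache.invalidate!
p cache.read("hotel-42") # nil: reads now go to version 1
```

Invalidation becomes a single O(1) increment instead of a keyspace scan, at the price of stale entries occupying memory until their 14-day TTL runs out. The version counter itself must not expire: if it did, the version would fall back to 0 and old entries still within their TTL would become readable again.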

Error rate and CPU load around the fix.

The Rails cache server after the storm.

The Rails cache server after the storm.