On GitHub: perryh/disqus-tech-talk
Perry Huang
@perry_huang
Summer 2013
Week 1: Got onboarded
Weeks 2-3: Project strace-explain
Week 4: Learned about the internals of Disqus, Django, Memcached, Cassandra, Redis, and more
Weeks 5-6: Analyzed Memcached usage at Disqus
Weeks 7-12: Worked on Disqus and other things
Onboarding was difficult, but I survived. As the only intern, I knew I had huge responsibilities.
I hardly knew anything about Disqus, but got a lot of help from the team.
"What's Promoted Discovery, Storm, Sentry, Jones, Gargoyle, and etc.?"
I wanted to fix Disqus. I was learning throughout the summer and made more of an impact each week.
Worked on strace-explain, a Ruby gem that traces and analyzes system calls for any user-level process. It reveals how much time a process spends waiting on network I/O from different resources, which makes it useful for debugging the Django dev server.
Source:
http://github.com/disqus/strace-explain/
Install:
$ gem install strace-explain
Disquss-MacBook-Pro:~ $ strace-explain -h
Usage: strace-explain -p [PID]
   or: strace-explain [command]
Options:
    -p, --pid N      Attach to process PID N.
    -t, --time N     Run analysis for N seconds.
    -h, --help       Show this message
    -v, --version    Show version
http://strace-explain.herokuapp.com/analysis/53558744580160
Learned about the internals of various tools we use at Disqus (Django, Memcached, Cassandra, Redis), as well as Disqus itself.
Created a tool to analyze Memcached traffic from real-time or previously recorded TCP/UDP dumps.
Source:
github link to memcached analysis tools
Results:
https://gist.github.com/perryh/07ef6828a9351f2604eb
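For context, a dump suitable for this kind of analysis can be captured with tcpdump. The command below is illustrative rather than part of the tool: 11211 is Memcached's default port, and the interface and filename are placeholders.

$ sudo tcpdump -i any -w memcached.pcap 'port 11211'

The tool then reads a recorded dump like this one, or watches traffic in real time.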
At Disqus, most of our cache keys start with a group name, so we can adjust our code to optimize cache usage. Here are results from one group:
Type: :8:NewPostTreePaginator
145810 gets = 9767 hits + 136043 misses
153451 sets
(1.2299291936654724% of total gets)
(93.30155682051986% cache miss rate)
(5.8092943561057195% of total sets)
Min. payload length: 29
25th percentile: 29
50th percentile: 2095
75th percentile: 8518
Max. payload length: 273910
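For reference, the miss rate above falls straight out of the counts: 136043 misses / 145810 gets ≈ 93.3%.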
"Don't be retarded." ~ Ben
Use the cache properly. Typically, we check the cache for our values first, before querying the database. If we do end up querying the database, we need to set the resulting values in the cache so they can be retrieved later. I found a few uses of the cache this summer where we never actually set the keys we were querying.
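A minimal sketch of the intended read-through pattern, assuming Django's cache API; make_key comes from the examples below, and fetch_from_db is a hypothetical stand-in for the real database query:

from django.core.cache import cache

def get_value(pk):
    key = make_key(pk)             # key-building helper, as in the examples below
    value = cache.get(key)
    if value is None:              # cache miss: fall back to the database
        value = fetch_from_db(pk)  # hypothetical stand-in for the real query
        cache.set(key, value)      # the step that was missing in the bugs I found;
                                   # without it, every subsequent get is a miss
    return value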
Avoid caching keys that have high cache miss rates or are never read again within their lifetime. Large keys with this property are especially bad.
Case study from our Paginator: the NewPostTreePaginator group shown above.
These keys can cause Memcached to evict other keys that are read far more often within their lifetime. They are a waste of memory.
Use cache.get_many and cache.set_many instead of looping over cache.get and cache.set. Batching N keys into a single round trip dramatically decreases cache retrieval and set time, from anywhere between 200 and 600 ms down to a mere 60 ms on average.
Avoid this:
for o in objects:
    result[o.pk] = cache.get(make_key(o.pk))  # one round trip per key
# Build list of cache miss keys, query them from database
...
for missing in missing_objects:
    cache.set(make_key(missing.pk), missing)  # another round trip per key
Do this:
cache_keys = [make_key(o.pk) for o in objects]
results = cache.get_many(cache_keys)
# Build list of cache miss keys, query them from database,
# and build a dict of cache miss keys and values
...
cache.set_many(missed_key_values)
Using Varnish and the HTTP Cache-Control header helped decrease response times. While browsing New Relic, I noticed that some of our API endpoints were responding very slowly: New Relic reported that our 'Community' tab on large sites, such as CNN, would take upwards of 9 seconds to respond. I confirmed this myself by testing CNN, IGN, and The Atlantic.
With help from Matt, I got these API endpoints to respond in ~100 ms through the use of Varnish. This led to a great improvement in user experience.
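A minimal sketch of the Django side, assuming a view-level header; the endpoint name and the 60-second max_age are illustrative, not the actual Disqus values. Varnish honors the Cache-Control header and serves repeat requests itself instead of forwarding them to the app:

from django.views.decorators.cache import cache_control

# Hypothetical endpoint; max_age below is illustrative.
@cache_control(public=True, max_age=60)
def community_tab(request):
    # Build the expensive response once; Varnish then serves
    # cached copies of it for the next 60 seconds.
    ...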
We used to have a key group cached in our main cluster of Memcached servers that proved to be a really bad fit:
Type: :8:html
1673394 gets = 624777 hits + 1048617 misses
1292953 sets
(14.11532907965599% of total gets)
(62.66408269660343% cache miss rate)
(48.94816303321555% of total sets)
Min. payload length: 32
25th percentile: 70
50th percentile: 122
75th percentile: 255
Max. payload length: 29261
While a 62% miss rate is not the worst in the world, this key group accounted for 48% of our cache sets. Removing it from our main Memcached cluster, along with some of my other fixes, allowed us to increase our cache hit ratio from 80% to over 90%. We now have this key group on its own cache server, handled by the LRU eviction policy, where it cannot affect other key groups.
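A sketch of what that isolation can look like in Django settings; the hostnames are made up, and this is not our exact configuration:

# settings.py: give the html key group its own Memcached instance,
# isolated from the default cluster (hostnames are illustrative)
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': ['cache1.internal:11211', 'cache2.internal:11211'],
    },
    'html': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': 'html-cache.internal:11211',
    },
}

Code that touches the html group then asks for that cache explicitly (get_cache('html') in the Django versions of that era), so its churn stays confined to its own server.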
I spent a lot of time learning to use SciPy Weave and Boost.Python, but found the idea of rewriting the Paginator in C++ impractical. We have other plans in the pipeline to improve the Paginator.