Measure Anything, Measure Everything. – Background – Statsd




On Github jkhuraijam / application-monitoring

Measure Anything, Measure Everything.

using statsd + graphite.

Created by Erik Dasque & Johnson Khuraijam

Introduced to the concept at the Etsy presentation at Monktoberfest: http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

“If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it.”

Background

Measure at 3 levels:

  • Network:

    Servers in DC, nodes in cluster, firewall, etc.

  • System Services:

    DB2, Memcached, Cassandra, etc.

  • Applications:

    ui, contacts2, mylibraryNext, accounts, etc.

OPS normally takes care of monitoring:

  • Network.
  • System Services.
  • Applications: little or none.

Developers

  • should initiate,
  • own and
  • observe Application level monitoring.

But Why?

Developers

  • have the most knowledge about the application.
  • can prepare for failure.
  • have opportunities to take proactive action.

But Why?

Causality: the relationship between cause & effect

  • Issues are likely to show up in application-level monitoring before they escalate and compound other issues into an application-wide or site-wide problem.
  • Watching a timer creep up slowly over time lets a developer correct a problem that did not exist at launch and has not yet materialized at the Network/System/Machine level, but eventually will.
  • Noticing an anomaly in a counter, timer or gauge (throughput) often leads to fixing an issue that would otherwise only show up in the logs and stay under the radar while degrading the user or system experience and compounding other issues.

Statsd

  • statsd is a simple Node.js daemon that listens for messages on a UDP port.
  • It parses the messages, extracts the metrics data, aggregates it and periodically flushes the result to Graphite.
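
The messages statsd listens for are short plain-text lines of the form `<name>:<value>|<type>`, where the type is `c` for a counter, `ms` for a timer and `g` for a gauge. The following is a minimal sketch of building such lines and sending one as a UDP datagram. The class name `StatsdWire` and its methods are illustrative assumptions, not the API of any particular client library.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Sketch of the statsd plain-text wire format; all names here are illustrative. */
final class StatsdWire {
    private StatsdWire() {}

    /** Counter line: "<name>:<delta>|c". */
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    /** Timer line: "<name>:<millis>|ms". */
    static String timer(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    /** Gauge line: "<name>:<value>|g". */
    static String gauge(String name, long value) {
        return name + ":" + value + "|g";
    }

    /** Fire-and-forget send: one UDP datagram per metric line, no reply expected. */
    static void send(String host, int port, String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }
}
```

A call such as `StatsdWire.send("localhost", 8125, StatsdWire.counter("app.logins", 1))` would send one increment to a daemon on statsd's default port; the host and the metric name `app.logins` are placeholders. Because the transport is UDP, a lost datagram or an unreachable daemon does not block or fail the application.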

Graphite

  • Scalable real-time graphing tool, used extensively here by OPS and NOC for monitoring.
  • Composed of layers for network input, storage, aggregation, calculation, and real-time or historical graphing.

statsd in myLibraryNext.

Besides the business needs, why is myLibraryNext a good candidate?

  • One of the new apps in the CI/CD model, so adopting changes is easier.
  • LOCAL: ClamAV, anti-virus scanner & NAS (NetApp Filer).
  • INTERNAL: Shared services
    • System: RovingDB, Memcached & Camel.
    • Apps: Productsvc, ComplianceAPI.
  • EXTERNAL: Third party APIs
    • Facebook, Instagram, Aviary & Bigstock.
  • VISIBILITY: Used by 100% of our customers (Trialers & Paid).
  • FEATURE USAGE METRICS: New feature rollouts, e.g. Bigstock.

Java statsd client

  • JAVA: A common statsd client module jar was created under https://github.roving.com/ES/nodep.
  • This jar also contains the MBean for enabling and disabling statsd through the JMX interface.
  • Info on Ruby, Python, .NET and other clients is on the Etsy statsd website.
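
To illustrate the enable/disable idea, here is a minimal sketch of a JMX standard MBean that exposes an on/off flag. It is not the MBean shipped in the jar: the names `JmxToggleSketch`, `StatsdSwitch` and `StatsdSwitchMBean` are hypothetical, as is any object name passed to `register`. A metrics client would check `isEnabled()` before sending.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical JMX on/off switch for metrics; not the actual MBean from the jar. */
final class JmxToggleSketch {
    private JmxToggleSketch() {}

    /** Standard MBean contract: a public interface named <ImplClass>MBean. */
    public interface StatsdSwitchMBean {
        boolean isEnabled();
        void setEnabled(boolean enabled);
    }

    /** Holds the flag that a metrics client would consult before sending. */
    public static final class StatsdSwitch implements StatsdSwitchMBean {
        private final AtomicBoolean enabled = new AtomicBoolean(true);

        @Override
        public boolean isEnabled() {
            return enabled.get();
        }

        @Override
        public void setEnabled(boolean value) {
            enabled.set(value);
        }
    }

    /** Registers a new switch on the platform MBean server under the given name. */
    static StatsdSwitch register(String objectName) throws Exception {
        StatsdSwitch statsdSwitch = new StatsdSwitch();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(statsdSwitch, new ObjectName(objectName));
        return statsdSwitch;
    }
}
```

After something like `JmxToggleSketch.register("com.example.metrics:type=StatsdSwitch")`, the `Enabled` attribute can be flipped from a JMX console such as JConsole without redeploying the application.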

Pom.xml

Spring wiring

Counter Example
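
A minimal sketch of what a counter call site can look like, not the nodep client's actual API. The class `CounterExampleSketch`, the metric prefix and the `sink` (anything that accepts a statsd line, for example a UDP sender) are assumptions made for illustration.

```java
import java.util.function.Consumer;

/** Counts outcomes of an operation; class, sink and metric names are hypothetical. */
final class CounterExampleSketch {
    private final Consumer<String> sink; // receives one statsd line per event

    CounterExampleSketch(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Emits "<name>:1|c", the statsd line that increments a counter by one. */
    void increment(String name) {
        sink.accept(name + ":1|c");
    }

    /** Runs the task and counts it as either a success or a failure. */
    void countOutcome(String prefix, Runnable task) {
        try {
            task.run();
            increment(prefix + ".success");
        } catch (RuntimeException e) {
            increment(prefix + ".failure");
            throw e; // counting must not swallow the application's error
        }
    }
}
```

Wrapping a call as `counters.countOutcome("upload", () -> store(file))` would yield two counters, `upload.success` and `upload.failure`, whose ratio can be graphed; `store` and `file` are placeholders.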

Timer Example
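
A minimal sketch of timing a call and reporting the elapsed milliseconds as a statsd timer, not the nodep client's actual API. The class `TimerExampleSketch`, the `sink` and the metric name are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Times a call and reports it as a statsd timer; all names are hypothetical. */
final class TimerExampleSketch {
    private final Consumer<String> sink; // receives one statsd line per timed call

    TimerExampleSketch(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Runs the call and emits "<name>:<elapsedMillis>|ms", even if the call throws. */
    <T> T time(String name, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            sink.accept(name + ":" + millis + "|ms");
        }
    }
}
```

Reporting from the `finally` block means slow failures are timed as well as slow successes, which is what makes a gradual creep in a timer visible on a graph.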

Counter In graphite

Timer In graphite

DEMO

Big thanks to

  • LEN SMITH, for quickly getting the statsd servers ready in all environments.
  • SUSHMA YADUPATHI, for testing the enhancements in MyLibraryNext.

 

 
