graphite-vs-elasticsearch




Slides from a presentation comparing two self-hosted metrics systems: Graphite and Elasticsearch / Logstash / Kibana (ELK)

On Github gord1anknot / graphite-vs-elasticsearch

Graphite Vs Elasticsearch

Brad Rowe @gord1anknot

Graphite and Elasticsearch

  • Open Source
  • Come with tooling for monitoring server operations (batteries included)

Why use both?

The way they model data is extremely different

- Both tools have a query facility that can be optimized for time series data
- No offense, but SQL 2011 is waaay too little, waaay too late
- When troubleshooting a production issue at 2:00 AM, don't make me think about how to write a query. Using both keeps easy things easy.

Graphite

The Three Parts

carbon : a server listening for incoming stats on TCP port 2003 and responding to queries on TCP port 7002

whisper : a time series database management system

graphite-web : a django web app that renders graphs using cairo

Anatomy of a Stat

Stats are represented in three parts:

name value timestamp
name value timestamp
name value timestamp
name value timestamp
...
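The three-part lines above are carbon's plaintext protocol: one stat per line, newline-terminated. A minimal sketch of a sender in Python, assuming carbon is listening on the port mentioned earlier (the hostname and metric name are placeholders):

```python
import socket
import time

def format_stat(name, value, timestamp=None):
    """Render one stat in carbon's plaintext protocol: 'name value timestamp\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (name, value, timestamp)

def send_stat(name, value, host="localhost", port=2003):
    """Open a TCP connection to carbon and write a single stat line."""
    line = format_stat(name, value)
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()
```

In practice a client like statsd batches many such lines per connection; this sketch sends one at a time for clarity.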

Stats - Names

A name is a string with some periods in it, like:

some.text.with.periods.in.it

Each period separates a path component on the filesystem: carbon creates a directory for each component, nested in order, and the final component becomes the whisper database archive for the stat. Stated differently, carbon builds a tree of directories, one level per period, until the stat name runs out of periods.
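The name-to-path translation can be sketched in a couple of lines. The default storage directory below is Graphite's conventional install location, an assumption, not something this deck specifies:

```python
def whisper_path(stat_name, storage_dir="/opt/graphite/storage/whisper"):
    """Translate a dotted stat name into the .wsp file path carbon would create.
    storage_dir defaults to graphite's conventional location (an assumption)."""
    return "/".join([storage_dir] + stat_name.split(".")) + ".wsp"
```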

Stats - What's in a Name?

Many query functions allow wildcards like this:

averageSeries(my.company.server.*.requests.500)

With the wildcard syntax, graphite will look in all subdirectories of my/company/server for any stat in a subdirectory at requests/500. Graphite will combine all the values and, for this function, anyway, average them.
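The combining step can be sketched without any of graphite-web's internals: a pointwise average over several series of equal length, skipping missing (None) points the way graphite treats absent data. This is an illustrative re-implementation, not graphite's actual code:

```python
def average_series(*series):
    """Pointwise average of several equal-length series.
    None marks a missing point and is excluded from that bucket's average."""
    result = []
    for points in zip(*series):
        present = [p for p in points if p is not None]
        result.append(sum(present) / len(present) if present else None)
    return result
```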

Stats - Values

    float('nan')
    float('-inf')
    float('+inf')

These Python values represent floating-point numbers, ultimately platform-specific implementations of IEEE 754. Although they can be stored in whisper archives, tools (graphite-web in particular) will coerce them to None.

    None

None, the python null, represents the absence of a measurement at a given timestamp.
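The coercion described above can be sketched as a small helper; this mimics the behavior, it is not graphite-web's actual function:

```python
import math

def coerce_value(v):
    """Mimic graphite-web's handling: NaN and the infinities become None,
    i.e. 'no measurement at this timestamp'. Finite floats pass through."""
    if v is None or math.isnan(v) or math.isinf(v):
        return None
    return v
```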

Stats - Timestamps

Graphite timestamps are UNIX timestamps: seconds since the epoch

- Timestamps are always aggregated and bucketed before storage
- Which brings us to the whisper backend...
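Bucketing means snapping each timestamp down to the start of its storage interval, so every point in a 60-second window shares one slot. A one-line sketch:

```python
def bucket(timestamp, resolution=60):
    """Snap a UNIX timestamp down to the start of its storage interval.
    All points arriving within the same interval share this bucket."""
    ts = int(timestamp)
    return ts - (ts % resolution)
```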

Whisper

  • Whisper is essentially directories and files on a disk and some python scripts
  • Whisper requires size and resolution of database files to be specified in advance
  • Schemas are immutable, but Whisper can reindex from one archive into a new schema
  • Reindexing or even just renaming all the time is not at all practical (source: my personal experience)
> Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.

> ... an archive with 1 data point every 60 seconds can have a lower-resolution archive following it with a resolution of 1 data point every 300 seconds because 60 cleanly divides 300. In contrast, a 180 second precision (3 minutes) could not be followed by a 600 second precision (10 minutes) because the ratio of points to be propagated from the first archive to the next would be 3 1/3 and Whisper will not do partial point interpolation.

> Whisper databases contain one or more archives, each with a specific data resolution and retention (defined in number of points or max timestamp age). Archives are ordered from the highest-resolution and shortest retention archive to the lowest-resolution and longest retention period archive.

Semantics of Null:

- Each data point is stored with its timestamp
- The timestamps are used during data retrieval to check the validity of the data point
- All time-slots within an archive take up space whether or not a value is stored
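The "cleanly divides" rule from the quote (60s → 300s is fine; 180s → 600s is not, because 600/180 = 3 1/3) can be checked mechanically. A sketch of such a validator, not whisper's own code:

```python
def validate_retentions(resolutions):
    """Check whisper's rule that each coarser archive's resolution must be
    an even multiple of the finer archive before it (no partial-point
    interpolation). resolutions is a list of seconds-per-point, ordered
    from finest to coarsest."""
    for finer, coarser in zip(resolutions, resolutions[1:]):
        if coarser % finer != 0:
            raise ValueError("%ds does not evenly divide %ds" % (finer, coarser))
    return True
```

For example, `[60, 300]` passes (60 divides 300) while `[180, 600]` raises, matching the quoted explanation.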

Whisper, pt. 2

  • All time-slots within an archive take up space whether or not a value is stored
  • Each whisper archive has a header section containing schema that describes how downsampling will occur between time resolutions
  • Multiple archives in a schema can't be of arbitrary resolution - they must be even multiples
  • In a whisper archive, the absence of a data point has semantic meaning

Graphite's Model Works If:

  • Numbers model your domain well
  • You require a large number of unique, independently distributed stats at discrete, bucketed time intervals
  • The stats are conceptually hierarchical: they have either parent-child or sibling relationships with each other

Graphite Could be Problematic If:

  • Your domain is best modeled categorically, not numerically
  • The stats have many-to-many relationships to each other
  • The number of categories, and relationships among categories, can't be determined in advance
- SubPub principal IDs and message types are like this
- Each subscription is owned by one principal, subscribing to one message type
- A Message Type may or may not be published, and if it is, it may be published by any principal
- Some principals are related to each other, and others are not
- If I reported all stats to graphite, how could I query for delivery failures for all GRID message types?

Elasticsearch

Elasticsearch's Data Model

java.lang.String document;

That's it. No, really.

Other features around inserting documents exist only to organize those documents, or make queries easier.

You've Already Used Elasticsearch

GitHub's free-form search is powered by ES

Organizing ES Data

ES entity    what info in advance?
indexes      # of shards (default is configurable)
types        mapping (optional)
documents    doc id (optional), id of parent doc (optional), ttl (optional)
fields       mappings define these
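The table reads top-down as "what you must decide when": shard count is fixed when an index is created, while mappings are optional and shape the fields underneath. A sketch of an index-creation request body as a Python dict; the index layout, the `logs` type name, and its fields are illustrative assumptions, not anything this deck prescribes:

```python
import json

# Sketch of a create-index body: number_of_shards must be chosen up front,
# while the type mapping ("logs" here is a made-up example) is optional.
index_body = {
    "settings": {"number_of_shards": 5},   # the configurable default
    "mappings": {
        "logs": {                          # hypothetical type name
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "string"},
            }
        }
    },
}

payload = json.dumps(index_body)  # what you'd PUT to the index URL
```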

ES Indexes

Top level container, indexes are not related to each other in any particular way unless a query specifically indicates otherwise

When using logstash, an index represents a period of time (an hour or a day); however, nothing enforces this other than the name of the index
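Since only the name carries the time range, generating it is trivial. A sketch of the daily naming convention (the `logstash-` prefix is the conventional default):

```python
import datetime

def logstash_index_name(when, prefix="logstash"):
    """Daily index name in the logstash convention, e.g. 'logstash-2015.03.14'.
    Nothing in ES ties the index to that day; it's purely a naming convention."""
    return "%s-%s" % (prefix, when.strftime("%Y.%m.%d"))
```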

ES Mappings and Types

Mappings define the transformation from _source, the raw document, into an indexed version with metadata. The indexed version has fields, which can carry more specific types than String, or relationships to each other that are resolved at query time

ES Documents

Anything goes, but JSON is easiest to manage. Logstash documents look like this:

    {
        "@timestamp": "2015-03-14T09:26:53.589Z",
        "@version": "1",
        "message": "this text was input to logstash, and logstash wasn't configured to do any processing on it. Logstash will simply pass it through to elasticsearch in a JSON property called message."
    }
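Building such a document yourself takes only a few lines. A sketch that mimics logstash's pass-through shape (the timestamp format matches logstash's ISO 8601 output; `make_event` is a made-up helper name):

```python
import datetime

def make_event(message):
    """Wrap raw input roughly the way logstash does with no filters configured:
    an ISO 8601 timestamp, a schema version, and the untouched input text."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "@timestamp": now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "@version": "1",
        "message": message,
    }
```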

Elasticsearch's Model Works If:

  • You can't know document syntax and semantics in advance
  • You're comfortable with spending time figuring out how to query it later on
  • Your primary interest during queries is the contents of documents, as opposed to their relationships with each other
    • Support for the aggregations API, new in Kibana 4.0, makes that last point less of an issue

Elasticsearch Could be Problematic If:

  • You find yourself doing a lot of unrelated, and fairly unique, ad-hoc queries
  • Higher-order relationships among documents are more important than the documents themselves