On GitHub gord1anknot / graphite-vs-elasticsearch
Brad Rowe @gord1anknot
The way they model data is extremely different
- Both tools have a query facility that can be optimized for time series data
- No offense, but SQL 2011 is waaay too little, waaay too late
- When troubleshooting a production issue at 2:00 AM, don't make me think about how to do a query. Using both keeps easy things easy.
carbon : a server listening for incoming stats on TCP port 2003 and responding to queries on TCP port 7002
whisper : a time series database management system
graphite-web : a django web app that renders graphs using cairo
Stats are represented in three parts:
name value timestamp
name value timestamp
name value timestamp
name value timestamp
...
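As a rough illustration of the three-part format, here is a minimal Python sketch that renders one stat as a newline-terminated line and sends a batch to carbon on TCP port 2003. The function names and the `localhost` default are assumptions made for this example; they are not part of graphite.

```python
import socket
import time


def format_stat(name, value, timestamp=None):
    """Render one stat as a 'name value timestamp' line, newline-terminated."""
    if timestamp is None:
        timestamp = time.time()  # default to "now"
    return f"{name} {value} {int(timestamp)}\n"


def send_stats(lines, host="localhost", port=2003):
    """Send pre-formatted lines to carbon over TCP.

    The host default is an assumption; port 2003 is the port named above.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

Calling `format_stat("my.company.server.web01.requests.500", 3, 1400000000)` produces a single line that could be passed to `send_stats` inside a list.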
A name is a string with some periods in it, like:
some.text.with.periods.in.it
Each period marks a directory boundary on the filesystem: every segment before the last becomes a directory, and the last segment names the whisper database archive for the stat. Stated differently, carbon will create a tree of directories, containing subdirectories, until a stat name has no more periods.
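The mapping from stat name to file location can be sketched in a few lines. This is only an illustration of the idea, not carbon's actual code; the `whisper_path` helper is hypothetical, and the default root directory is an assumption that depends on how graphite was installed.

```python
import posixpath


def whisper_path(stat_name, root="/opt/graphite/storage/whisper"):
    """Map a dotted stat name to the path of its whisper file.

    Each period becomes a directory separator; the last segment
    becomes a file with the .wsp extension.
    """
    parts = stat_name.split(".")
    return posixpath.join(root, *parts) + ".wsp"
```

For example, `whisper_path("some.text.with.periods.in.it", root="/data")` yields `/data/some/text/with/periods/in/it.wsp`.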
Many query functions allow wildcards like this:
averageSeries(my.company.server.*.requests.500)
With the wildcard syntax, graphite will look in all subdirectories of my/company/server for any stat in a subdirectory at requests/500. Graphite will combine all the values and, for this function, anyway, average them.
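To make the wildcard behaviour concrete, here is a small in-memory sketch. It assumes each series is a plain list of values aligned on the same timestamps, matches the pattern one dotted segment at a time, and skips missing (None) points when averaging. The helper names are hypothetical and this is not graphite-web's implementation.

```python
from fnmatch import fnmatchcase


def matches(name, pattern):
    """Match a dotted stat name against a pattern, segment by segment."""
    name_parts = name.split(".")
    pattern_parts = pattern.split(".")
    if len(name_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(n, p) for n, p in zip(name_parts, pattern_parts))


def average_series(series_by_name, pattern):
    """Average, point by point, every series whose name matches the pattern."""
    matched = [v for k, v in series_by_name.items() if matches(k, pattern)]
    averaged = []
    for column in zip(*matched):
        present = [x for x in column if x is not None]
        averaged.append(sum(present) / len(present) if present else None)
    return averaged


series = {
    "my.company.server.web01.requests.500": [1.0, 3.0, None],
    "my.company.server.web02.requests.500": [3.0, None, None],
    "my.company.server.web01.requests.200": [100.0, 100.0, 100.0],
}
result = average_series(series, "my.company.server.*.requests.500")
```

Only the two `requests.500` series match, so the `requests.200` values never enter the average.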
float('nan') float('-inf') float('+inf')
These Python values are floating point numbers, ultimately platform-specific implementations of IEEE 754 numbers. Although they can be stored in whisper archives, tools (graphite-web in particular) will coerce them to None.
None
None, Python's null value, represents the absence of a measurement at a given timestamp.
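The coercion described above can be sketched as a small helper. The function name is hypothetical and this is not graphite-web's own code; it only shows the idea that NaN and the infinities end up treated like a missing measurement.

```python
import math


def to_graphable(value):
    """Return None for missing, NaN, or infinite values; pass others through."""
    if value is None:
        return None
    if math.isnan(value) or math.isinf(value):
        return None
    return value


points = [1.5, float("nan"), float("-inf"), float("+inf"), None, 0.0]
cleaned = [to_graphable(p) for p in points]
```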
Graphite timestamps are numbers that represent seconds (not milliseconds) since the UNIX epoch
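Whisper keeps one point per fixed interval, so a timestamp in epoch seconds is aligned down to the start of its interval before it is written. A minimal sketch of that arithmetic, assuming a hypothetical 60-second interval and a hypothetical helper name:

```python
def bucket_start(timestamp, seconds_per_point=60):
    """Align an epoch-seconds timestamp down to the start of its interval."""
    ts = int(timestamp)  # drop any fractional seconds
    return ts - (ts % seconds_per_point)
```

With a 60-second interval, every timestamp from 1399999980 through 1400000039 lands in the same bucket, 1399999980.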
- Timestamps are always aggregated and bucketed before storage, which brings us to the whisper backend...
java.lang.String document;
That's it. No, really.
Other features around inserting documents exist only to organize those documents, or make queries easier.
GitHub's free-form search is powered by ES
An index is the top-level container. Indexes are not related to each other in any particular way unless a query specifically indicates otherwise
When using logstash, an index represents a period of time (an hour of a day, or a day); however, nothing enforces this other than the name of the index
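Since the time period lives only in the index name, picking the index for an event is just string formatting. The sketch below assumes the common daily naming convention `logstash-YYYY.MM.DD` computed in UTC; the helper name and the `prefix` parameter are hypothetical.

```python
from datetime import datetime, timezone


def daily_index_name(moment, prefix="logstash"):
    """Build a per-day index name for a timezone-aware datetime."""
    utc_moment = moment.astimezone(timezone.utc)
    return prefix + "-" + utc_moment.strftime("%Y.%m.%d")


example = daily_index_name(datetime(2014, 5, 13, 23, 59, tzinfo=timezone.utc))
```

Nothing stops a document dated on a different day from being written into that index; the name is a convention, not a constraint.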
Mappings define the transformation from _source, the raw document, to an indexed version with metadata. The indexed version has fields, which can have more specific types than String, or relationships to each other that are resolved during a query
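As a rough picture of what a mapping looks like, the sketch below builds one as a Python dict and serializes it to JSON. The exact field type names are an assumption: they differ between Elasticsearch versions (older releases used "string" where newer ones use "text" or "keyword"), so treat the names here as placeholders and check the documentation for the version in use.

```python
import json

# Hypothetical mapping for a logstash-style document.
# Type names are assumptions and vary by Elasticsearch version.
mapping = {
    "properties": {
        "@timestamp": {"type": "date"},  # parsed as a date, not a plain string
        "message": {"type": "text"},     # analyzed for full-text search
    }
}

mapping_json = json.dumps(mapping, indent=2)
```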
Anything goes, but JSON is easiest to manage. Logstash documents look like this:
{ '@timestamp': new Date(), '@version': 1, 'message': "this text was input to logstash, and logstash wasn't configured to do any processing on it. Logstash will simply pass it through to elasticsearch in a JSON property called message." }
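The same shape of document can be produced from Python as a plain JSON string, which is all a document ultimately is. This is a minimal sketch with a hypothetical helper name; the ISO 8601 timestamp format is an assumption about what the receiving side expects.

```python
import json
from datetime import datetime, timezone


def logstash_document(message, moment=None):
    """Serialize a logstash-style event to a JSON string."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    return json.dumps({
        "@timestamp": moment.isoformat(),
        "@version": 1,
        "message": message,
    })


doc = logstash_document("hello", datetime(2014, 5, 13, tzinfo=timezone.utc))
```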