MonitoringWithRiemann



MonitoringWithRiemann

0 0


MonitoringWithRiemann


On Github bear / MonitoringWithRiemann

Monitoring (security)

with Riemann

Bear

Operations at &yet

@codebear

What is Riemann and

why would I ever think

of using it for monitoring?

Buzzword Bingo?

Riemann is an event stream processor

Riemann is

  • a very low-latency
  • event aggregation tool
  • coupled with a powerful processing language

A Riemann Stream

Streams are functions

that act on events

is it ready to use?

Ive heard quite a lot about it last year so I wanted to see if it could be a practical part of our toolset

Yes

What can it do?

  • Monitor change to the rate of an event
  • Spot peaks
  • Spot missing data
  • Identify when services are overloaded

Monitor change

For every server you monitor

you can find those that are

receiving more than the usual attention

Spot peaks

You can trigger an alert

when a threshold has been reached

Spot missing data

Often the precursor to an outage

is marked by a service hanging

Identify overloaded services

Knowing when the total load

of a service has exceeded

a threshold

across all hosts

Okay, let's see some specifics

Riemann's Config is Clojure

; -*- mode: clojure; -*-
; vim: filetype=clojure

(logging/init {:file "riemann.log"})

; Listen over TCP (5555), 
; UDP (5555), and websockets (5556)
(let [host "192.168.3.27"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))
start of the config file
(periodically-expire 5 
   {:keep-keys [:host :service :tags]})

(defn log-info [event]
  (info event))
(def email
  (mailer {:from "opsbot@example.com"}))
defining helper routines
(def alert-ops
   (email "opsalert@example.com")
   (fn [event] (info "alert ops" event))
)
(def slack-credentials {
  :account "", :token ""})
(def chat-ops
  (slack slack-credentials {
    :username "opsbot"
    :channel "#chatops"
    :icon ":smile:"}
))
(def tell-ops (rollup 5 3600 chat-ops))
(streams
  (where (state "error") tell-ops)
  (tagged "exception" tell-ops))
(let [index (index)]
  (streams
    (default :ttl 60
      ; index immediately
      index
      (where (not (expired? event))
        (changed-state {:init "ok"}
          (stable 60 :state
            alert-ops
          )))
      (expired alert-ops)
    )))
alert when a service stops talk about changed-state and stable

Monitor Nginx Status

Using riemann-nginx-status

from riemann-tools, you can track

metrics exposed by /nginx_status

{ :host web.example.com,
  :service nginx active,
  :state ok,
  :description nil,
  :metric 3,
  :tags nil,
  :time 1421514112,
  :ttl 10.0
}
Event sent from riemann-nginx-status
(where service "nginx"
      (fixed-time-window 60
        (smap (fn [events] (let [fraction 
            (/ (count (filter #(= "exception" 
                                  (:state %))
                                   events))
               (count events))]
            {:service "auth failures"
             :metric  fraction
             :state   (condp < fraction 
                       0.7 "exception"
                       0.3 "warning" "ok")}))
          (changed-state alert-ops))))
depends on the logstash filtering web events and adding tags for 4XX errors exception == 403 warning == password fails

Combining it all

Useful links

https://github.com/bear/MonitoringWithRiemann

https://riemann.io/quickstart.html

https://rrees.me/2013/04/20/getting-started-with-riemann-stream-processing/

https://github.com/jamtur01/riemann.config

Riemann illustrations are from riemann.io
---