Test Your Upgrade – with Terraform and Ansible



On GitHub: sworisbreathing/techtalk-splunk-test-aws

Test Your Upgrade

with Terraform and Ansible

Slides by Steven Swor

The Challenge

We want to upgrade Splunk

The Problem

We don't have a test environment

The Solution

Let's Build a Test Environment!

... in AWS

... with Terraform

... and Ansible

But First...

... A Brief Introduction

$whoami

Steven Swor

sworisbreathing (GitHub, StackExchange)

steven_swor (Splunk Answers)

(TEL-struh)

http://telstra.com

Telstra's Purpose

To create a brilliant connected future for everyone

What Telstra Does

Telecommunications

Media

Lots of Other Stuff

What I Do

Software Engineering

Functional and Performance Testing

Application Performance Management

Write Jittery Presentations

What Don't I Do?

Manage Infrastructure

(...except when I have to)

The App

(a.k.a. "what are we upgrading?")

(spluhnk)

http://splunk.com

What Splunk Does

Big Data

Monitoring/Alerting

Analytics

How Splunk Works

Simplified Version

Forwarders

Collect data (logs) from remote systems

Send collected data to indexers

Indexers

Store collected data in indexes

Fetch data from indexes (e.g. run search jobs)

Search Heads

User Interface (Web)

Create/Dispatch Search Jobs

Splunk Apps and Add-ons

Really just configuration bundles

Can also include UI components (dashboards)

Deployment Server

Deploys apps/add-ons to Splunk instances

(similar to Puppet/Chef, but Splunk-specific)

Upgrade Paths

Two Major Changes

Upgrade Splunk across all environments

Improve Splunk app deployment process

Why Upgrade Splunk?

Latest release: 6.2.3

Currently deployed: 5.0.2

5.x release is more than 24 months old

Will reach End-Of-Life when 7.x is released

Creating A Test Environment

AWS (Simplified Version)

VMs provisioned "in the cloud"

Pay only for what you use

Well-documented APIs for automation

Terraform

Automates the nuts and bolts

Create VMs

Manage DNS and Networking

Terraform Config Example

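# One EC2 virtual machine
resource "aws_instance" "my_server_name" {
  instance_type = "t1.micro"
  # Pick the AMI for the configured region from a region-to-AMI map variable
  ami = "${lookup(var.aws_amis, var.aws_region)}"
  key_name = "${var.TF_VAR_key_name}"

  # SSH settings Terraform uses when it connects to the instance
  connection {
    user = "ubuntu"
    key_file = "${var.TF_VAR_key_path}"
  }

  # Security group that allows inbound SSH
  security_groups = [ "SSH" ]

  tags = {
    Name = "my_server_name"
  }
}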

Demo: Terraform Apply

Ansible

Automates the OS bits

Install software

Start/stop services

Orchestrate across multiple hosts

Ansible Example

---
- hosts: splunk_forwarders
  roles:
    - splunk_forwarder

- hosts: splunk_servers
  roles:
    - splunk_server
    - splunk_deployment_server

Ansible Inventory

Tells Ansible what hosts have what role

Static => Flat File, read when Ansible runs

Dynamic => Script, executed when Ansible runs
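To make the "dynamic" case concrete, here is a minimal sketch of an inventory script. It is not taken from the talk's repository: the hard-coded HOSTS table is a stand-in for a real lookup, and the function names are hypothetical. It relies only on the documented contract that Ansible runs the script with `--list` and reads a JSON description of groups from standard output, with per-host variables supplied under `_meta`.

```python
#!/usr/bin/env python3
"""Dynamic inventory sketch (illustrative only, not from the talk's repo)."""
import json
import sys

# Assumption: a hard-coded table stands in for a real lookup (e.g. a cloud API).
HOSTS = {
    "splunk_forwarders": {"tldhybqat01vth": "54.253.22.104"},
    "splunk_servers": {"tlpinfmgt03vth": "54.206.204.196"},
}


def build_inventory(hosts):
    """Return the group structure Ansible reads from `--list` output."""
    inventory = {"_meta": {"hostvars": {}}}
    for group, members in hosts.items():
        inventory[group] = {"hosts": sorted(members)}
        for name, address in members.items():
            inventory["_meta"]["hostvars"][name] = {
                "ansible_ssh_host": address,
                "ansible_ssh_user": "ubuntu",
            }
    return inventory


def main(args):
    if args[:1] == ["--host"]:
        # Host variables are already provided under _meta, so nothing to add.
        print(json.dumps({}))
    else:
        # Ansible calls the script with --list; treat that as the default.
        print(json.dumps(build_inventory(HOSTS), indent=2))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as an executable file, a script like this would be passed to Ansible with `-i` in place of a flat inventory file.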

Static Inventory Example

[splunk_forwarders]
tldhybqat01vth ansible_ssh_user=ubuntu ansible_ssh_host=54.253.22.104
...

[splunk_forwarders:vars]
splunk_forwarder_deployment_server_host=10.248.16.108
splunk_forwarder_indexer_host=10.248.16.108

[splunk_servers]
tlpinfmgt03vth ansible_ssh_user=ubuntu ansible_ssh_host=54.206.204.196

Terraform + Ansible

Terraform

Creates environment, gets IP addresses

Generates "static" Ansible inventory

(using a Terraform template)

Destroys environment when we are done
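The talk generates the inventory with a Terraform template. The sketch below shows the same idea in Python instead, as a stand-in rather than the author's method: given the addresses Terraform reports, fill in an inventory with the group and variable names used in the static example. The function name and its parameters are hypothetical.

```python
def render_inventory(forwarders, servers, server_private_ip, ssh_user="ubuntu"):
    """Render an INI-style static Ansible inventory as a string.

    forwarders / servers: dicts mapping host name -> public IP address.
    server_private_ip: address the forwarders use to reach the Splunk server.
    """
    def host_lines(hosts):
        return [
            f"{name} ansible_ssh_user={ssh_user} ansible_ssh_host={ip}"
            for name, ip in sorted(hosts.items())
        ]

    lines = ["[splunk_forwarders]"]
    lines += host_lines(forwarders)
    lines += [
        "",
        "[splunk_forwarders:vars]",
        f"splunk_forwarder_deployment_server_host={server_private_ip}",
        f"splunk_forwarder_indexer_host={server_private_ip}",
        "",
        "[splunk_servers]",
    ]
    lines += host_lines(servers)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values copied from the static inventory example in these slides.
    print(render_inventory(
        forwarders={"tldhybqat01vth": "54.253.22.104"},
        servers={"tlpinfmgt03vth": "54.206.204.196"},
        server_private_ip="10.248.16.108",
    ), end="")
```

Because the file is regenerated every time the environment is created, it is "static" only from Ansible's point of view.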

Demo: Terraform State -> Ansible Inventory

Ansible

Reads "static" inventory

Creates local service accounts

Installs/updates software

Ensures daemon services are running

Demo: Initial State

Upgrade Testing

Everything in version control (git)

Feature branches for upgrade paths

To test an upgrade:

$ git checkout <branch_name>
$ ansible-playbook ...
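With roughly 100 environments in two weeks, the two commands above are worth wrapping. A minimal sketch, assuming the caller supplies the branch name and whatever arguments `ansible-playbook` needs; the function names, the branch name and the playbook arguments shown are hypothetical and do not come from the talk.

```python
import shlex
import subprocess


def upgrade_test_commands(branch, playbook_args):
    """Return the two commands for testing one upgrade path, as argument lists."""
    if not branch or branch.startswith("-"):
        # Refuse values that git would interpret as an option.
        raise ValueError(f"suspicious branch name: {branch!r}")
    return [
        ["git", "checkout", branch],
        ["ansible-playbook", *playbook_args],
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical branch and playbook arguments, printed without executing.
    run(upgrade_test_commands("upgrade-forwarders", ["-i", "inventory", "site.yml"]))
```

The dry-run default only prints the commands, which makes it easy to check what would run against a freshly created environment.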

Demo: Upgrade Splunk Forwarders

Demo: New Deployment Strategy

Demo: Infrastructure Destroyed

So... How Did It Go?

The Good

Quick turnaround for testing of changes

Time to create a new test environment from scratch: Less than 5 minutes

Environments created and destroyed in a two-week period: approx. 100

Problems Identified During Testing

Shell scripts for data collection (Unix TA) lost their execute permissions

(refactoring was performed on Windows)

Fixed before it went into Production

(yaaay testing!)
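A problem like the lost execute permissions can also be caught before deployment. The following is a sketch of such a check, not something shown in the talk: it walks a directory and reports script files with no execute bit set. The function name and the `.sh` suffix filter are assumptions, and the check is only meaningful on a Unix-like system.

```python
import os
import stat


def find_non_executable_scripts(root, suffixes=(".sh",)):
    """Return sorted paths under root that look like scripts but have no execute bit."""
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            # st_mode carries the permission bits; mask out the execute bits.
            if not os.stat(path).st_mode & any_exec:
                missing.append(path)
    return sorted(missing)
```

Calling `find_non_executable_scripts` on a checkout of the app directory and failing the build when the list is non-empty would flag the issue whether or not the files were last edited on Windows.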

The Bad

Splunk Upgrades on RHEL/CentOS

Would not overwrite existing files/folders

Manual intervention required for these hosts

Upgrade of Splunk was only tested on Ubuntu

(I forgot they were using multiple distros)

Alerts Lost

Version Control Repo not up-to-date

Restored from nightly backup

(and version control updated)

(yaaay backups!)

Conclusions

Cloud is well suited for this use case

Tooling is easy to learn

(I can barely spell AWS)

Bottom Line

There's no excuse for having to wait for a test environment anymore

Acknowledgements

Reveal.JS

Makes writing presentations easy

http://lab.hakim.se/reveal-js/

https://github.com/hakimel/reveal.js

D3

Makes transforming HTML easy

http://d3js.org

https://github.com/mbostock/d3

Jitter Animation

Used with permission from Asymmetric Publications

http://asymmetric.net/

http://www.kingdomofloathing.com

Thank You