Clusterf*ck – The Clustering Clinic – Arriving at AHA MOMENT #1

On GitHub: dtrapezoid / cclinic

Presented by Dani Traphagen / @dtrapezoid, Michael Perrone & Sebastian Estevez / @syllogistic

Introduction

There were some serious issues with this cluster of pain. We rolled up our sleeves and got down to bidness troubleshooting this hot mess.

Arriving at AHA MOMENT #1

We first looked at OpsCenter and the cluster topology. Next we...

Ran JMeter

Just to check we could fire it up.

cassandra.yaml

The seed lists weren't the same on every node, but that wasn't the problem, because seeds only matter during bootstrapping.
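
If you'd rather not eyeball every cassandra.yaml by hand, here is a minimal sketch of the seed comparison. It assumes you have already pulled each node's cassandra.yaml text into a dict keyed by node name; the function names and the sample addresses are made up for illustration.

```python
import re


def seed_list(yaml_text):
    """Return the sorted seed addresses found in the text of a cassandra.yaml."""
    # Matches an active (uncommented) line such as:  - seeds: "10.0.0.1,10.0.0.2"
    match = re.search(r'^\s*-\s*seeds:\s*"?([^"\n]*)"?', yaml_text, re.MULTILINE)
    if match is None:
        return []
    return sorted(s.strip() for s in match.group(1).split(",") if s.strip())


def seed_mismatches(configs):
    """Return {node: seeds} when nodes disagree on the seed list, else {}."""
    seeds = {node: seed_list(text) for node, text in configs.items()}
    distinct = {tuple(value) for value in seeds.values()}
    return seeds if len(distinct) > 1 else {}


if __name__ == "__main__":
    # Hypothetical sample config, not taken from the cluster in the talk.
    node_a = (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        '          - seeds: "10.0.0.1,10.0.0.2"\n'
    )
    node_b = node_a.replace("10.0.0.2", "10.0.0.3")
    print(seed_mismatches({"node_a": node_a, "node_b": node_b}))
```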

When in doubt, CHECK ALL THE FILES!

Checked the Cassandra log and ran Cassandra in the foreground. AHA: HUGEFILE is HUGE!
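
A quick way to spot a HUGEFILE is to list the biggest files under a directory. This is only a sketch: the function name is hypothetical, and which directory to scan (logs, data, commit log) is an assumption you need to fill in for your own nodes.

```python
import os


def largest_files(root, limit=5):
    """Return up to `limit` (size_in_bytes, path) pairs under root, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:limit]


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory you suspect.
    for size, path in largest_files("."):
        print(f"{size:>15,d}  {path}")
```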

When we got the down node up:

Datacenter 2 only showed 6 of 8 agents, so we checked that the DataStax agent was running (and restarted it in case it wasn't).

Check moar things!

  • DataStax agent log
  • nodetool status
  • Agent config
  • cassandra.yaml
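
To turn the nodetool status check into a per-datacenter head count, something like the sketch below can help. It assumes the usual `nodetool status` text layout (a `Datacenter:` header followed by rows starting with a two-letter code such as UN or DN); the function name and the sample output are invented for illustration and are not from the cluster in the talk.

```python
import re
from collections import defaultdict

# First letter: U(p) or D(own). Second letter: N(ormal), L(eaving), J(oining), M(oving).
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def nodes_by_datacenter(status_output):
    """Group node addresses by datacenter and by up/down state."""
    result = defaultdict(lambda: {"up": [], "down": []})
    datacenter = None
    for line in status_output.splitlines():
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        match = NODE_LINE.match(line)
        if match and datacenter is not None:
            key = "up" if match.group(1) == "U" else "down"
            result[datacenter][key].append(match.group(3))
    return dict(result)


if __name__ == "__main__":
    # Made-up sample output with placeholder host IDs.
    sample = """Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns  Host ID  Rack
UN  10.0.1.1   1.2 GB   256     ?     aaaa     rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns  Host ID  Rack
UN  10.0.2.1   1.1 GB   256     ?     bbbb     rack1
DN  10.0.2.2   1.3 GB   256     ?     cccc     rack1
"""
    for dc, nodes in nodes_by_datacenter(sample).items():
        print(dc, "up:", len(nodes["up"]), "down:", nodes["down"])
```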

Arriving at AHA MOMENT #2

  • The snitch is a lie
  • It was set to Ec2Snitch
  • It should have been Ec2MultiRegionSnitch
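
The snitch check is easy to script too. A minimal sketch, assuming each node's cassandra.yaml text is already collected in a dict and that the cluster is meant to span more than one EC2 region; the function names are hypothetical. `endpoint_snitch` is the real setting name in cassandra.yaml.

```python
import re


def endpoint_snitch(yaml_text):
    """Return the short snitch class name from cassandra.yaml text, or None."""
    match = re.search(r"^endpoint_snitch:\s*(\S+)", yaml_text, re.MULTILINE)
    if match is None:
        return None
    # Accept both "Ec2Snitch" and a fully qualified class name.
    return match.group(1).rsplit(".", 1)[-1]


def single_region_snitch_nodes(configs):
    """List nodes still on Ec2Snitch, which only understands a single region."""
    return sorted(
        node for node, text in configs.items()
        if endpoint_snitch(text) == "Ec2Snitch"
    )


if __name__ == "__main__":
    # Hypothetical node names and config fragments.
    configs = {
        "node1": "endpoint_snitch: Ec2Snitch\n",
        "node2": "endpoint_snitch: Ec2MultiRegionSnitch\n",
    }
    print(single_region_snitch_nodes(configs))
```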

Our final solo AHA Moment #3

  • Heap size is a lie
  • All the heap settings were commented out
  • Except for 2, which were set to different values
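
Here is a rough sketch of the heap check, assuming the heap is set through `MAX_HEAP_SIZE` in cassandra-env.sh and that you have each node's file contents in a dict. The function names and sample values are made up. It only looks at unindented assignments, so the computed default inside the script's own helper function is ignored.

```python
import re


def max_heap_size(env_text):
    """Return the explicitly set MAX_HEAP_SIZE, or None if it is commented out or absent."""
    for line in env_text.splitlines():
        # Commented lines start with "#" and therefore never match.
        match = re.match(r"MAX_HEAP_SIZE=[\"']?([^\"'\s]+)", line)
        if match:
            return match.group(1)
    return None


def heap_settings(envs):
    """Map each node to its explicit heap size (None means auto-calculated)."""
    return {node: max_heap_size(text) for node, text in envs.items()}


def is_consistent(settings):
    """True when every node has the same setting, including 'all unset'."""
    return len(set(settings.values())) <= 1


if __name__ == "__main__":
    # Hypothetical nodes: one left on the default, two with different explicit values.
    envs = {
        "node1": '#MAX_HEAP_SIZE="4G"\n',
        "node2": 'MAX_HEAP_SIZE="8G"\n',
        "node3": 'MAX_HEAP_SIZE="4G"\n',
    }
    settings = heap_settings(envs)
    print(settings, "consistent:", is_consistent(settings))
```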