openstacksummit2014-paris-openstack-ha


On GitHub: fghaas/openstacksummit2014-paris-openstack-ha

The year was

2012

... early 2012, to be exact.

The world was a

different

place

... for a variety of reasons.

iPhones

had a

majority market share

Eucalyptus

was still

a thing

Ceph

was

new

I

was in

San Francisco

... and I was

confused.

I found myself

arguing

over whether

OpenStack

needed

High Availability

and the

opposing argument

was

"It's hard."

But thankfully...

... things have changed for the better.

Fast-forward to

today

... and let's ask:

Are We There Yet?

What do we need

for highly available OpenStack clouds?

Let's start

simple

The absolutely simplest OpenStack deployment is one where you install all the services onto one node. But that's really only relevant in lab setups, so let's define our minimum viable OpenStack deployment thus:

- A single API node: this is where all your client-side API requests come in, possibly via the dashboard, possibly directly from a client.
- A single controller node.
- A handful of compute nodes.
- A network node to enable outside network access to and from your cloud.

Needless to say, such a configuration is chock-full of single points of failure and bottlenecks.

Now we want to

eliminate

SPOFs and bottlenecks.

Ideally, this would lead us to:

- Highly available, load-balanced API nodes, eliminating both a bottleneck and a SPOF.
- Highly available controller nodes, eliminating a SPOF.
- Highly available network nodes, eliminating a bottleneck and a SPOF.
- As many compute nodes as we want.

API nodes

For API nodes, we want to be able to deploy as many as needed, preferably load-balanced.

Controller nodes

For some OpenStack services, we can only ever have one, or at least only one that is actually hit by client requests at any given time.

Compute nodes

Network nodes

These are the trickiest ones, because active/passive HA almost never cuts it, and active/active has been problematic up to and including Icehouse.

Infra nodes

... and then, importantly, we have a couple of services that need HA and a certain amount of scalability, even though they're only used by OpenStack, rather than being a part of OpenStack itself. Classic examples would be our MySQL database and our RabbitMQ nodes.

Conventions

and

best practices

for OpenStack HA

Infrastructure

RDBMS High Availability

MySQL/Galera

Most vendor-integrated solutions converge on MySQL with Galera synchronous multi-master replication. SUSE is a notable exception: their SUSE Cloud product uses PostgreSQL as the database, relying on DRBD for replication.
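As a minimal sketch of what such a Galera setup looks like (the file path, cluster name and node names db1 through db3 are placeholders; defaults vary by distribution):

    # minimal MySQL/Galera sketch, e.g. /etc/mysql/conf.d/galera.cnf
    [mysqld]
    binlog_format            = ROW
    default_storage_engine   = InnoDB
    innodb_autoinc_lock_mode = 2
    wsrep_provider           = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_name       = "openstack"
    wsrep_cluster_address    = "gcomm://db1,db2,db3"
    wsrep_sst_method         = rsync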

RabbitMQ High Availability

There is more than one way to do this, including RabbitMQ's built-in mirrored ("HA") queues. Generally though, OpenStack services resend messages that get lost, so as long as you're not bottlenecking on RabbitMQ throughput, it's fine to simply keep several instances alive and use a cluster manager like Pacemaker to move a VIP around.
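As a minimal sketch of the mirrored-queue variant (node names rabbit1 through rabbit3 are placeholders): join the nodes into one RabbitMQ cluster, then set a policy that mirrors every queue.

    # on rabbit2 and rabbit3: join the cluster formed by rabbit1
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@rabbit1
    rabbitmqctl start_app

    # on any node: mirror all queues across the cluster
    rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'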

API Endpoint Load Balancing

HAProxy

Again, there's more than one way to do this; possible alternatives are DNS round-robin load balancing or ldirectord. Most vendor-integrated solutions use HAProxy, though.
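To give an idea of the pattern, here is a minimal haproxy.cfg sketch for a single API endpoint (Keystone's public port 5000; the 192.0.2.x addresses are placeholders). Vendor solutions generate a stanza like this for every OpenStack service.

    listen keystone_public
        bind 192.0.2.10:5000
        balance roundrobin
        option tcpka
        server controller1 192.0.2.11:5000 check inter 2000 rise 2 fall 5
        server controller2 192.0.2.12:5000 check inter 2000 rise 2 fall 5
        server controller3 192.0.2.13:5000 check inter 2000 rise 2 fall 5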

HA Service Management

Corosync/Pacemaker
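To make this concrete, the VIP mentioned in the RabbitMQ note above would typically be a Pacemaker resource along these lines (crm shell syntax; the address is a placeholder).

    # a highly available virtual IP, moved between cluster nodes by Pacemaker
    primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24 \
        op monitor interval=30s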

HA Cluster Storage

Ceph

Deployment automation

Vendor support

for OpenStack HA

Mirantis

Fuel

Source: Mirantis Fuel 5.1 reference architecture

Ubuntu

Juju/MAAS

Source: Ubuntu OpenStack HA architecture

SUSE Cloud

Crowbar

Source: SUSE Cloud 4 reference architecture

Red Hat

RHEL OSP Installer (Staypuft)

Source: Red Hat RHEL OSP reference architecture

Don't

attempt to do this

manually

crm_mon during failover (Red Hat)

crm_mon during failover (SUSE)

Open issues

We still had a few issues related to HA in Icehouse. Some of these are fixed in the pending Juno release.

HA for

Neutron L3 agents

The Neutron L3 agent has been an issue for some time.

Active/Passive HA

We were always able to put the L3 agent under active/passive HA management. That doesn't solve the scaling issue for incoming and outgoing network traffic though, making the L3 agent a potential bottleneck.
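As a rough sketch, active/passive management of the agent can be as simple as a Pacemaker primitive wrapping its init script (crm shell syntax; the script name varies by distribution).

    # run exactly one neutron-l3-agent, restarted or failed over by Pacemaker
    primitive p_neutron-l3-agent lsb:neutron-l3-agent \
        op monitor interval=30s timeout=30s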

Active/Active HA

neutron-scheduler

Since Grizzly, we've had the neutron-scheduler service (formerly quantum-scheduler). This allows us to assign virtual routers to L3 agents in a round-robin fashion, so that we can distribute routers across multiple network nodes. However, this assignment is permanent and there is no automatic failure detection for L3 agents. In other words, if an L3 agent goes down, its routers don't automatically get reassigned to another one, and until an admin intervenes, the corresponding virtual networks have no outside connectivity and no metadata proxy services.
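What "an admin intervenes" means in practice is roughly the following CLI sequence (agent and router IDs are placeholders).

    # identify the dead L3 agent and the routers it hosts
    neutron agent-list
    neutron router-list-on-l3-agent <dead-agent-id>

    # move each affected router to a surviving agent
    neutron l3-agent-router-remove <dead-agent-id> <router-id>
    neutron l3-agent-router-add <live-agent-id> <router-id>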

neutron-ha-tool.py

(pre-Juno)

neutron-ha-tool.py can be hooked into a Pacemaker cluster to automatically fail routers over when the L3 agent itself fails over. This is the approach integrated into SUSE Cloud 3.

Agent rescheduling

allow_automatic_l3agent_failover = True

This automatically reschedules routers away from an L3 agent that is down. It is extremely slow, however, and likely to cause user-impacting network downtime.
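For reference, this is a neutron-server option, set in neutron.conf:

    [DEFAULT]
    allow_automatic_l3agent_failover = True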

HA virtual routers

(experimental)

HA virtual routers employ keepalived to maintain a VRRP gateway address inside a router namespace on two network nodes. Failover is quick enough that you don't even lose a ping, but HA routers presently don't replicate connection state, so existing TCP connections will die and need to be re-established. conntrackd would be a logical addition.
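In Juno, enabling this experimental feature is a server-side setting plus a per-router flag; a minimal sketch:

    # neutron.conf on the neutron server
    [DEFAULT]
    l3_ha = True
    max_l3_agents_per_router = 2
    min_l3_agents_per_router = 2

    # create a router with HA explicitly requested
    neutron router-create --ha True demo-router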

Distributed virtual routers

(experimental)

There are several limitations to DVR and L3 HA in Juno. Most importantly, right now a router can be either distributed (DVR) or HA (with VRRP), but not both. So for any given router, you can fix the SPOF or the bottleneck, but not both. DVR is also only supported with VXLAN.
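For completeness, the main Juno DVR knobs look roughly like this (exact file names and sections depend on the ML2/Open vSwitch setup).

    # neutron.conf on the neutron server: make new routers distributed by default
    [DEFAULT]
    router_distributed = True

    # l3_agent.ini:
    #   agent_mode = dvr_snat   (network nodes)
    #   agent_mode = dvr        (compute nodes)

    # Open vSwitch agent configuration
    [agent]
    enable_distributed_routing = True
    l2_population = True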

Highly available

Nova guests

Still waiting...

Almost there!

Please

share, copy, adapt, remix!

https://github.com/fghaas/openstacksummit2014-paris-openstack-ha

http://www.hastexo.com/summitpromo