Hello everyone.
I would like to start with a few personal words of my own, and I thank you in advance for tolerating my accent.
This is my second OpenStack Israel. And the first one in May had a profound impact on my professional and personal life.
Over the last few months, I have had the pleasure of interacting with enormously intelligent, professional, and unreservedly honest people from this community.
And in all my travels up to this point, I have never felt so immediately welcome in any other country.
So for all of that, thank you very much.
A fresh perspective on globally distributed OpenStack
So what is the topic of today's talk? We'll be talking about...
... geographically distributed OpenStack clouds. So what is that? What makes our cloud "geographically distributed"? Is it a specific distance between sites (in kilometers)? Or is it a specific mode of organizing our cloud? Or a specific goal we want to achieve with it?
An OpenStack cloud (or multiple clouds) spanning multiple geographical sites, with limited bandwidth and significant latency between them.
You should understand significant in the sense in which it is used in statistics, that is, non-negligible or simply "making a difference".
In general, we're talking several orders of magnitude more than in a LAN. LAN Ethernet latency is typically about 100 microseconds; between sites, we usually see latencies on the order of 1-10 milliseconds (that is, 10 to 100 times greater than in a LAN).
Being able to recover services in a backup site if the primary site is disabled or unavailable.
This is the classic no-brainer motivation for geographic distribution. What if an A380 slams into my DC, or a backhoe cuts the fiber to my building?
Being able to host content or services in geographical or legal proximity to your users.
You may want to host your services in the same general geographical area as your users. This may be for performance reasons, but jurisdictional considerations also apply: you might want to offer compliance with a specific set of laws when your users demand that compliance. Think of users wanting EU data protection laws, or not wanting to be spied on by some — cough — three-letter government agencies associated with three-letter countries.
Being able to geographically position your services based on time of day or other time-based parameters.
Your services may be latency-critical, and may experience peaks based on time of day (or year). It may be cost-effective to host your services in such a way that they are always closest to the majority of your users.
The ability for any user to use their authentication credentials, unmodified, across the entire cloud.
The ability to keep persistent user data available to services, no matter where they run.
The ability to provide a unified logical view of the network connections between sites, which is decoupled from a heterogeneous physical network.
The ability to automate, deploy and orchestrate services across the cloud.
The ability to manage, observe and modify cloud workloads from a single UI.
The ability to interact programmatically with the entire cloud in a streamlined fashion.
All services are shared, compute hosts are segregated
Not suitable for geographic distribution. It does not take network failures in stride; it really expects the network not to fail. So unless you're a telco and you own your fiber, this approach isn't for you.
All services are shared, compute hosts are grouped
Similar to the availability zone approach, but usually not used for availability purposes; rather, it is used to place guests on "trusted" hardware. It has the same limitations as the AZ approach.
Nova is segregated by sites, everything else is shared
On the Nova side, this approach is superior to the others for distributed computing. But it doesn't do anything good for the distribution of other services, notably Glance and Cinder.
Completely separate OpenStack installations unified by a single Keystone
This is really the only option we have for truly distributed clouds. Which introduces us to a different problem altogether.
It all boils down to...
In OpenStack, most services split data and metadata.
Metadata usually goes into a relational database, whereas data goes into a service-specific store.
That means we must separately replicate data and metadata, while keeping them consistent.
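To make the consistency problem concrete, here's a deliberately naive sketch (hypothetical code, not anything from an actual OpenStack service) of copying a single image to another site when its metadata lives in a SQL database and its bits live in a separate store:

```python
# Hypothetical illustration only. src_db/dst_db are DB-API connections
# (sqlite3 would do for an experiment; in real life this is MySQL), and
# src_store/dst_store are plain directories standing in for the image store.
import shutil


def copy_image(image_id, src_db, dst_db, src_store, dst_store):
    """Naively copy one image's metadata row and data blob to another site."""
    # Step 1: replicate the metadata (in real life, a row in MySQL).
    row = src_db.execute(
        "SELECT id, name, checksum FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    dst_db.execute(
        "INSERT INTO images (id, name, checksum) VALUES (?, ?, ?)", row
    )
    dst_db.commit()

    # Step 2: replicate the data (in real life, an object in the backend store).
    # If this step fails, or simply lags behind on a slow WAN link, the remote
    # site now has metadata pointing at data that isn't there yet; nothing ties
    # the two steps together transactionally.
    shutil.copyfile("%s/%s" % (src_store, image_id),
                    "%s/%s" % (dst_store, image_id))
```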
Unfortunately, that's where the CAP theorem gets in our way.
(Diagram: Benjamin Erb, Concurrent Programming for Scalable Web Architectures, CC-BY-SA)
Our only reliable multi-site, multi-master metadata store, MySQL+Galera, is a CP system.
Several service-specific stores are also CP.
Example: Cinder with Ceph or GlusterFS
We would have to synchronize their replication stream with the MySQL database's.
Other stores are AP (eventually consistent)
Example: Swift, and any service using it
It is fundamentally impossible to synchronize these data stores with metadata in a CP store, across multiple sites.
Luckily, there are two exceptions: Keystone and Horizon.
Keystone does not separate data from metadata: everything lives in the database.
Horizon can run in standalone mode, managing multiple clouds.
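As a rough sketch of what that buys us: with one shared Keystone, a user authenticates once and then reaches each site's services through the catalog. This assumes python-keystoneclient's v2.0 API of the time; the credentials, endpoint URL and region names are made up, and exact keyword arguments vary between client versions.

```python
from keystoneclient.v2_0 import client as keystone_client

# Authenticate once, against the one Keystone shared by all sites.
ks = keystone_client.Client(
    username="alice",
    password="secret",
    tenant_name="demo",
    auth_url="http://keystone.example.com:5000/v2.0",
)

# The service catalog lists endpoints per region, so the same credentials
# (and the token we just obtained) work against every site's services.
for region in ("site-a", "site-b"):
    nova_url = ks.service_catalog.url_for(
        service_type="compute", attr="region", filter_value=region
    )
    print("%s -> %s (token %s)" % (region, nova_url, ks.auth_token))
```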
It's important to understand that if we want to use Galera replication for Keystone, but no replication for any other database content, then we need separate MySQL installations for the Keystone database and the other services' databases.
MySQL/Galera does not support selective (per-database) replication the way that legacy MySQL replication does.
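To illustrate, here is roughly what that split looks like in terms of database connections; the hostnames are hypothetical, and this is an illustration rather than actual OpenStack configuration:

```python
from sqlalchemy import create_engine

# Keystone's database lives on the Galera cluster that spans all sites
# (reached here through a hypothetical cluster VIP).
keystone_db = create_engine(
    "mysql://keystone:secret@galera-vip.example.com/keystone")

# Nova, Glance and Cinder at site A keep their databases on a MySQL server
# local to that site, because Galera replicates the whole server rather
# than selected databases.
nova_db = create_engine("mysql://nova:secret@mysql.site-a.example.com/nova")
glance_db = create_engine("mysql://glance:secret@mysql.site-a.example.com/glance")
cinder_db = create_engine("mysql://cinder:secret@mysql.site-a.example.com/cinder")
```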
But can we replicate anything at all?
Well, glance-replicator has existed since Folsom. It atomically replicates both image data and metadata to a remote Glance.
The problem with it is that it's on-demand rather than streaming, and it's one-way, so multi-master replication isn't part of its design. It does replicate image UUIDs, though, so it should catch duplicates, and circular replication should therefore work.
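For a rough feel of what such an on-demand, one-way copy involves, here is a conceptual sketch using python-glanceclient's v1 API. It is not glance-replicator's actual implementation; the endpoints and token are placeholders, and a real tool would stream image data instead of buffering it in memory.

```python
import io

from glanceclient import Client

auth_token = "REPLACE-WITH-A-KEYSTONE-TOKEN"  # placeholder
src = Client("1", "http://glance.site-a.example.com:9292", token=auth_token)
dst = Client("1", "http://glance.site-b.example.com:9292", token=auth_token)

for image in src.images.list():
    # Buffer the image bits (a real replicator would stream them), then
    # recreate the image on the remote Glance, preserving the UUID, which
    # is what lets a replicator spot images that already exist remotely.
    image_bits = b"".join(src.images.data(image.id))
    dst.images.create(
        id=image.id,
        name=image.name,
        disk_format=image.disk_format,
        container_format=image.container_format,
        is_public=image.is_public,
        data=io.BytesIO(image_bits),
    )
```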
Its shortcomings aside, glance-replicator sure looks promising for the distributed cloud use case. Wouldn't it be great if we had something like it for Cinder, as well?
Wait, what?
Like Volume Mirroring?
Volume mirroring, proposed by Avishay Traeger from IBM in Haifa and discussed at the Icehouse Summit, looks like it solves this problem for us.
But unfortunately...
Volume mirroring, as proposed for Icehouse, will deal with replication within a single Cinder instance only, not with replication between Cinders.
Nonetheless, this may be an important stepping stone toward true distributed cloud volume replication.
We don't have a replication story for Neutron yet. The only thing you can do there is use your own external routing and/or NAT, presumably boiling down to BGP at the edge network.
No cross-cloud migration story there yet, either. But then, you should just orchestrate at the Heat level, really.
Liked this talk?