What is this
about?
You want a highly-available, scalable LMS deployment for, potentially,
thousands of students. You developed course material on clusters and distributed
computing in general, so you want each and every one of your students to have
their own distributed environment to play with and, of course, to break at
will. As icing on the cake, you want to deploy on OpenStack.
How do you build all this so that it doesn't cost a fortune for you to maintain
a course run, and for students to join in?
Here you'll see what we came up with to answer that question, using OpenStack
and, of course, Open edX.

You should
know OpenStack
This talk assumes a little familiarity with OpenStack. We'll focus on
OpenStack Heat and its YAML template language.
If you don't know it, don't worry. OpenStack is extensively documented, and
most of what will be covered here can be easily gleaned online.

You should
know Ansible
We also assume you know what we're talking about when we mention roles
and playbooks. In particular, the roles and playbooks in edx/configuration.
(In other words, you'll get the most out of this if you've deployed Open edX at
least once.)

You should
know XBlocks
We'll talk about an XBlock, so a big part of what we'll cover assumes at
least passing knowledge of what XBlocks are and, ideally, how they work under
the hood.

Why?
Why did we choose the tools we did?

Why Open edX?
- Openness and community
- Technology
- Extensibility
- Coolness factor
We wanted to build our product around an open platform, so we shopped
around for open source LMSs. Open source is not everything, however. Coming
from a background of open source consulting, we know that having the code out
there is not enough. For an open source project to be useful, it has to be
alive. In other words, it requires leadership that embraces a community of
both developers and users.
Open edX certainly qualifies as a living project, but how hard is it to grok,
as developers and sysadmins? Surprisingly, not very hard at all! The choice
of Python for deployment (via Ansible and its YAML playbooks) and development
(of the core
LMS/CMS, at least, via Django) was a huge plus, in particular for OpenStack
contributors such as ourselves.
Even before initial investigation, we were pretty sure that no LMS would do
what we wanted out of the box. (And that's not a bad thing, since we went in
with the desire to contribute something new!) So we needed an LMS that was
easily expandable. Another plus for Open edX!
And last but not least... how awesome would it be to work with the latest,
greatest, and coolest open source LMS out there?

Why OpenStack?
- Openness and community
- Technology
- Extensibility
- Coolness factor
Can you see the pattern, here? OpenStack is:
- Also open, one of our requirements, and very much alive, with a vibrant
community around one of the biggest, most successful open source cloud
technologies that currently exist
- Also built on easy-to-read, easy-to-understand Python (and YAML, via Heat
templates)
- Also, and by definition, extensible and "automatable" from the bottom to the
top (as any cloud platform should!)
- Also the cool kid on the block, as far as cloud platforms go

Bingo!
It didn't take long for us, OpenStack consultants and trainers, to
recognize that Open edX was the perfect complement to our consulting and
training expertise. As you'll see, the union proved itself to be (well,
almost) perfect for our needs, the proof of which is that we were able to come
up with a product MVP in under six months.

Deploying
Open edX
on
OpenStack
Our first job was to find a way to deploy edX on OpenStack. First, on a
single node, and then, in a highly-available, scalable manner, to a cluster of
VMs.

Single node?
Easy!
- Port util/install/sandbox.sh to cloud-config
- Use edx_sandbox.yml playbook as a template.
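A minimal sketch of what that Heat template looked like follows; the image and
flavor names and the runcmd contents are illustrative placeholders, not the
actual port of sandbox.sh:

```yaml
heat_template_version: 2013-05-23

resources:
  edx_server:
    type: OS::Nova::Server
    properties:
      image: ubuntu-12.04        # placeholder image name
      flavor: m1.large           # single-node Open edX needs ample RAM
      user_data_format: RAW
      user_data: |
        #cloud-config
        runcmd:
          # prerequisites ported from util/install/sandbox.sh (abridged)
          - apt-get update
          - apt-get install -y build-essential software-properties-common
```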
Installing Open edX on a single OpenStack VM was relatively easy. We
whipped up a basic Heat template to use an Ubuntu 12.04 image, ported the
pre-requisite installation steps from
edx/configuration/util/install/sandbox.sh into a cloud-init script, and off
we went.

Single node
VM Requirements?
The only ceiling we ran into, at first, was RAM usage. All Open edX's
services on a single node require quite a bit of RAM: we found that 4GB is just
not enough.

Multi-node?
Not so easy.
Designing a scalable Open edX cluster without any previous experience
wasn't as easy.

How
is Open edX
deployed
on edx.org?
Feanil Patel's talk
Our first idea was to try and find out how edX was deployed on edx.org.
We were lucky, because not only had Feanil Patel given a talk on this very
subject last year, but it had also been recorded and posted online.

This is how we designed our OpenStack cluster:
- A small deploy node is instantiated, for the single purpose of orchestrating
Ansible playbook runs onto all other nodes. edx/configuration will be
checked out there, and pre-requisites installed.
- Exactly three backend nodes are created. This is the minimum number of
Galera nodes, and more than enough for Mongo replication. The mariadb and
mongodb edx/configuration roles will be assigned to them.
- An arbitrary number of app servers are launched, where a copy of everything
from RabbitMQ, the CMS, and the LMS, to the forum and workers will run.
- An OpenStack load balancer (as provided by LBaaS) directs requests from end
users to the app servers.
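In Ansible terms, the layout above boils down to an inventory along these
lines. It is shown in YAML form purely for illustration; backend_servers is
the group name the group_vars later in this talk actually reference, but the
host names are made up:

```yaml
all:
  children:
    backend_servers:      # exactly three: the Galera minimum
      hosts:
        backend1:
        backend2:
        backend3:
    app_servers:          # arbitrary count, behind the LBaaS pool
      hosts:
        app1:
        app2:
```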
It is far from perfect, but we believe it is good enough for medium-sized
deployments.

Can we use the
AWS playbooks
from edx/configuration?
- No.
- (Sorry, removed in Cypress!)
While Feanil's talk was very enlightening, it didn't provide any
technical details as to how exactly to write a playbook for a cluster. Also,
we soon discovered that the sample AWS cloud formation templates were not a
good source of information, as they were mostly unused and unmaintained.
(So much so, they were removed entirely from edx/configuration for the Cypress
release.)

How about
vagrant-cluster.yml?
vagrant-cluster.yml is a sample playbook included in
edx/configuration for simulating a three-node cluster. It was very useful in
elucidating how to use the different roles in a clustered manner, particularly
as to what variables to set, and how.

What
variables?
Next, we needed to figure out how to use the various Ansible playbook
variables to get the cluster working. It's a different set for the backend and
app nodes.

Variables for the
backend nodes
- MONGO_CLUSTERED: yes
- ELASTICSEARCH_CLUSTERED: yes
- MARIADB_CLUSTERED: yes
Variables for the
app servers
- EDXAPP_MYSQL_HOST: "{{ groups['backend_servers'][0] }}"
- XQUEUE_MYSQL_HOST: "{{ groups['backend_servers'][0] }}"
- EDX_NOTES_API_MYSQL_HOST: "{{ groups['backend_servers'][0] }}"
- EDXAPP_MONGO_HOSTS: "{{ groups['backend_servers'] }}"
- FORUM_MONGO_HOSTS: "{{ groups['backend_servers'] }}"
These are just a few examples. There are more variables needed, and
examples for each can be found in
edx-configuration/playbooks/openstack/group_vars.

Writing the
Heat template
- 1 security group
- 1 private network with a router
- 2 cloud-configs (deploy and backend/app)
- 1 deploy node with a public IP
- 1 load balancer with a public IP
- 3 backend servers
- X app servers in a pool
- Parameters: Key, public network, VM sizes, and number of app servers
- Outputs: Public IPs
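Abridged, the skeleton of such a template might look like this. Only the app
server group and one output are shown; every name here is illustrative, and
the networks, security group, backend servers, and load balancer are elided:

```yaml
heat_template_version: 2013-05-23

parameters:
  key_name:      {type: string}
  public_net_id: {type: string}
  app_count:     {type: number, default: 2}

resources:
  app_servers:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: app_count}     # the "X app servers" knob
      resource_def:
        type: OS::Nova::Server
        properties:
          key_name: {get_param: key_name}
          image: ubuntu-12.04           # placeholder
          flavor: m1.large

outputs:
  app_server_ips:
    description: Private IPs of the app servers
    value: {get_attr: [app_servers, first_address]}
```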
After figuring out how to use the Ansible roles in edx/configuration, we
forged ahead with writing the Heat template that would create the actual nodes
in the cluster.
What you see in the slide is the shopping list of so-called "resources" that
the Heat template creates, as well as a list of parameters one can pass into
the template at stack creation time.
The idea is that this template can be used in any public or private OpenStack
cloud that supports Heat and LBaaS, with no modifications!

The
Inventory
Generator
169.254.169.254/openstack/latest/meta_data.json
Problem: if the number of app servers is set at stack creation time, how
do we get a list of their IP addresses, and how can we pass that to Ansible
automatically, so that the proper roles can be deployed?
This is what Ansible dynamic inventory generators are for. Luckily, it is
possible to specify arbitrary metadata in the Heat template for a given
OpenStack VM, in such a way that a simple HTTP request to a fixed URL, from
that particular VM, will get you that data.
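A dynamic inventory script following this approach can be sketched as below.
It assumes (our choice for illustration, not a documented convention) that the
Heat template stores the app server addresses in each VM's metadata under a
comma-separated "app_servers" key:

```python
"""Sketch of an Ansible dynamic inventory fed by OpenStack instance metadata."""
import json


def build_inventory(meta):
    """Turn the metadata document into the JSON structure Ansible expects."""
    # Assumed layout: {"meta": {"app_servers": "10.0.0.5,10.0.0.6"}, ...}
    raw = meta.get("meta", {}).get("app_servers", "")
    hosts = [ip for ip in raw.split(",") if ip]
    return {"app_servers": {"hosts": hosts}, "_meta": {"hostvars": {}}}


# On a VM inside the stack, the argument would come from fetching
# http://169.254.169.254/openstack/latest/meta_data.json with urllib.
print(json.dumps(build_inventory({"meta": {"app_servers": "10.0.0.5"}})))
```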
It is also possible to insert the list of currently running app server IP
addresses in there, so this gives us a way to automate playbook runs, even if
the number of app servers is increased (or decreased).

Learning
OpenStack
on
Open edX
Alright, we have a highly-available, scalable Open edX cluster to play
with. How about using it to teach OpenStack itself?

A
cluster
to
play with
Here's what we think is missing from the world of technology training:
affordable self-paced courses that give the trainee an easy way to practice the
theory of cluster deployment and maintenance.

$$$!
Part of the problem is that giving each and every trainee their own
cluster, even if it's just composed of VMs, would be very expensive.

Heat
to the rescue!
- A Heat stack for each student...
- ... suspended when not in use.
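With python-heatclient, that suspend/resume cycle can be sketched roughly as
follows. The client wiring (auth, endpoint) is elided; actions.suspend() and
actions.resume() are real heatclient v1 calls and the *_COMPLETE status names
are Heat's own, but the surrounding functions are our illustration:

```python
"""Sketch: park idle student stacks, wake them up on demand."""

SUSPENDABLE = ("CREATE_COMPLETE", "RESUME_COMPLETE")


def wake(client, stack_id, status):
    """Resume a suspended stack; returns what action (if any) was taken."""
    if status == "SUSPEND_COMPLETE":
        client.actions.resume(stack_id)   # heatclient v1 actions API
        return "resuming"
    return "ready"


def park(client, stack_id, status):
    """Suspend a stack the student has walked away from."""
    if status in SUSPENDABLE:
        client.actions.suspend(stack_id)  # heatclient v1 actions API
        return "suspending"
    return "untouched"
```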
The solution we came up with is to fire up a Heat stack for every student,
but then suspend it when it's not in use. When the trainee comes back, of
course, the stack is resumed automatically.

Enter
XBlocks!
https://github.com/hastexo/hastexo-xblock
As noted before, one of the reasons we chose Open edX was because it was
extensible. And XBlocks were the extension API that seemed best suited to
implement our solution: they offered enough flexibility to implement the needed
features, and also an easy way to keep them customizable at the hands of course
authors.
In other words, with an XBlock we can let the course author define the Heat
stack for a particular run, then fire it up for every trainee as needed.

OpenStack auth
<vertical url_name="lab_introduction">
<hastexo
url_name="lab_introduction"
stack_template_path="hot_lab.yaml"
stack_user_name="training"
os_auth_url="https://ops.elastx.net:5000/v2.0"
os_tenant_name="example.com"
os_username="demo@example.com"
os_password="foobarfoobarfoofoo" />
</vertical>
This is how a course author invokes the hastexo XBlock, using OLX. Here
you can see the standard OpenStack authentication variables, and also the asset
file name of the Heat template that should be uploaded to the data store. The
user name that will be used to connect to the stack via SSH should also be
defined here, and match what the Heat template creates.

Heat template outputs
outputs:
public_ip:
description: Floating IP address of deploy in public network
value: { get_attr: [ deploy_floating_ip, floating_ip_address ] }
private_key:
description: Training private key
value: { get_attr: [ training_key, private_key ] }
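Back in the XBlock, consuming those outputs through python-heatclient boils
down to flattening the list of dicts Heat returns (the outputs attribute of a
client.stacks.get() result); the helper itself is just an illustration:

```python
"""Sketch: flatten Heat stack outputs into a plain dict."""


def stack_outputs(outputs):
    """Map Heat's [{"output_key": ..., "output_value": ...}, ...] list
    into simple {key: value} form."""
    return {o["output_key"]: o["output_value"] for o in outputs}


# e.g. stack_outputs(stack.outputs)["public_ip"] is what the in-browser
# terminal connects to, and ["private_key"] is how it authenticates.
```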
It's just
Python, right?
- Accesses the data store directly
- Uses the LMS's workers
We wanted an easy way for course authors to upload their Heat template,
and at least for now, that means uploading it to the course's data store, from
which the XBlock will fetch it directly.
Firing up a Heat stack takes a while, so we needed a way to do this
asynchronously... and the LMS's workers were right there! So instead of
installing a whole other task queue, we went with the LMS's existing Celery
setup, though that requires adding the hastexo XBlock to the LMS's installed
apps.

Connecting the
browser
to the
lab environment
... with GateOne!
The last piece of the puzzle was finding a way to connect the course
content, as displayed on a student's browser, to the lab environment that would
be created just for them.
The solution we found was GateOne, an open source Python and JavaScript
terminal emulator and SSH client. When run on the same app server as the LMS
hosting the XBlock, it allowed us to create an SSH connection to the student's
cluster securely and automatically.

Bells
and
Whistles
- A role to deploy the XBlock
- A role to deploy courses from git
In developing all this, we also came up with a couple of helper roles.
One to deploy our XBlock properly, including dependencies, adding it to the
LMS's installed apps, and configuring the nginx reverse proxy. The other, to
deploy courses from git.

Open edX
+ OpenStack
= Awesome!
Improvements
Nothing's perfect, and this is how we plan to improve in the near future.

... for the Open edX cluster
- Clustered RabbitMQ
- Separate worker (and other) nodes
- Load balancer for Galera
- Auto-scaling
... for the hastexo XBlock
- Singleton definition in OLX
- Configurable automatic grader
- Don't depend (so much) on Django features
Use the
Source
We have followed the precedent set by most other currently
available XBlocks and released ours under the AGPL.
The OpenStack deployment bits are of course under the AGPL as well, as
is the rest of the edx-configuration repo.

https://www.hastexo.com/openedx
This is our Open edX landing page, from which you can get to:
- resources related to our OpenStack community involvement,
- news releases,
- a 3-minute video explaining why we got into this (essentially, what
I said at the top of the talk in 3 minutes, with pretty graphics and
audio narration).
OpenStack for Open edX:
Inside and Out
adolfo.brandes@hastexo.com
@arbrandes | @hastexo