Background
- EuPathDB hosts websites accessible over the public internet.
- Hosted on physical servers in data centers in Athens, Philadelphia.
- Operating system, Apache webserver, Tomcat application server, Oracle databases
We host public websites on server hardware in datacenters in Athens, Philadelphia.
Websites are backed by two Oracle databases, also hosted in our datacenter.
We can also package a website and its databases onto a single, standalone virtual machine that includes everything.
Why VMs?
- Carry to conferences, workshops where internet connectivity is poor
- Archive previous releases
- Host at institutions where internet connectivity is poor.
SIG Goals
- Give overview of the labor costs of generating VMs
- Discuss how to reduce those costs
- Address concerns: security, user support, legal
Frequently Asked Questions
- Why does it take a week to make a VM?
- Can it be done in less time?
- Why isn't the process automated?
- This VM is broken. Why didn't you test it better?
Leveraged Technologies
(0-5 maturity) [+ core infra, - saVM specific]
- KVM + libvirt (3.5)[+]
- dedicated laptop and custom UI (3)[-]
- VMWare (3.5)[+]
- Puppet (3.5)[+]
- self-configuring Apache hosts (5)[+]
- Tomcat Instance Manager (5)[+]
- GUS build (4)[+]
- rebuilder (5)[+]
- configula - automated WDK configuration (4)[+]
- /dashboard (5)[+]
- VMs are mostly an extension of core infrastructure - just another server
- have to take care of core infra before tackling VMs
- most of the technology is in place for pipelining VM generation
VM Construction Overview
Create VM template
- bootstrap with Kickstart
- Puppet deployment of core software, configuration
- manual QA, tweaks
VM Construction Overview
New VM from Template
- update OS patches
- update from latest Puppet manifests
- configure network
- resize disks
VM Construction Overview
Clone Databases
- export production database to a file
- on VM: NFS mount the production export directory
- create empty database
- import the export file
- repeat for apicomm
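The cloning steps above could be written down as an ordered plan before anything is executed. The following is a minimal Python sketch under stated assumptions: the function name `clone_plan`, the placeholder database name `appdb`, the `/exports` path and the dump file naming are all hypothetical, and the step descriptions stand in for whichever export and import tools are actually used.

```python
def clone_plan(databases, export_dir):
    """Return ordered (host, action) steps for cloning each database onto the VM."""
    plan = []
    mounted = False
    for db in databases:
        dump = f"{export_dir}/{db}.dmp"  # hypothetical naming convention
        plan.append(("production", f"export {db} to {dump}"))
        if not mounted:
            # The export directory only needs to be mounted once on the VM.
            plan.append(("vm", f"NFS mount {export_dir}"))
            mounted = True
        plan.append(("vm", f"create empty database {db}"))
        plan.append(("vm", f"import {dump} into {db}"))
    return plan


if __name__ == "__main__":
    # "appdb" is a placeholder; the deck names only apicomm explicitly.
    for host, action in clone_plan(["appdb", "apicomm"], "/exports"):
        print(f"{host:>10}: {action}")
```

Keeping the plan as data makes it possible to review or log the sequence before handing it to whatever actually runs the commands.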
VM Construction Overview
Install Website
- checkout source code
- build
- configure
VM Construction Overview
Install webService files
- full set: 400 GB
- exclude BAM files
- remaining set: 15 GB
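Excluding BAM files amounts to a suffix filter over the webService file tree. As a rough Python sketch (the function name `select_files` and the demo file names are hypothetical, and the real copy step may well use a different tool), the selection and its resulting size could be computed like this:

```python
import os
import tempfile


def select_files(root, excluded_suffixes=(".bam",)):
    """Return (relative_paths, total_bytes) for files under root,
    skipping any file whose name ends with an excluded suffix."""
    suffixes = tuple(s.lower() for s in excluded_suffixes)
    selected = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                continue  # e.g. large alignment files left off the VM
            full = os.path.join(dirpath, name)
            selected.append(os.path.relpath(full, root))
            total += os.path.getsize(full)
    return sorted(selected), total


if __name__ == "__main__":
    # Tiny demonstration tree; file names are made up.
    with tempfile.TemporaryDirectory() as root:
        for name, size in [("genes.gff", 10), ("reads.bam", 1000)]:
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(b"x" * size)
        print(select_files(root))
```

Reporting the byte total alongside the file list gives a quick check that the filtered set is in the expected size range before it is copied to the VM.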
VM Construction Overview
Run tuningManager
- additional software install
- additional configuration
- additional complexity
- additional point of failure
- additional time
QA
- review kvm original
- download vmware to laptop
- review again
Pitfalls of manual assembly
- labor intensive
- disruptive to other project tasks
- long lead time required
- 6 or more months between assemblies
Future
- full pipeline build
- Jenkins integration builds
- migrate from VMWare to VirtualBox
- snapshot all sites every release?
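A full pipeline build could chain the construction stages and stop at the first failure, so that a broken stage is reported instead of producing a half-built VM. This is a minimal Python sketch of that idea, not the Jenkins setup itself; the `Stage` class, `run_pipeline` function and the stand-in stage bodies are hypothetical, and the stage names are taken from the construction overview slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], None]  # raises an exception on failure


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[Tuple[str, str]]]:
    """Run stages in order and stop at the first failure.

    Returns (completed stage names, None) on success, or
    (completed stage names, (failed stage name, error message)) on failure.
    """
    completed = []
    for stage in stages:
        try:
            stage.run()
        except Exception as exc:
            return completed, (stage.name, str(exc))
        completed.append(stage.name)
    return completed, None


if __name__ == "__main__":
    def ok():
        pass  # stand-in for real work

    stages = [Stage(n, ok) for n in (
        "new VM from template", "clone databases", "install website",
        "install webService files", "run tuningManager", "QA")]
    print(run_pipeline(stages))
```

Each stage is just a callable, so a manual step can be wrapped first and replaced with an automated one later without changing the runner.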
Distribution Issues
(Group Discussion)
Issues:
Remote Support
How do we provide it?
- too complex for novices
- remote access, past firewall/NAT
- end-user changes complicate support
- OS security updates
Issues:
Licensed Software
- Oracle
- Java Development Kit
- WU-BLAST (deprecated)
- signalp
- tmhmm
- others???
- Redistribution of Runtime Environment is OK. Redistribution of JDK has some restrictions.
- need JDK to compile WDK in situ
- wublast now replaced with Public Domain NCBI blast
- signalp, tmhmm currently not included on VMs, but need to be aware that they do not creep in
Issues:
Sensitive Data
- Oracle user accounts include passwords
- apicomm has user search history, contact info and passwords
- leakage of sensitive system details
- deployed by Puppet, user shell history after setup
how to clone only data:
need careful refactoring of Puppet manifests
need dedicated user account for build on removable disk
Needed Infrastructure Improvements
(Group Discussion)
Needed Infrastructure Improvements:
API for schema requirements
What are the minimum schema/tables required?
- avoid private user schemas
- keep up with application changes through releases
- e.g. GUS core.info
- core.info does not include tuning tables
- apicomm has no equivalent
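If the required schemas were available as a list, deciding what to clone becomes a comparison between that list and the schemas actually present. The sketch below is a hypothetical Python illustration: `schemas_to_clone` and the example schema names are placeholders, and in practice the `required` list would come from a registry such as the GUS core.info table mentioned above.

```python
def schemas_to_clone(existing, required):
    """Compare schemas present in a database with the registered requirements.

    Returns (to_clone, missing, skipped):
      to_clone - required schemas that exist, in the order they were registered
      missing  - required schemas not found in the database
      skipped  - existing schemas that are not required (e.g. private user schemas)
    Schema names are compared case-insensitively.
    """
    existing_by_key = {name.upper(): name for name in existing}
    required_keys = [name.upper() for name in required]
    to_clone = [existing_by_key[k] for k in required_keys if k in existing_by_key]
    missing = [k for k in required_keys if k not in existing_by_key]
    skipped = sorted(set(existing_by_key) - set(required_keys))
    return to_clone, missing, skipped


if __name__ == "__main__":
    # Placeholder names: "jdoe" stands for a private user schema.
    print(schemas_to_clone(["core", "dots", "jdoe"], ["CORE", "DOTS", "TUNING"]))
```

The `missing` list is the useful signal for keeping up with application changes: a required schema that is absent from the source database points to a registry or release mismatch.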
Needed Infrastructure Improvements:
Remove user data from apicomm
- Make empty apicomm on the VM?
- loss of example strategies
- loss of community uploaded files
- are apicomm schema installation scripts up to date?
- Clone subset of production apicomm, excluding community profile data?
Needed Infrastructure Improvements:
Lock tuning tables
Live import fails when tuningManager is running
- block tuningManager when exporting
- block export when tuning
- block export when tuning
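The two blocking rules above describe mutual exclusion between the export and tuningManager. One generic way to get that is a lock file that only one process can create at a time. The Python sketch below illustrates the idea only; it does not describe how tuningManager or the export scripts behave, and `exclusive_lock`, `LockHeld` and the lock file path are hypothetical.

```python
import contextlib
import os
import tempfile


class LockHeld(Exception):
    """Raised when another process already holds the lock."""


@contextlib.contextmanager
def exclusive_lock(path, owner):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process can create it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(path) as fh:
            holder = fh.read().strip()
        raise LockHeld(f"{path} is held by {holder or 'unknown'}")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(owner)  # record who holds the lock, for diagnostics
        yield
    finally:
        os.remove(path)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "tuning.lock")
    with exclusive_lock(lock_path, "export"):
        try:
            with exclusive_lock(lock_path, "tuningManager"):
                pass
        except LockHeld as err:
            print("blocked:", err)
```

A lock file left behind by a crashed process would block both sides until it is removed by hand, which is one reason to put this kind of coordination in a job scheduler instead.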
Needed Infrastructure Improvements:
Automated QA
Conclusions, Action Items
- VMs will use empty apicomm instance, no importing data from production
- (12/18/14 update) VMs will use a mostly empty apicomm instance, only user comments will be imported
- VMs will not have pre-loaded example strategies
- VMs will not have pre-registered users
- infrastructure team will provide schema installation script
- All schema names needed to support the website will be registered in core.databaseinfo so that the databases can be programmatically cloned and the staff and workflow schemas can be excluded.
- e.g. tuningManager needs to be patched to do this
- infrastructure will not support runtime flags to help prevent conflicts between tuningManager and export scripts.
- blocking tuningManager will have to be done in Jenkins.
Feedback from David R.
22 December 2014 phone conversation.
- retaining Community strategies is desired if it can be done at low cost.
- ability to transfer strategies from VM to production