Jason Bowman
All developers have been granted root access to the dev lanes. Developers should be able to resolve almost any issue on their own, and are encouraged to do so, but do not hesitate to seek support from ops if you are out of your depth or think you might make the situation worse. Worst case, we can rebuild the host, although we would like to avoid doing that.
At present the dev lanes are constrained for resources and we are at capacity. They run the most minimal configuration possible. You will probably hit disk limits and performance may at times be less than ideal -- but they are in a functional state.
Chef is a set of rules that defines what an environment should look like. It deploys services and ensures that they are running, validating and preserving system state.
This is not always best for rapid iteration. You will want to disable chef from time to time so that it does not undo your changes.
# sudo chef-client
It's that easy.
# ln -sf /path/to/my/jar.jar /opt/releases/current/services/interview.jar
Every dev lane has a unique environment file and data bags.
data_bags/rlcorp-devX/
    lighthouse-assets.json    [cms]
    lighthouse-services.json  [cms]
    lighthouse-webapps.json   [cms]
    properties.json           [cms, web]  # Do not copy from other environment
    web-assets.json           [web]
    web-services.json         [web]
    web-webapps.json          [web]
    windows-passwords.json    # Don't touch
    windows-release.json      [iis]
    windows-webconfig.json    [iis]  # Do not copy other environment
rlcorpdevlinweb00X - wwwX.dev.rocketlawyer.com
    Redis
    RabbitMQ
    Tomcat - Frontend web services
    rlservices - interview-service, featureflipper, etc.
rlcorpdevlincms00X - cmsX.dev.rocketlawyer.com
    Tomcat - Backend web services, lighthouse
    rlservices - Standalone services # Ontology Service, integration, etc.
rlcrpdevmsrv00X - [iis]
    IIS - .NET website
    Windows Components
rlcorpdevlincdh00X - [cdh]
    Elasticsearch
    Hadoop/HBase
    Zookeeper
rlcorpdevlinsql00X - [sql]
    MySQL
rlcrpdevmsql001 - [mssql]
    MSSQL # Shared single MSSQL instance for all dev lanes -- you shouldn't need to touch this host
rlcorpdevlinadm001 - [adm]
    Graphite
Build your feature branch locally, then from your machine:
# scp /path/to/my-version.war user@rlcorpdevlinweb001:/tmp/my-version.war
Then on the dev lane:
# sudo rm -rf /opt/tomcat/current/webapps/my.war /opt/tomcat/current/webapps/my && sudo mv /tmp/my-version.war /opt/tomcat/current/webapps/my.war
Tomcat will deploy it under the context of the war's filename. Optionally restart tomcat to kill orphaned threads of the undeployed context and free permgen.
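The copy-and-redeploy steps above can be strung together into one helper. This is a dry-run sketch: it only prints the commands it would run (drop the `echo`s to execute), and `user`, the host, and the app name are placeholders for your own.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the scp + redeploy steps above.
deploy_war() {
  local src="$1" host="$2" app="$3"
  local tmp="/tmp/$(basename "$src")"
  # Print, rather than run, the two commands from this section.
  echo "scp $src user@$host:$tmp"
  echo "ssh -t user@$host 'sudo rm -rf /opt/tomcat/current/webapps/$app.war /opt/tomcat/current/webapps/$app && sudo mv $tmp /opt/tomcat/current/webapps/$app.war'"
}

deploy_war /path/to/my-version.war rlcorpdevlinweb001 my
```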
# sudo su - rocketapp
# cd /var/www/refresh_assets
# git fetch && git checkout features/myfeature
# crontab -e

Add the following rule:
*/1 * * * * cd /var/www/refresh_assets/ && git pull
This will cause the host to poll GitHub for changes every minute.
# cd ~/OPS-Chef/data_bags/rl-test/
# cp web* lighthouse* windows-release.json ~/OPS-Chef-dev/data_bags/rlcorp-devX/
# cd ~/OPS-Chef-dev/data_bags/rlcorp-devX/ && git commit -am "Making devX match test"
# git push
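To see which data bag files actually differ between test and your lane (handy before or after a copy like the one above), a quick check -- paths are the same ones used on this page:

```shell
# List data bag files that differ between test and your dev lane.
A=~/OPS-Chef/data_bags/rl-test
B=~/OPS-Chef-dev/data_bags/rlcorp-devX
if [ -d "$A" ] && [ -d "$B" ]; then
  diff -q "$A" "$B" || true   # diff exits non-zero when files differ
fi
```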
You can kick Jenkins to synchronize immediately; otherwise it runs automatically every 5 minutes.
Now run chef-client on your web and cms hosts.
# ssh -t user@rlcorpdevlinweb001 'sudo chef-client'
# ssh -t user@rlcorpdevlincms001 'sudo chef-client'
Chef runs a series of unit tests when it finishes. Several of these tests are written with assumptions that only hold in production, so it is likely that one or more will fail in a preproduction environment -- for example, because of a restart that chef itself triggered. When in doubt, run it again -- but everything is [probably] fine.
There is a property in your environment file that determines if chef runs automatically.
"chef" => { "auto_check_in" => false, "checkin_interval" => 300 }
You should attempt to diagnose the problem yourself and fix it if you can. Here are our most common problems.
Deploy a working artifact. Resync to the versions in test.
Check to see what's failing in probe (login/pass: tom/cat). Is tomcat responding? What deployed, what failed? To view critical errors that have escaped error handling:
# tail -n1000 /opt/tomcat/current/logs/localhost.`date +%F`.log

When in doubt, check catalina.out!

# tail -fn1000 /opt/tomcat/current/logs/catalina.out

IIS... good luck.
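To surface only the serious entries instead of scrolling the whole file -- Tomcat's `java.util.logging` output marks them `SEVERE`:

```shell
# Show the most recent SEVERE-level entries from catalina.out
# (path as above; adjust -n to taste).
grep 'SEVERE' /opt/tomcat/current/logs/catalina.out | tail -n 20
```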
Bounce tomcat.
If you're seeing SQL errors, it's likely a data sync issue. Allen wrote up a nice tutorial for doing your own syncs.
Log in to the database server for your dev lane and run the load data script:

# sudo /z/dbscripts/load_data.sh

1. Select the source you want to pull the data from. [172.21.165.155]
2. Choose which database farm you want to pull from. [rl-prod]
3. Hit enter for where you want to download the files to. [default: /tmp/]
4. admin is the correct user for the dev lanes.
5. admin123 is the password you will want to use.
6. localhost is correct, so you can hit enter.
7. More than likely you will not want a backup created -- type n to skip.
8. Press enter; you want to preserve lighthouse accounts.
9. Press enter; you do not want to preserve mysql accounts.
10. Press enter; you want to clean up working files.
11. Optional: if you want an email when it is done, enter your email address; if not, just hit enter.

The script will now start downloading and importing all the databases. Let it go; this can take up to 15 minutes to complete. The script will nag you at the end to verify the procs are correct and to run ignite lighthouse to refresh the caches.
Your Elasticsearch instance may not have all of the correct schemas loaded and data populated. We do not have a process for automating this yet. At present all dev lanes should have the correct schemas configured, but it's likely there will be some drift.
# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root   12G   12G    0M 100% /

The solution is to delete some log files. To check the size of a particular directory...
# du -sh /opt/tomcat/current/logs/
539M    /opt/tomcat/current/logs/
# rm -rf /opt/tomcat/current/logs/*
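If you'd rather not wipe everything, age-based cleanup is a gentler option. A sketch -- the path is the one from the `du` example, and the 7-day cutoff is just an example:

```shell
LOGDIR=/opt/tomcat/current/logs   # directory from the du example above
if [ -d "$LOGDIR" ]; then
  # Delete only files not modified in the last 7 days.
  find "$LOGDIR" -type f -mtime +7 -print -delete
fi
```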
/opt/tomcat/current/logs  [web, cms] + bounce  # service tomcat7 restart
/var/log/rlservices/      [web, cms] + bounce  # service rlservices restart
/var/log/rabbitmq         [web]      + bounce  # service rabbitmq-server restart
/var/log/hbase            [cdh]      + bounce  # for serv in hbase-master hbase-rest hbase-regionserver hbase-thrift; do service $serv restart; done
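The clear-then-bounce pattern above can be wrapped in a small helper. A sketch -- run it as root, with the directory/service pairs from the table:

```shell
# Wipe a log directory, then restart the service that writes to it.
clear_and_bounce() {
  local dir="$1" svc="$2"
  rm -rf "${dir:?}"/*     # ${dir:?} aborts if dir is empty/unset
  service "$svc" restart
}

# e.g.: clear_and_bounce /opt/tomcat/current/logs tomcat7
#       clear_and_bounce /var/log/rabbitmq rabbitmq-server
```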
Dev lanes have JPDA enabled on port 8000 for debuggers and JMX for profilers.
To use VisualVM you should run jstatd on the host. To do this, you must create a security policy and keep a jstatd process running.
Create a file named: jstatd.all.policy
grant codebase "file:/usr/java/latest/lib/tools.jar" { permission java.security.AllPermission; };
And then run:
# jstatd -J-Djava.security.policy=/home/user/jstatd.all.policy
You should now be able to connect VisualVM to the host.