Jason Bowman
All developers have been granted root access to the dev lanes. Developers should be able to resolve almost any issue on their own, and are encouraged to do so, but do not hesitate to seek support from ops if you are out of your depth or think you might make the situation worse. Worst case, we can rebuild the host, although we would like to avoid doing that.
At present the dev lanes are constrained for resources and we are at capacity. They run the most minimal configuration possible. You will probably hit disk limits and performance may at times be less than ideal -- but they are in a functional state.
Chef is a set of rules that defines what an environment should look like. It deploys services and ensures that they are running, validating and preserving system state.
This is not always best for rapid iteration. You will want to disable chef from time to time so that it does not undo your changes.
# sudo chef-client
It's that easy.
# ln -sf /path/to/my/jar.jar /opt/releases/current/services/interview.jar
Every dev lane has a unique environment file and data bags.
data_bags/rlcorp-devX/
    lighthouse-assets.json    [cms]
    lighthouse-services.json  [cms]
    lighthouse-webapps.json   [cms]
    properties.json           [cms, web]  # Do not copy from other environment
    web-assets.json           [web]
    web-services.json         [web]
    web-webapps.json          [web]
    windows-passwords.json    # Don't touch
    windows-release.json      [iis]
    windows-webconfig.json    [iis]  # Do not copy other environment
rlcorpdevlinweb00X - wwwX.dev.rocketlawyer.com
    Redis
    RabbitMQ
    Tomcat - Frontend web services
    rlservices - interview-service, featureflipper, etc.
rlcorpdevlincms00X - cmsX.dev.rocketlawyer.com
    Tomcat - Backend web services, lighthouse
    rlservices - Standalone services # Ontology Service, integration, etc.
rlcrpdevmsrv00X - [iis]
    IIS - .NET website
    Windows Components
rlcorpdevlincdh00X - [cdh]
    Elasticsearch
    Hadoop/HBase
    Zookeeper
rlcorpdevlinsql00X - [sql]
    MySQL
rlcrpdevmsql001 - [mssql]
    MSSQL # Shared single MSSQL instance for all dev lanes -- you shouldn't need to touch this host
rlcorpdevlinadm001 - [adm]
    Graphite
Build your feature branch locally, then from your machine:
# scp /path/to/my-version.war user@rlcorpdevlinweb001:/tmp/my-version.war
Then on the dev lane:
# sudo rm -rf /opt/tomcat/current/webapps/my.war /opt/tomcat/current/webapps/my && sudo mv /tmp/my-version.war /opt/tomcat/current/webapps/my.war
Tomcat will deploy it under the context of the war's filename. Optionally restart tomcat to kill orphaned threads of the undeployed context and free permgen.
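The copy-and-redeploy steps above can be strung together into one helper. This is a dry-run sketch: it only prints the commands it would run (drop the `echo`s to execute), and `user`, the host, and the app name are placeholders for your own.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the scp + redeploy steps above.
deploy_war() {
  local src="$1" host="$2" app="$3"
  local tmp="/tmp/$(basename "$src")"
  # Print, rather than run, the two commands from this section.
  echo "scp $src user@$host:$tmp"
  echo "ssh -t user@$host 'sudo rm -rf /opt/tomcat/current/webapps/$app.war /opt/tomcat/current/webapps/$app && sudo mv $tmp /opt/tomcat/current/webapps/$app.war'"
}

deploy_war /path/to/my-version.war rlcorpdevlinweb001 my
```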
# sudo su - rocketapp
# cd /var/www/refresh_assets
# git fetch && git checkout features/myfeature
# crontab -e

Add the following rule:
*/1 * * * * cd /var/www/refresh_assets/ && git pull
This will cause the host to poll GitHub for changes every minute.
# cd ~/OPS-Chef/data_bags/rl-test/
# cp web* lighthouse* windows-release.json ~/OPS-Chef-dev/data_bags/rlcorp-devX/
# cd ~/OPS-Chef-dev/data_bags/rlcorp-devX/ && git commit -am "Making devX match test"
# git push
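To see which data bag files actually differ between test and your lane (handy before or after a copy like the one above), a quick check -- paths are the same ones used on this page:

```shell
# List data bag files that differ between test and your dev lane.
A=~/OPS-Chef/data_bags/rl-test
B=~/OPS-Chef-dev/data_bags/rlcorp-devX
if [ -d "$A" ] && [ -d "$B" ]; then
  diff -q "$A" "$B" || true   # diff exits non-zero when files differ
fi
```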
You can kick Jenkins to synchronize immediately; otherwise it runs automatically every 5 minutes.
Now run chef-client on your web and cms hosts.
# ssh -t user@rlcorpdevlinweb001 'sudo chef-client'
# ssh -t user@rlcorpdevlincms001 'sudo chef-client'
Chef runs a series of unit tests when it finishes. Several of these tests are written with assumptions that only hold in production, so it is likely that one or more will fail in a preproduction environment -- for example, because of a restart that chef itself triggered. When in doubt, run it again -- but everything is [probably] fine.
There is a property in your environment file that determines if chef runs automatically.
"chef" => { "auto_check_in" => false, "checkin_interval" => 300 }
You should attempt to diagnose the problem yourself and fix it if you can. Here are our most common problems.
Deploy a working artifact. Resync to the versions in test.
Check to see what's failing in probe (login/pass: tom/cat). Is tomcat responding? What deployed, what failed? To view critical errors that have escaped error handling:
# tail -n1000 /opt/tomcat/current/logs/localhost.`date +%F`.log

When in doubt, check catalina.out!

# tail -fn1000 /opt/tomcat/current/logs/catalina.out

IIS... good luck.
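To surface only the serious entries instead of scrolling the whole file -- Tomcat's `java.util.logging` output marks them `SEVERE`:

```shell
# Show the most recent SEVERE-level entries from catalina.out
# (path as above; adjust -n to taste).
grep 'SEVERE' /opt/tomcat/current/logs/catalina.out | tail -n 20
```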
Bounce tomcat.
If you're seeing SQL errors, it's likely a data sync issue. Allen wrote up a nice tutorial for doing your own syncs.
Log in to the database server for your dev lane and run the load data script:

# sudo /z/dbscripts/load_data.sh

1. Select the source you want to pull the data from. [172.21.165.155]
2. Choose which database farm you want to pull from. [rl-prod]
3. Hit enter for where you want to download the files to. [default: /tmp/]
4. admin is the correct user for the dev lanes.
5. admin123 is the password you will want to use.
6. localhost is correct, so you can hit enter.
7. More than likely you will not want a backup created -- type n to skip.
8. Press enter; you want to preserve lighthouse accounts.
9. Press enter; you do not want to preserve mysql accounts.
10. Press enter; you want to clean up working files.
11. Optional: if you want an email when it is done, enter your email address; if not, just hit enter.

The script will now start downloading and importing all the databases. Let it go; this can take up to 15 minutes to complete. The script will nag you at the end to verify the procs are correct and to run ignite lighthouse to refresh the caches.
Your Elasticsearch instance may not have all of the correct schemas loaded and data populated. We do not have a process for automating this yet. At present all dev lanes should have the correct schemas configured, but it's likely there will be some drift.
# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root   12G   12G    0M 100% /

The solution is to delete some log files. To check the size of a particular directory...
# du -sh /opt/tomcat/current/logs/
539M    /opt/tomcat/current/logs/
# rm -rf /opt/tomcat/current/logs/*
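If you'd rather not wipe everything, age-based cleanup is a gentler option. A sketch -- the path is the one from the `du` example, and the 7-day cutoff is just an example:

```shell
LOGDIR=/opt/tomcat/current/logs   # directory from the du example above
if [ -d "$LOGDIR" ]; then
  # Delete only files not modified in the last 7 days.
  find "$LOGDIR" -type f -mtime +7 -print -delete
fi
```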
/opt/tomcat/current/logs  [web, cms] + bounce  # service tomcat7 restart
/var/log/rlservices/      [web, cms] + bounce  # service rlservices restart
/var/log/rabbitmq         [web]      + bounce  # service rabbitmq-server restart
/var/log/hbase            [cdh]      + bounce  # for serv in hbase-master hbase-rest hbase-regionserver hbase-thrift; do service $serv restart; done
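The clear-then-bounce pattern above can be wrapped in a small helper. A sketch -- run it as root, with the directory/service pairs from the table:

```shell
# Wipe a log directory, then restart the service that writes to it.
clear_and_bounce() {
  local dir="$1" svc="$2"
  rm -rf "${dir:?}"/*     # ${dir:?} aborts if dir is empty/unset
  service "$svc" restart
}

# e.g.: clear_and_bounce /opt/tomcat/current/logs tomcat7
#       clear_and_bounce /var/log/rabbitmq rabbitmq-server
```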
Dev lanes have JPDA enabled on port 8000 for debuggers and JMX for profilers.
To use VisualVM you should run jstatd on the host. To do this, you must create a security policy and keep a jstatd process running.
Create a file named: jstatd.all.policy
grant codebase "file:/usr/java/latest/lib/tools.jar" { permission java.security.AllPermission; };
And then run:
# jstatd -J-Djava.security.policy=/home/user/jstatd.all.policy
You should now be able to connect VisualVM to the host.