hpc-users-group4




Presentation to HPC Users Group at ILRI, Nairobi.

On Github alanorth / hpc-users-group4

HPC Users Group

Creating a community around research computing at ILRI

@mralanorth

December 4, 2013

HUG Meetup #4

Infrastructure hierarchy

HPC

Login node

"Master" of the slaves

Don't run things here :)

Taurus

128 GB of RAM

64 CPUs ("cores")

Great for batch and interactive jobs

Mammoth

384 GB of RAM

16 CPUs ("cores")

Great for high-memory jobs

What's new

New software available

  • EMBOSS 6.6.0
  • BEAST 1.7.5
  • HTSeq
  • QDD
  • seq_crumbs
  • BioPython for Python 2.7.5
  • Abacas
  • Whole-genome shotgun database (NCBI)

New hardware

10 Gigabit Ethernet

It's boring... I know.

10 Gigabit Ethernet

To infinity... and beyond!

It's faster

(in a nutshell)

Roughly ten times faster

Bandwidth graph during blastn against WGS (peak throughput 720MB/sec)
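The arithmetic behind "roughly ten times faster" can be sketched in a few lines of shell. This assumes the previous network was 1 Gigabit Ethernet (the slides do not say so explicitly), ignores protocol overhead, and uses a hypothetical 50 GB database size; `gbit_to_mbytes` is an illustrative helper name.

```shell
#!/usr/bin/env bash

# Convert a link speed in Gbit/s to its theoretical maximum in MB/s
# (1 Gbit = 1000 Mbit, 8 bits per byte; protocol overhead is ignored).
gbit_to_mbytes() {
    echo $(( $1 * 1000 / 8 ))
}

OLD_MBS=$(gbit_to_mbytes 1)    # assumed previous network: 1 Gigabit Ethernet
NEW_MBS=$(gbit_to_mbytes 10)   # 10 Gigabit Ethernet
PEAK_MBS=720                   # peak seen during the blastn run

# Hypothetical 50 GB database, expressed in MB
DB_MB=50000

echo "1 GbE ceiling:  $OLD_MBS MB/s, about $(( DB_MB / OLD_MBS )) s to read the database"
echo "10 GbE ceiling: $NEW_MBS MB/s, about $(( DB_MB / NEW_MBS )) s to read the database"
echo "Observed peak:  $PEAK_MBS MB/s, about $(( DB_MB / PEAK_MBS )) s to read the database"
```

Under these assumptions the observed 720 MB/sec peak is well above anything a 1 Gigabit link could carry.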

Why is my program so slow?!

Two main reasons!

A program can be CPU bound or I/O bound...

CPU-bound programs

A program is said to be CPU bound when the time to complete a task is determined principally by the speed of the central processing unit (CPU).

Programs which are generally CPU bound

R, structure, ???

Things you can do to speed up a CPU-bound program

  • Get a faster CPU (e.g., 2.4 GHz -> 2.8 GHz)
  • Spread the problem over multiple CPUs ("multi-threaded" code)

Multi-threaded code

Many programs allow you to specify cores, threads, or CPUs. For example:

BLAST:

blastn -num_threads 4

Trinity:

Trinity.pl [..] --output blah.out --CPU 4 --inchworm_cpu 4 --bflyCPU 4

Make sure to match your program's thread/core/cpu parameter with the SLURM job parameters!
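One way to keep the two in sync is to read the thread count from the environment SLURM sets up for the job instead of typing the number twice. A minimal sketch, assuming the job requests its cores with `--cpus-per-task` (SLURM then exports `SLURM_CPUS_PER_TASK`); the helper name `job_threads` and the file `query.fa` are hypothetical.

```shell
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH --cpus-per-task=4
#SBATCH -J blastn

# Hypothetical helper: print the number of cores SLURM gave this job,
# falling back to 1 when the variable is unset (e.g. outside a job).
job_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

THREADS=$(job_threads)
echo "Running with $THREADS thread(s)"

# Then pass the same number to the program, for example:
#   blastn -query query.fa -db nt -num_threads "$THREADS" -out output
```

Changing the `#SBATCH --cpus-per-task` line is then the only edit needed to run with more or fewer threads.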

I/O-bound programs

A program is said to be I/O bound when the time to complete a task is determined principally by the speed of the input/output subsystem (i.e., the disk or the network).
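A rough way to tell which of the two definitions fits your program is to compare the CPU time it used with the wall-clock time it took, both of which `time` reports. The sketch below uses a hypothetical helper, `bound_by`, and an arbitrary 80% cut-off; treat it as a rule of thumb only. A multi-threaded program can use more CPU seconds than wall-clock seconds, and waiting is not always caused by I/O.

```shell
#!/usr/bin/env bash

# Hypothetical helper: classify a run from its wall-clock and CPU seconds
# (CPU seconds = "user" + "sys" from the output of `time`).
# The 0.8 cut-off is an arbitrary assumption, not a standard value.
bound_by() {
    local wall=$1 cpu=$2
    awk -v wall="$wall" -v cpu="$cpu" 'BEGIN {
        if (wall <= 0) { print "unknown"; exit }
        if (cpu / wall >= 0.8) print "CPU-bound"; else print "probably I/O-bound"
    }'
}

bound_by 100 97   # busy computing nearly the whole time
bound_by 100 12   # spent most of the time waiting
```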

Programs which are generally I/O bound

BLAST, Trinity, Bowtie, ???

Things you can do to speed up an I/O-bound program

  • Get faster disks
  • Get a faster network (yay!)
  • Compress your input/output
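The last point, compression, can be tried with nothing more than `gzip`. The sketch below is an illustration with made-up data: the file names are hypothetical, and the repetitive demo input compresses far better than real sequence data will. Compression also costs CPU time, so it only helps when I/O is the bottleneck.

```shell
#!/usr/bin/env bash

# Work in a throwaway directory so the demo leaves nothing behind.
DEMO_DIR=$(mktemp -d)

# Hypothetical input: repetitive text standing in for sequence data.
awk 'BEGIN { for (i = 0; i < 10000; i++) print "ACGTACGTACGTACGTACGTACGTACGTACGT" }' > "$DEMO_DIR/reads.txt"

# Compress on the way out: pipe a program's output through gzip.
cat "$DEMO_DIR/reads.txt" | gzip -c > "$DEMO_DIR/reads.txt.gz"

# Decompress on the way in: stream the file into the program that reads it.
LINES_READ=$(gzip -dc "$DEMO_DIR/reads.txt.gz" | wc -l | tr -d ' ')

RAW_BYTES=$(wc -c < "$DEMO_DIR/reads.txt" | tr -d ' ')
GZ_BYTES=$(wc -c < "$DEMO_DIR/reads.txt.gz" | tr -d ' ')
echo "raw: $RAW_BYTES bytes, gzipped: $GZ_BYTES bytes, lines read back: $LINES_READ"

rm -rf "$DEMO_DIR"
```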

Mitigating slow performance due to I/O

Pro-tip: don't write to the network!

Using local storage in batch jobs

taurus and mammoth have a local "scratch" space we can use:

/var/scratch

Scratch space is local, temporary storage on the compute node itself, so reading and writing there doesn't incur the penalty of going over the network.

Example batch script

#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -n 4
#SBATCH -J blastn

# load the blast module
module load blast/2.2.28+

# create a per-job working directory on the node's local scratch space
WORKDIR=/var/scratch/$SLURM_JOBID
mkdir -p "$WORKDIR"

echo "Using $WORKDIR on $SLURMD_NODENAME"
echo

# run the blast with 4 CPU threads (cores), writing output to local scratch
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out "$WORKDIR/output"

This creates a unique temporary directory for each job at runtime, e.g.:

/var/scratch/10287

Make sure to clean up when you're done!
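The example batch script leaves its output behind in scratch. One possible way to clean up is a `trap` that copies the results somewhere permanent and removes the job's directory when the script exits. This is a sketch under assumptions: `SCRATCH_BASE`, `RESULTS_DIR` and `finish` are hypothetical names, results go to a `results` directory under wherever the job was submitted from, and the script falls back to `/tmp` so it can be tried off the cluster.

```shell
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -n 4
#SBATCH -J blastn

# Assumption: use /var/scratch on the cluster, or a temp directory elsewhere.
SCRATCH_BASE=/var/scratch
[ -w "$SCRATCH_BASE" ] || SCRATCH_BASE=${TMPDIR:-/tmp}

# Assumption: permanent home for results (hypothetical location).
RESULTS_DIR=${RESULTS_DIR:-$PWD/results}

# Fall back to the shell's PID when not running under SLURM.
WORKDIR="$SCRATCH_BASE/${SLURM_JOBID:-$$}"
mkdir -p "$WORKDIR" "$RESULTS_DIR"

# Copy results out of scratch, then delete the scratch directory.
# Does nothing if the directory is already gone.
finish() {
    [ -d "$WORKDIR" ] || return 0
    cp -r "$WORKDIR"/. "$RESULTS_DIR"/
    rm -rf "$WORKDIR"
}
# Run finish whenever the script exits, including after a failed command.
trap finish EXIT

# ... the real work goes here, writing into $WORKDIR, for example:
echo "example output" > "$WORKDIR/output"
```

Because the cleanup runs from the `trap`, the scratch directory is removed even if a command in the middle of the job fails.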

SLURM stats

Powered by UBMoD

Job numbers

13,646 jobs since April!

530 jobs in the last 30 days :(

39 jobs in the last week :( :(

Are we doing science? :(

Getting Help

http://hpc.ilri.cgiar.org
