Parallel Computation with R




ParallelR

Presentation and Example Code for Parallel R

On Github elray1 / ParallelR

Isabelle Beaudry, Evan L. Ray

University of Massachusetts, Amherst

April 14, 2014

Outline

Some context: options when your code is slow
A running example: Network Simulations
Parallel computing with R: One computer, multiple cores
  • Overview
  • snowfall and ****
  • Random number generation
Parallel computing with R: The MGHPCC
  • Getting access
  • Logistics: connecting, transferring files, and submitting jobs
  • MPI

So your code runs slowly...

  • Step 0: Make sure you're getting the right answer.
    • “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” - Donald Knuth
    • Consider unit testing: See packages RUnit and testthat
  • Step 1: Profile your code to see where it's slow
    • See ?Rprof and the package profr
  • Step 2: Consider using a different algorithm.
  • Step 3: Consider modifying your R code
    • Pre-allocate memory
    • Use built-in functions instead of loops
  • Step 4 (a): Consider using a faster language for the slow parts.
    • See package Rcpp
  • Step 4 (b): Consider parallelizing
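Steps 0, 1, and 3 can be illustrated with a small sketch. The task here (a running sum of squares) and the function names `grow`, `prealloc`, and `vectorized` are hypothetical stand-ins, not part of the network example. The sketch first checks that all three versions agree, then times them with `system.time`.

```r
# Toy task (hypothetical, not from the slides): running sum of squares.
x <- seq_len(1e4)

# Version A: grow the result inside the loop (the vector is copied repeatedly)
grow <- function(x) {
  out <- numeric(0)
  total <- 0
  for (xi in x) {
    total <- total + xi^2
    out <- c(out, total)
  }
  out
}

# Version B: pre-allocate the result, then fill it in
prealloc <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]^2
    out[i] <- total
  }
  out
}

# Version C: use built-in vectorized functions instead of a loop
vectorized <- function(x) cumsum(x^2)

# Step 0: make sure every version gives the same answer
stopifnot(isTRUE(all.equal(grow(x), prealloc(x))),
          isTRUE(all.equal(prealloc(x), vectorized(x))))

# Step 1: measure before optimizing (timings vary by machine)
timings <- c(grow       = system.time(grow(x))[["elapsed"]],
             prealloc   = system.time(prealloc(x))[["elapsed"]],
             vectorized = system.time(vectorized(x))[["elapsed"]])
print(timings)
```

For real code, `Rprof` gives a per-function breakdown rather than a single elapsed time, which is usually more informative than timing whole functions.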

Some General Resources

A running example: network simulations

  • Brief description of example here?
# allocate memory to store results
rds.sample1 <- array(NA, dim=c(n.net, n.nodes, 5))
est1 <- rep(NA, n.net)

for (j in 1:n.net){
  # simulate one network
  net <- create.nets(1)

  # simulate a respondent driven sample from the network
  rds.sample1[j, , ] <- rds.s(net)

  # build a data frame from the sample (input to the estimator below)
  rds.frame <- create.df(A, rds.sample1[j, , ], rds.sample1[j, , 2])

  # estimate something based on the respondent driven sample
  est1[j] <- RDS.II.estimates(rds.frame, outcome.variable="outcome")$estimate
}
We want to parallelize this for loop.
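A loop like this is a good candidate because each iteration is independent of the others. As a minimal sketch of the idea, the code below uses the base `parallel` package with a hypothetical stand-in, `simulate_one`, in place of the network simulation and estimation steps. It assumes two worker processes are available.

```r
library(parallel)  # ships with R

# Hypothetical stand-in for one loop iteration: simulate data, return an estimate.
simulate_one <- function(j) {
  x <- rnorm(100, mean = j)
  mean(x)
}

n.net <- 8

# Serial version: pre-allocate, then fill in one iteration at a time
est_serial <- numeric(n.net)
for (j in seq_len(n.net)) {
  est_serial[j] <- simulate_one(j)
}

# Parallel version: the same iterations, spread over 2 worker processes
cl <- makeCluster(2)
est_parallel <- unlist(parLapply(cl, seq_len(n.net), simulate_one))
stopCluster(cl)

print(est_parallel)
```

The two result vectors will not be identical here because the random draws differ; making parallel runs reproducible requires setting up random number streams on the workers.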

Requirements

  • There are a few things we need to do in the context of our example:

Many Packages

The foreach Package with doParallel

  • The foreach package provides the following general construction:
    foreach(i = 1:3) %dopar% {
      # do some stuff
    } 
  • We have to register a parallel backend with foreach.
There are many options: doParallel/parallel, doMPI/Rmpi, doMC/multicore, and doSNOW/snow.
We will focus on doParallel/parallel.
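Before applying this to the network example, a toy version may help show the moving parts: create a cluster, register it, run `foreach` with `%dopar%`, and stop the cluster. The computation (squaring three numbers) is purely illustrative, and the sketch assumes the doParallel package is installed and that two worker processes can be started.

```r
library(doParallel)  # also attaches foreach and parallel

# Start 2 worker processes and register them as the foreach backend
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns a value; .combine = c collects them into a vector.
# Without .combine, foreach returns a list.
squares <- foreach(i = 1:3, .combine = c) %dopar% {
  i^2
}

# Always release the workers when done
stopCluster(cl)

print(squares)
```

If no backend is registered, `%dopar%` falls back to sequential execution with a warning, so forgetting `registerDoParallel` tends to show up as code that runs but is no faster.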

Implementing our Example with doParallel

# the following also loads the foreach and parallel packages
library(doParallel)

# ...load other packages, define necessary functions, etc...

# create a "cluster" of 4 worker processes (ideally one per available core)
cl <- makeCluster(4)

# set up independent L'Ecuyer-CMRG RNG streams on the workers;
# iseed is a single integer that is passed on to set.seed()
clusterSetRNGStream(cl, iseed = 9523886)

# register the parallel backend with the foreach package.
registerDoParallel(cl)

# execute in parallel
est3 <- foreach(i = 1:n.net, .packages = c("statnet", "RDS"), .combine = cbind) %dopar% {
  net <- create.nets(1)
  rds.sample <- rds.s(net)
  rds.frame <- create.df(A, rds.sample, rds.sample[, 2])
  return(RDS.II.estimates(rds.frame, outcome.variable = "outcome")$estimate)
}

# stop the cluster
stopCluster(cl) 
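The role of `clusterSetRNGStream` can be checked directly. The sketch below uses only the base `parallel` package, and the helper `draw_with_seed` is a hypothetical name introduced for illustration. It starts a fresh two-worker cluster, seeds the worker streams, and has each worker draw one uniform random number.

```r
library(parallel)  # ships with R

# Hypothetical helper: start 2 workers, seed their RNG streams, draw once per worker.
draw_with_seed <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))               # stop the workers even if an error occurs
  clusterSetRNGStream(cl, iseed = seed)  # one L'Ecuyer-CMRG stream per worker
  unlist(clusterEvalQ(cl, runif(1)))     # evaluate runif(1) on every worker
}

run1 <- draw_with_seed(9523886)
run2 <- draw_with_seed(9523886)

# Same seed and same number of workers: the draws are reproduced exactly
stopifnot(identical(run1, run2))

# The two workers use different streams, so their draws differ
stopifnot(run1[1] != run1[2])

print(run1)
```

Reproducibility in this sense depends on using the same number of workers, and likely also on iterations being assigned to workers in the same way from run to run, so it is worth verifying for a given backend rather than assuming it.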
