Parallel Computation with R
Isabelle Beaudry,
Evan L. Ray
University of Massachusetts, Amherst
April 14, 2014
Outline
Some context: options when your code is slow
A running example: Network Simulations
Parallel computing with R: One computer, multiple cores
- Overview
- snowfall and ****
- Random number generation
Parallel computing with R: The MGHPCC
- Getting access
- Logistics: connecting, transferring files, and submitting jobs
- MPI
So your code runs slowly...
- Step 0: Make sure you're getting the right answer.
- “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” - Donald Knuth
- Consider unit testing: see the packages RUnit and testthat
- Step 1: Profile your code to see where it's slow
- See ?Rprof and the package profr
- Step 2: Consider using a different algorithm.
- Step 3: Consider modifying your R code
- Pre-allocate memory
- Use built-in functions instead of loops
- Step 4 (a): Consider using a faster language for the slow parts.
- Step 4 (b): Consider parallelizing
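To illustrate Step 3, here is a minimal sketch that computes the squares of 1, ..., n in three ways: growing a vector inside a loop, filling a pre-allocated vector, and calling the vectorized ^ operator. All three give the same result. system.time() reports how long each one takes; timings depend on your machine, so none are shown. The function names grow(), prealloc() and vectorized() are hypothetical.

```r
n <- 10000

# growing a vector: every call to c() allocates a new, longer vector
grow <- function(n) {
  x <- c()
  for (i in seq_len(n)) x <- c(x, i^2)
  x
}

# pre-allocating: memory for the result is reserved once, then filled in
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# vectorized: one call to the built-in ^ operator, no explicit R loop
vectorized <- function(n) (1:n)^2

system.time(grow(n))
system.time(prealloc(n))
system.time(vectorized(n))
```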
Some General Resources
- Advanced R development, by Hadley Wickham
- 2008 UseR presentation by Dirk Eddelbuettel:
A running example: network simulations
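- For each of n.net replicates, we simulate a network, draw a respondent-driven sample (RDS) from it, and compute an RDS-II estimate of an outcome.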
# assumes n.net, n.nodes, A and the helper functions create.nets(),
# rds.s() and create.df() are defined elsewhere
library(statnet)
library(RDS)  # provides RDS.II.estimates()
# allocate memory to store results
rds.sample1 <- array(NA, dim = c(n.net, n.nodes, 5))
est1 <- rep(NA, n.net)
for (j in 1:n.net) {
  # simulate one network
  net <- create.nets(1)
  # simulate a respondent-driven sample from the network
  rds.sample1[j, , ] <- rds.s(net)
  # assemble the sample into a data frame for estimation
  rds.frame <- create.df(A, rds.sample1[j, , ], rds.sample1[j, , 2])
  # estimate the outcome based on the respondent-driven sample
  est1[j] <- RDS.II.estimates(rds.frame, outcome.variable = "outcome")$estimate
}
Each iteration is independent of the others, so we want to parallelize this for loop.
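Because the iterations are independent, the loop body can be rewritten as a function of the replicate index and applied over 1:n.net, which is the shape most parallel tools expect. A minimal sketch, assuming a hypothetical one_replicate() whose body (rnorm() and mean()) merely stands in for the network simulation and RDS-II estimation:

```r
# hypothetical stand-in for one replicate of the simulation; in the real
# example the body would call create.nets(), rds.s(), create.df() and
# RDS.II.estimates()
one_replicate <- function(j) {
  x <- rnorm(50)  # stands in for simulating a network and drawing a sample
  mean(x)         # stands in for the RDS-II estimate
}

n.net <- 100

# loop version, with a pre-allocated result vector
set.seed(1)
est_loop <- numeric(n.net)
for (j in seq_len(n.net)) est_loop[j] <- one_replicate(j)

# apply version: the same computations in the same order
set.seed(1)
est_apply <- vapply(seq_len(n.net), one_replicate, numeric(1))

print(identical(est_loop, est_apply))  # TRUE
```

Swapping vapply() for a parallel equivalent is then a small change, although the random numbers will no longer match the sequential run unless the RNG is handled explicitly.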
Requirements
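There are a few things we need to do in the context of our example:
- Make the packages, functions, and data used in the loop (e.g. statnet, RDS, create.nets(), A) available on each worker
- Give each worker its own independent stream of random numbers
- Collect the estimates from the workers and combine them into a single result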
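Workers started by makeCluster() are fresh R sessions, so the packages, functions, and data a loop uses must be sent to them. A minimal sketch with the parallel package, where scale_factor and times_scale() are hypothetical stand-ins for objects like A and create.nets(): clusterExport() copies objects from the master's global environment to every worker, and clusterEvalQ() evaluates an expression (here, loading a package) on every worker. The cluster size of 2 is arbitrary.

```r
library(parallel)

cl <- makeCluster(2)

# hypothetical data and helper function needed inside the parallel loop
scale_factor <- 10
times_scale <- function(x) x * scale_factor

# copy the objects from the master's global environment to every worker
clusterExport(cl, c("scale_factor", "times_scale"))

# evaluate an expression on every worker, e.g. load a package
# (tools ships with R, so it is always available)
invisible(clusterEvalQ(cl, library(tools)))

# the workers can now call times_scale(), which finds scale_factor
res <- parSapply(cl, 1:4, function(i) times_scale(i))
stopCluster(cl)

print(res)  # 10 20 30 40
```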
Many Packages
The foreach Package with doParallel
The foreach package separates the loop from the parallel backend that executes it. There are many backends: doParallel/parallel, doMPI/Rmpi, doMC/multicore, doSNOW/snow
We will focus on doParallel/parallel.
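A minimal sketch of the foreach syntax, assuming the foreach and doParallel packages are installed: %do% runs the iterations sequentially, %dopar% hands them to whichever backend has been registered, and .combine controls how the per-iteration values are collected (without it, the result is a list). The toy loop body i^2 and the cluster size of 2 are arbitrary choices for illustration.

```r
library(doParallel)  # also loads foreach and parallel

# sequential: %do% evaluates every iteration in the current R session
squares_seq <- foreach(i = 1:5, .combine = c) %do% {
  i^2
}

# without .combine, foreach returns a list with one element per iteration
squares_list <- foreach(i = 1:5) %do% {
  i^2
}

# parallel: register a backend, then %dopar% sends iterations to the workers
cl <- makeCluster(2)
registerDoParallel(cl)
squares_par <- foreach(i = 1:5, .combine = c) %dopar% {
  i^2
}
stopCluster(cl)

print(squares_seq)          # 1 4 9 16 25
print(class(squares_list))  # "list"
# TRUE: results are combined in iteration order (.inorder = TRUE by default)
print(identical(squares_seq, squares_par))
```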
Implementing our Example with doParallel
# the following also loads the foreach, iterators and parallel packages
library(doParallel)
# ...load other packages, define necessary functions, etc...
# create a "cluster" of 4 worker R processes
cl <- makeCluster(4)
# set up independent L'Ecuyer-CMRG random number streams on the workers;
# iseed is a single integer that clusterSetRNGStream() passes to set.seed()
clusterSetRNGStream(cl, iseed = 9523886)
# register the parallel backend with the foreach package
registerDoParallel(cl)
# execute in parallel: .packages loads statnet and RDS on each worker, and
# .combine = c collects the n.net estimates into a vector, like est1
est3 <- foreach(i = 1:n.net, .packages = c("statnet", "RDS"), .combine = c) %dopar% {
  net <- create.nets(1)
  rds.sample <- rds.s(net)
  rds.frame <- create.df(A, rds.sample, rds.sample[, 2])
  # the value of the last expression is the result of this iteration
  RDS.II.estimates(rds.frame, outcome.variable = "outcome")$estimate
}
# stop the cluster
stopCluster(cl)
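One way to check that clusterSetRNGStream() makes parallel random draws reproducible is to reset the streams with the same iseed and draw again. A minimal sketch using only the parallel package; it uses parSapply(), which splits the tasks among the workers the same way on every call, so identical streams give identical draws. The seed 123 and the 2 workers are arbitrary choices.

```r
library(parallel)

cl <- makeCluster(2)

# give each worker its own L'Ecuyer-CMRG stream, derived from iseed = 123
clusterSetRNGStream(cl, iseed = 123)
draws1 <- parSapply(cl, 1:4, function(i) runif(1))

# resetting the streams with the same iseed reproduces the draws
clusterSetRNGStream(cl, iseed = 123)
draws2 <- parSapply(cl, 1:4, function(i) runif(1))

stopCluster(cl)
print(identical(draws1, draws2))  # TRUE
```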