Continuing my effort to help you get the most out of your CPUs, I figured we could look into some of the multiprocessing functionality available for your R scripts. While there are several options for multi-core processing of your data, we’ll focus on one that is really simple to put in place.

A while back, I was putting together a script to run a large series of logistic regressions (using the glm function) in an attempt to model some data. This was fairly time consuming, since a great number of these regressions had to be computed (and optimized!). Ultimately though, all of these calculation runs were independent of one another, so obviously I decided to look for a way to parallelize the execution.

The solution I found (which fit my code structure) was simply to replace the lapply function I was using with the mclapply implementation from the parallel package (which has been part of the R distribution since version 2.14). So a simple function call replacement cut my calculation time by a factor of roughly 4 :).

And when I say the replacement was simple, here’s what I mean:
Original piece of code:

...

gene_scores <- data.frame()
gene_scores <- do.call('rbind', lapply(genes, function(x, data, formula) {
    yvar <- all.vars(formula)[1]
    work <- data[,c(yvar, x)]
    model <- glm(formula, data=work, family=binomial)
    s <- summary(model)
    crossval <- CV_JPL(model, print.details=FALSE)
    return(data.frame(gene=x, deviance=s$deviance, acc.cv=crossval$acc.cv,
                      acc.internal=crossval$acc.internal))
}, data=training, formula=formula))

...

Multicore version:

library(parallel) # Need to load the library!

...

gene_scores <- data.frame()
gene_scores <- do.call('rbind', mclapply(genes, function(x, data, formula) {
    yvar <- all.vars(formula)[1]
    work <- data[,c(yvar, x)]
    model <- glm(formula, data=work, family=binomial)
    s <- summary(model)
    crossval <- CV_JPL(model, print.details=FALSE)
    return(data.frame(gene=x, deviance=s$deviance, acc.cv=crossval$acc.cv,
                      acc.internal=crossval$acc.internal))
}, data=training, formula=formula, mc.cores=4))

...


The code changes are easy to spot. They amount to:
1- load the parallel library
2- replace the lapply call with mclapply
3- specify the number of cores to use with the mc.cores=X parameter
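
If you want to try the swap without my data, here is a minimal, self-contained sketch of the same pattern. Everything in it is made up for illustration: the toy data frame, the column names (g1 to g5, outcome) and the fit_one helper are assumptions, and the cross-validation step is left out. It also assumes that you are happy with at most 2 cores. One thing to keep in mind is that mclapply relies on forking, so on Windows it only accepts mc.cores=1, which the sketch falls back to.

```r
library(parallel)  # provides mclapply() and detectCores()

# Hypothetical toy data: five numeric predictors and a random binary outcome.
set.seed(42)
n <- 200
predictors <- paste0("g", 1:5)
toy <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(toy) <- predictors
toy$outcome <- rbinom(n, 1, 0.5)

# Hypothetical helper: fit one logistic regression per predictor
# and return a one-row summary.
fit_one <- function(x, data) {
  model <- glm(reformulate(x, response = "outcome"),
               data = data, family = binomial)
  data.frame(predictor = x, deviance = deviance(model))
}

# Pick a core count: forking is unavailable on Windows, and
# detectCores() may return NA on some platforms.
cores <- detectCores()
if (is.na(cores) || .Platform$OS.type == "windows") cores <- 1L
n_cores <- min(2L, cores)

# Serial version.
serial_scores <- do.call(rbind, lapply(predictors, fit_one, data = toy))

# Multicore version: same call, with mclapply and mc.cores.
parallel_scores <- do.call(rbind,
  mclapply(predictors, fit_one, data = toy, mc.cores = n_cores))
```

Both calls should give the same data frame, since each fit is deterministic and independent of the others. On a toy problem this small the forking overhead will likely outweigh any gain, so don't expect a speed-up here. The benefit shows up when each individual fit is expensive.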