When I started using R, about ten years ago, the community was much smaller. There was no R-bloggers to get inspired by, no ggplot2 to make nice graphs. It was also the beginning of another implementation of R (other than CRAN’s) known as Revolution R, from Revolution Analytics. Their R targeted enterprise users and was designed to be faster and more scalable. They also offered an open-source version of their product called RRO (Revolution R Open).

In April 2015, the company was acquired by Microsoft! Maybe this means that, at some point, everybody will be able to do R in Excel (instead of using Visual Basic)… We’ll see!

Meanwhile, I decided to try RRO, now MRO (Microsoft R Open), and see if I could get better performance on the day-to-day analyses I do. A benchmark (results and code) is available on their website to show exactly what you can gain from their multithreaded implementation. I decided to test it anyway, even though those operations are not the ones I use most frequently.

Benchmark figures from the Revolution Analytics website
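The gain MRO advertises comes from linking R against Intel’s multithreaded MKL BLAS/LAPACK libraries. Below is a minimal sketch of how to check or cap the number of threads MKL uses, assuming the RevoUtilsMath package bundled with MRO is available (the exact function names may differ between releases):

# A minimal sketch, assuming the RevoUtilsMath package bundled with MRO
# (it is not on CRAN, so this only works under RRO/MRO).
if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
  print(RevoUtilsMath::getMKLthreads())   # number of threads MKL currently uses
  RevoUtilsMath::setMKLthreads(4)         # e.g. cap it at 4 on a 4-core machine
}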

So I set up my own little benchmark. I did so using an R Markdown file to have the results nicely laid out in an HTML document (other possible outputs are PDF and Word). I ran the benchmark with MRO 3.2.3 and CRAN’s R 3.2.2, on my desktop computer with 4 cores. The HTML outputs can be found here.
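In case you have never used R Markdown for this, here is a minimal sketch of that workflow; the file name and title below are placeholders, not my actual files:

# benchmark.Rmd (placeholder name) starts with a YAML header such as:
#
# ---
# title: "CRAN R vs MRO benchmark"
# output: html_document    # or pdf_document / word_document
# ---
#
# followed by text and R code chunks. The report is then built from R with:
library(rmarkdown)
render("benchmark.Rmd")                                    # HTML report
render("benchmark.Rmd", output_format = "word_document")   # Word version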

Remember the post I did about Python and how to find out which piece of code is the fastest? We can do the same in R!
There are two simple functions to measure time in R: proc.time() and system.time(). You can use them like this:

 
m <- 10000
n <- 2000
A <- matrix(runif(m * n), m, n)
system.time(P <- prcomp(A))

## or
ptm <- proc.time()
P <- prcomp(A)
proc.time() - ptm
#    user  system elapsed
# 122.594   1.122 123.583

# The 'user' time is the CPU time charged for the execution of user instructions of the calling process.
# The 'system' time is the CPU time charged for execution by the system on behalf of the calling process.

However, sometimes you would like a more detailed look at the individual steps to see which one takes the most time. That is where the timeit package comes in. Its usage is similar to system.time, but timeit is what we call a profiler, so the output is different:

 
library(timeit)
t0 <- timeit(P <- prcomp(A), replications = 1, times = 1)
 
#                  self.time     self.pct total.time    total.pct mem.total replications iteration
# "La.svd"             98.34 78.994296731      98.65 28.320032152     488.4            1         1
# "%*%"                25.27 20.298819182      25.27  7.254406614     152.6            1         1
# "matrix"              0.23  0.184753795       0.23  0.066027444     183.1            1         1
# "array"               0.16  0.128524379       0.16  0.045932135     152.6            1         1
# "aperm.default"       0.14  0.112458832       0.14  0.040190618     152.6            1         1
# "is.finite"           0.11  0.088360511       0.11  0.031578343     152.6            1         1
# "colMeans"            0.08  0.064262190       0.08  0.022966068       0.0            1         1
# "svd"                 0.05  0.040163869      98.81 28.365964288     671.5            1         1
# "any"                 0.04  0.032131095       0.04  0.011483034       0.0            1         1
# "t.default"           0.03  0.024098321       0.03  0.008612275      30.5            1         1
# "sweep"               0.02  0.016065547       0.32  0.091864271     305.2            1         1
# "prcomp.default"      0.01  0.008032774     124.49 35.738071999    1159.9            1         1
# "as.matrix"           0.01  0.008032774       0.01  0.002870758       0.0            1         1


The first two columns are the ones we want to look at, self.time and self.pct. They are respectively the time in seconds and the percentage of total time spent executing the code of the stated function.

In my output, you can see all the steps involved in a principal component analysis, along with the time and memory used by each one. For quick operations, you can increase the number of replications and iterations to get a more reliable estimate. In the output above, you can see that prcomp uses the any and as.matrix functions, which are quite fast, while %*% and La.svd are the most costly steps. To get the total amount of time, we can sum the self.time column.

 
sum(t0$self.time)
#123.51

Here are some of the results comparing CRAN R and MRO. Taking the example provided on the Revolution Analytics website, MRO is clearly the fastest at doing PCA on a 2000 x 10,000 matrix (27.8 seconds for MRO vs 129.3 for CRAN R), but the difference is less clear when I use my own example of a 66 x 137,032 matrix (2.02 vs 2.26 seconds). As for the other operations, they do not benefit from the parallel architecture. In fact, according to the MRO people, linear regression, cross products, determinant computation and Cholesky decomposition benefit the most from the multithreaded operations. As expected, creation/manipulation/transformation of matrices, as well as loops and recursion, get no gain. Indeed, these operations are not really parallelism-friendly.

Task                                                      Time (s), CRAN R   Time (s), MRO
PCA [2000 x 10000] - MRO example                                    123.51           19.85
PCA [66 x 137032]                                                     2.26            2.02
PCA - transposed matrix                                               2.94            5.56
apply(2, as.numeric)                                                  0.29            0.32
matrix construction (with an if statement) - rbind                    1.43            1.64
matrix construction (with an if statement) - assignment               1.03            1.12

Example from the MRO benchmark; PCA on a 66 x 137032 data matrix
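If you want to try the operations that are supposed to benefit the most on your own machine, here is a small sketch; the matrix size and seed are arbitrary choices of mine. Run it once under CRAN R and once under MRO and compare the elapsed times.

set.seed(1)
n <- 4000
X <- matrix(rnorm(n * n), n, n)

system.time(crossprod(X))           # cross product, t(X) %*% X
system.time(determinant(X))         # (log-)determinant
S <- crossprod(X) + n * diag(n)     # a positive definite matrix
system.time(chol(S))                # Cholesky decomposition
system.time(lm.fit(X, rnorm(n)))    # linear regression via QR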

Interestingly, with the profiler, we can see that when constructing a matrix using a for loop, rbind and a condition (for example, assigning blue if the value is greater than 4 and black otherwise), the two implementations do not use the same internal methods.
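For reference, here is a rough sketch of what those two constructions might look like; this is my own reconstruction for illustration, and the data, sizes and variable names are not the ones from my actual script.

set.seed(1)
x <- runif(5000, min = 0, max = 8)    # arbitrary example data

# version 1: growing the matrix with rbind inside the loop
system.time({
  m1 <- NULL
  for (i in seq_along(x)) {
    colour <- if (x[i] > 4) "blue" else "black"
    m1 <- rbind(m1, c(x[i], colour))
  }
})

# version 2: pre-allocating the matrix and filling it by assignment
system.time({
  m2 <- matrix(NA_character_, nrow = length(x), ncol = 2)
  for (i in seq_along(x)) {
    m2[i, ] <- c(x[i], if (x[i] > 4) "blue" else "black")
  }
})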

t3