When I started using R, about ten years ago, the community was much smaller. There was no R-bloggers to get inspired by, no ggplot2 to make nice graphs. It was the beginning of another implementation of R (other than CRAN's) known as Revolution R, from Revolution Analytics. Their R targeted enterprises and was designed to be faster and more scalable. They also offered an open source version of their product called RRO.
In April 2015, the company was acquired by Microsoft! Maybe this means that, at some point, everybody will be able to do R in Excel (instead of using Visual Basic)… We'll see!
Meanwhile, I decided to try RRO, now MRO (Microsoft R Open), and see if I could get higher performance on the day-to-day analyses I do. A benchmark is available (results and code) on their website showing exactly what you can gain from their multithreaded implementation. Those operations were not the ones I use most frequently, but I decided to test it anyway.
So I set up my own little benchmark. I did so using an R Markdown file to have the results nicely put in an HTML document (other possible outputs are PDF and Word formats). I ran my benchmark using MRO 3.2.3 and CRAN's R 3.2.2, on my desktop computer with 4 cores. The HTML outputs can be found here.
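For readers who have not used R Markdown for this kind of report: a minimal header (this is a sketch, not my actual benchmark file) only needs a title and an output format, and switching `output` regenerates the same document as HTML, PDF, or Word.

```yaml
---
title: "Benchmark: CRAN R vs MRO"
# output could also be pdf_document or word_document
output: html_document
---
```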
Remember the post I did about Python and how to know which piece of code is the fastest? We can do the same in R!
There are two simple functions to measure time in R: `system.time()` and `proc.time()`. You can use them like this:

```r
m <- 10000
n <- 2000
A <- matrix(runif(m * n), m, n)

system.time(P <- prcomp(A))

## or
ptm <- proc.time()
P <- prcomp(A)
proc.time() - ptm
#    user  system elapsed
# 122.594   1.122 123.583

# The 'user time' is the CPU time charged for the execution of user
# instructions of the calling process.
# The 'system time' is the CPU time charged for execution by the system
# on behalf of the calling process.
```
However, sometimes you would like a more detailed look at the individual steps to see which one takes the most time. That is where the package `timeit` comes in. Its usage is similar to `system.time()`, but `timeit` is what we call a profiler, so the output is different:
```r
t0 <- timeit(P <- prcomp(A), replications = 1, times = 1)
#                  self.time     self.pct total.time    total.pct mem.total replications iteration
# "La.svd"             98.34 78.994296731      98.65 28.320032152     488.4            1         1
# "%*%"                25.27 20.298819182      25.27  7.254406614     152.6            1         1
# "matrix"              0.23  0.184753795       0.23  0.066027444     183.1            1         1
# "array"               0.16  0.128524379       0.16  0.045932135     152.6            1         1
# "aperm.default"       0.14  0.112458832       0.14  0.040190618     152.6            1         1
# "is.finite"           0.11  0.088360511       0.11  0.031578343     152.6            1         1
# "colMeans"            0.08  0.064262190       0.08  0.022966068       0.0            1         1
# "svd"                 0.05  0.040163869      98.81 28.365964288     671.5            1         1
# "any"                 0.04  0.032131095       0.04  0.011483034       0.0            1         1
# "t.default"           0.03  0.024098321       0.03  0.008612275      30.5            1         1
# "sweep"               0.02  0.016065547       0.32  0.091864271     305.2            1         1
# "prcomp.default"      0.01  0.008032774     124.49 35.738071999    1159.9            1         1
# "as.matrix"           0.01  0.008032774       0.01  0.002870758       0.0            1         1
```
The first two columns, `self.time` and `self.pct`, are the ones we want to look at. They are respectively the time in seconds and the percentage of total time spent executing the code of the stated function.
In my output, you can see all the steps that are performed during a principal component analysis, along with the time and memory used by each step. For quick operations, you can set a number of replications and iterations in order to get a reasonable estimate. In the output above, you can see that `prcomp` uses, among others, the `sweep` and `as.matrix` functions and that they are quite fast, while `%*%` and `La.svd` are the most costly steps. To get the total amount of time, we can sum the `self.time` column.
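As a sketch (assuming the object returned by `timeit` behaves like a data frame with a `self.time` column, as in the output above), summing the column looks like this:

```r
# t0 is the object returned by timeit() above; self.time holds the
# seconds spent inside each function's own code.
total_seconds <- sum(t0$self.time)
total_seconds  # should be close to the elapsed time from system.time()
```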
Here are some of the results for the comparison of R and MRO. Taking the example provided on Revolution's website, MRO is clearly the fastest at doing PCA on a 2000 x 10,000 matrix (27.8 vs 129.3 seconds), but the difference is much less clear when I use my own example of a 66 x 137,032 matrix (2.02 vs 2.26 seconds). As for other operations, they do not benefit from the parallel architecture. In fact, according to the MRO people, linear regression and cross products, as well as determinant computation and Cholesky decomposition, benefit the most from the multithreaded operations. As expected, creation/manipulation/transformation of matrices, as well as loops and recursion, get no gain. Indeed, these operations are not really parallelism-friendly.
**[2000 x 10000] - MRO example**

| Operation | MRO 3.2.3 (s) | R 3.2.2 (s) |
|---|---|---|
| PCA - transposed matrix | 2.94 | 5.56 |
| matrix construction (with an if statement) - rbind | 1.43 | 1.64 |
| matrix construction (with an if statement) - assignment | 1.03 | 1.12 |
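To check this on your own machine, the multithread-friendly operations mentioned above can be timed the same way; here is a minimal sketch (my own sizes, not the ones from the official benchmark):

```r
set.seed(1)
A <- matrix(runif(2000 * 2000), 2000, 2000)

system.time(cp <- crossprod(A))      # cross product, i.e. t(A) %*% A
system.time(d  <- determinant(cp))   # log-determinant of the square matrix
system.time(ch <- chol(cp))          # Cholesky decomposition (cp is positive definite)
```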
Interestingly, with the profiler, we can see that when constructing a matrix using `rbind` and a condition (for example, assigning blue if the value is greater than 4 and black otherwise), the two implementations do not use the same internal methods.
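As a rough sketch of the two construction strategies being compared (a toy example of my own, not the exact benchmark code), growing a matrix with `rbind` versus filling a preallocated one by assignment can be timed like this:

```r
n <- 20000
x <- runif(n, 0, 8)

# Strategy 1: grow the matrix row by row with rbind
system.time({
  m1 <- NULL
  for (i in seq_len(n)) {
    colour <- if (x[i] > 4) "blue" else "black"
    m1 <- rbind(m1, c(x[i], colour))
  }
})

# Strategy 2: preallocate and fill by assignment (usually much faster,
# since no copy of the whole matrix is made at each iteration)
system.time({
  m2 <- matrix(NA_character_, nrow = n, ncol = 2)
  for (i in seq_len(n)) {
    colour <- if (x[i] > 4) "blue" else "black"
    m2[i, ] <- c(x[i], colour)
  }
})
```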