R

R programming language

Fastest method to compute an AUC

Context: AUC is an acronym for "Area Under the (ROC) Curve". If you are not familiar with the ROC curve and AUC, I suggest reading this blog post before continuing further. For several projects, I needed to compute a large number of AUCs. It started with 25,000, increased to 230,000, and now I need to compute 1,500,000 AUCs. With so many AUCs, the time to compute each one becomes critical. On the web, I didn't find much information about this specific [...]

August 18, 2016 | Categories: Data Analysis, Performance, Python, R, Statistics
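To make the AUC excerpt above concrete, here is a minimal sketch of one common way to compute an AUC in R, using the rank-sum (Mann-Whitney U) identity. The helper name `auc_rank` and the toy data are assumptions for illustration; the excerpt does not say which method the post found fastest, so this is not presented as the author's answer.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney U) identity.
# scores: numeric predictions; labels: 0/1 (or logical), 1 = positive class.
auc_rank <- function(scores, labels) {
  labels <- as.logical(labels)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  r <- rank(scores)  # ties receive average ranks
  # U statistic of the positives, scaled by the number of pos/neg pairs
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 (positive, negative) pairs are ordered correctly
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)
print(auc_rank(scores, labels))  # 0.75
```

Because the whole computation is one call to `rank()` plus a few sums, it is easy to apply in a loop over many score vectors, which is the situation the excerpt describes.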

Good resources to learn R

Since it's summer vacation, why not take some time to learn R? There are numerous free resources online to dive into this powerful language. For whoever wants to learn it, the challenge is more about finding the time than finding resources. Videos: Coursera is unavoidable for online learning. There are a few good video courses offered for R beginners that are more or less oriented toward genomics: https://www.coursera.org/learn/r-programming https://www.coursera.org/learn/exploratory-data-analysis https://www.coursera.org/learn/bioconductor (Bioconductor is a life science packages [...]

July 11, 2016 | Categories: Bioinformatics, R

Standard deviation on a correlation scatter plot

I was recently asked by a colleague to provide a visualization of differential gene expression computed using RPKM values (two samples, no replicates) and to highlight genes that were outside the distribution by 2 standard deviations or more. As a first draft, I quickly obliged by calculating the fold change distribution, computing its standard deviation and drawing lines on either side of the diagonal. This turns out to be equivalent to computing the standard deviation of the residual of a linear [...]

April 5, 2016 | Categories: Data Analysis, Data Visualization, R, Statistics
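The first-draft approach described in the excerpt above (fold change distribution, standard deviation, lines on either side of the diagonal) could be sketched as follows. Several details are assumptions not stated in the excerpt: the helper name `flag_outliers`, working on a log2 scale, adding a pseudocount of 1 to avoid taking the log of zero, and centering the band on the mean log fold change. The data are simulated.

```r
# Hypothetical helper: flag genes whose log2 fold change lies k or more
# standard deviations away from the mean log2 fold change.
flag_outliers <- function(rpkm_a, rpkm_b, k = 2, pseudo = 1) {
  log_a <- log2(rpkm_a + pseudo)  # pseudocount is an assumption
  log_b <- log2(rpkm_b + pseudo)
  lfc <- log_b - log_a            # log2 fold change per gene
  center <- mean(lfc)
  s <- sd(lfc)
  list(log_a = log_a, log_b = log_b, center = center, sd = s,
       outlier = abs(lfc - center) >= k * s)
}

# Simulated RPKM values for two samples (illustration only)
set.seed(42)
sample_a <- rexp(500, rate = 0.1)
sample_b <- sample_a * 2^rnorm(500, sd = 0.5)
res <- flag_outliers(sample_a, sample_b)

# Scatter plot with the diagonal and the +/- 2 SD lines parallel to it
plot(res$log_a, res$log_b, pch = 16,
     col = ifelse(res$outlier, "red", "grey40"),
     xlab = "log2(RPKM + 1), sample A", ylab = "log2(RPKM + 1), sample B")
abline(a = 0, b = 1)
abline(a = res$center + 2 * res$sd, b = 1, lty = 2)
abline(a = res$center - 2 * res$sd, b = 1, lty = 2)
```

On a log scale, a constant fold change corresponds to a line parallel to the diagonal, which is why the two dashed lines have a slope of 1.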

Simple multiprocessing in R

Continuing my effort to help you get the most out of your CPUs, I figured we could look into using some of the multiprocessing functionality available for your R scripts. While there are a few different options for running multi-core processing on your data, we'll focus on something really simple to put in place. A while back, I was putting together a script to run a large series of logistic regressions (using the glm function) in an attempt to model some data. [...]

March 14, 2016 | Categories: Performance, R
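As an illustration of the kind of setup the excerpt above describes, here is a minimal sketch that fits a small series of logistic regressions in parallel with `mclapply()` from the base `parallel` package. The excerpt does not name the approach the post uses, so the choice of `mclapply()` is an assumption, as are the helper `fit_one`, the built-in `mtcars` data and the list of predictors.

```r
library(parallel)

# Hypothetical task: one logistic regression per predictor, on built-in data
predictors <- c("mpg", "hp", "wt")

fit_one <- function(p) {
  # Model transmission type (am, 0/1) from a single predictor
  fit <- glm(reformulate(p, response = "am"), family = binomial, data = mtcars)
  coef(fit)[[p]]  # keep only the slope
}

# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

slopes <- unlist(mclapply(predictors, fit_one, mc.cores = n_cores))
names(slopes) <- predictors
print(slopes)
```

Since `mclapply()` has the same calling convention as `lapply()`, an existing serial loop written with `lapply()` can usually be switched over by changing the function name and adding `mc.cores`.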

What’s the fastest? – R edition

When I started using R, about ten years ago, the community was much smaller. There was no R-bloggers to get inspired by and no ggplot2 to make nice graphs. It was the beginning of another implementation of R (other than CRAN's), known as Revolution R, from Revolution Analytics. Their R targeted enterprises and was designed to be faster and more scalable. They also offered an open source version of their product called RRO. In April 2015, the company was acquired by Microsoft! May [...]

February 12, 2016 | Categories: R