Statistics

Don’t ignore the warnings!

I'm sure all of you R users have noticed that R sometimes talks to you. When you do something wrong, R replies with a message written in red in the console. How many of you actually read those error messages? If you take the time to read them carefully, you'll get a hint about what was wrong with your command. Let's look at an example:

> sum(c('1','3','4','4'))
Error in sum(c("1", "3", "4", "4")) : invalid 'type' (character) [...]
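A minimal base-R sketch of reading that message programmatically and acting on its hint, plus a warning that is easy to overlook because it does not stop execution. It uses only `tryCatch()`, `conditionMessage()` and `as.numeric()`; the string `"three"` is a made-up input chosen to trigger the coercion warning.

```r
# Capture the error instead of letting it stop the script, so we can read it
err_msg <- tryCatch(
  sum(c("1", "3", "4", "4")),
  error = function(e) conditionMessage(e)
)
print(err_msg)  # the message says sum() received a character vector

# The hint is enough to fix the command: convert the strings to numbers first
total <- sum(as.numeric(c("1", "3", "4", "4")))
print(total)

# Warnings do not stop execution, which makes them easy to miss:
# here a string that cannot be parsed silently becomes NA
warn_msg <- tryCatch(
  as.numeric(c("1", "three")),
  warning = function(w) conditionMessage(w)
)
print(warn_msg)
```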

September 3, 2015 | Categories: R, Statistics

Kaplan-Meier plot

When working with cancer datasets, one goal is sometimes to find features (mutations, clinical information, gene expression, ...) associated with prognosis, i.e. features related to the probable outcome of the disease. If that's one of your goals, you'll have to do a survival analysis. Survival analysis is a set of methods that model the time at which an event of interest occurs, that event often being death. But really, any event for which the time of occurrence is [...]
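To make the idea concrete, here is a from-scratch sketch of the Kaplan-Meier estimator in base R, using made-up follow-up times for eight hypothetical patients. At each time where at least one event occurs, the survival estimate is multiplied by the fraction of at-risk patients who did not have the event; censored patients (followed up without the event) simply leave the risk set. This is for illustration only: in practice the `survival` package's `survfit(Surv(time, status) ~ 1)` computes the same estimates, with confidence intervals.

```r
# Hypothetical follow-up data: time in months; status 1 = death observed, 0 = censored
time   <- c(2, 3, 3, 5, 8, 8, 12, 15)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

# Kaplan-Meier estimator: at each distinct event time t,
# S(t) = S(previous event time) * (1 - events at t / patients at risk at t)
km_estimate <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  surv <- numeric(length(event_times))
  s <- 1
  for (i in seq_along(event_times)) {
    t <- event_times[i]
    n_at_risk <- sum(time >= t)                # patients censored at t still count as at risk
    n_events  <- sum(time == t & status == 1)
    s <- s * (1 - n_events / n_at_risk)
    surv[i] <- s
  }
  data.frame(time = event_times, survival = surv)
}

km <- km_estimate(time, status)
print(km)

# Step plot: survival stays flat between events and drops at each event time
plot(c(0, km$time), c(1, km$survival), type = "s", ylim = c(0, 1),
     xlab = "Time (months)", ylab = "Survival probability")
```

With these numbers the estimate drops from 1 to 0.875, 0.75, 0.6, 0.45 and finally 0.225 at 12 months.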

February 19, 2015 | Categories: Bioinformatics, Statistics

Tweaking Fisher’s exact test for biology

Fisher's exact test is widely applied in bioinformatics (it is the core computation in gene-set or pathway enrichment analysis). I won't introduce the test itself, as others have already done so many times, but will instead point out a disconnect between what it does and what is often needed. In Fisher's exact test, the null hypothesis is that there is no association (no enrichment) between the two variables studied. When using this test with large numbers (such as the number of genes [...]
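The specific tweak proposed in the post is cut off in this excerpt, so the sketch below does not reproduce it. It only illustrates why large numbers matter, using base R's `fisher.test()` on a hypothetical 2x2 enrichment table: multiplying every count by ten leaves the proportions (and so roughly the odds ratio) unchanged, yet the p-value shrinks, because the test asks whether there is any enrichment at all, not whether it is large.

```r
# Hypothetical gene-set enrichment table (counts of genes):
# rows = in / not in the gene list of interest, columns = in / not in the pathway
small <- matrix(c(12, 88, 188, 1712), nrow = 2,
                dimnames = list(gene_list = c("in", "out"),
                                pathway   = c("in", "out")))
large <- small * 10  # same proportions, ten times more genes

# One-sided test: is the pathway over-represented in the gene list?
ft_small <- fisher.test(small, alternative = "greater")
ft_large <- fisher.test(large, alternative = "greater")

# The effect size (conditional MLE of the odds ratio) is about the same...
print(c(small = unname(ft_small$estimate), large = unname(ft_large$estimate)))
# ...but the p-value drops sharply as the counts grow
print(c(small = ft_small$p.value, large = ft_large$p.value))
```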

December 8, 2014 | Categories: Bioinformatics, Biology, Statistics

Venn diagrams: a visualization nightmare!

I was recently reading a paper (about a very inspiring read mapper for RNA-Seq!). At some point the authors wanted to present the overlap between splice junctions detected by 4 RNA-Seq read mappers, and chose to do so using the ubiquitous Venn diagram (see Fig. 1). I spent a few minutes staring at this colorful mosaic... without gaining much insight.

Fig. 1: Example of a four-way Venn diagram. Reproduced from figure 4b of Genome Biology, 14(3):R30, 2013.

Which mappers overlapped most in [...]
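The question the figure is meant to answer, which mappers overlap most, can also be read directly off a pairwise table. A minimal base-R sketch, assuming each mapper's output has been reduced to a character vector of junction IDs; the mapper names `A` to `D` and the junction IDs are placeholders, not data from the paper. The Jaccard index (shared junctions divided by all junctions found by either mapper) is included so that a mapper reporting many junctions does not dominate the comparison.

```r
# Hypothetical splice-junction IDs reported by four mappers (names are placeholders)
junctions <- list(
  A = c("j1", "j2", "j3", "j4", "j5"),
  B = c("j1", "j2", "j3", "j6"),
  C = c("j2", "j3", "j7"),
  D = c("j1", "j5", "j8")
)

# Entry [i, j] = number of junctions reported by both mapper i and mapper j
overlap <- sapply(junctions, function(x)
  sapply(junctions, function(y) length(intersect(x, y))))
print(overlap)

# Jaccard index: shared junctions / junctions found by either mapper
jaccard <- sapply(junctions, function(x)
  sapply(junctions, function(y) length(intersect(x, y)) / length(union(x, y))))
print(round(jaccard, 2))
```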

October 20, 2014 | Categories: Bioinformatics, Data Visualisation, Statistics

Teach me how to box-plot!

Boxplots are everywhere! Publishers like boxplots. But ask around and most people don't even know what a boxplot represents! Recently I wanted to compare the expression of a certain gene between two samples. The gold standard for looking at it would be *drum roll*... a boxplot! Interesting fact #1: Did you know a boxplot is also called a “box-and-whisker plot”? Let's take a look! A boxplot is easy to generate in the analysis software R, and its interpretation [...]
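What the box and whiskers encode can be checked with `boxplot.stats()`, the base-R function that `boxplot()` uses to compute them. A minimal sketch with made-up expression values for one gene in replicates of two samples: by default the box spans the lower and upper hinges (close to the first and third quartiles), the thick line is the median, the whiskers reach the most extreme points lying within 1.5 times the box length, and anything beyond is drawn as an individual outlier.

```r
# Hypothetical expression values for one gene, measured in replicates of two samples
expr <- list(
  sampleA = c(5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 9.5),
  sampleB = c(6.2, 6.5, 6.9, 7.0, 7.2, 7.4, 7.8, 8.0)
)

# The five numbers behind one box:
# lower whisker, lower hinge, median, upper hinge, upper whisker
stats_A <- boxplot.stats(expr$sampleA)
print(stats_A$stats)  # 5.1 5.5 5.9 6.2 6.3

# Points farther than 1.5 * (upper hinge - lower hinge) from the box, drawn individually
print(stats_A$out)    # 9.5

# Draw both samples side by side
boxplot(expr, ylab = "Expression (arbitrary units)")
```

The hinges come from `fivenum()`, so they can differ slightly from the quartiles reported by `quantile()` or `summary()`: here `summary(expr$sampleA)` reports a first quartile of 5.55 rather than 5.5.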

September 21, 2014 | Categories: Data Visualisation, R, Statistics