Statistics

Kaplan-Meier plot

When working with cancer datasets, one goal is sometimes to find features (mutations, clinical information, gene expression, ...) associated with prognosis, i.e., features related to the probable outcome of the disease. If that is one of your goals, you'll need to perform a survival analysis. Survival analysis is a set of methods for modeling the time at which an event of interest occurs, that event often being death. But really, any event for which the time of occurrence is [...]
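
To make the idea concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator using only the Python standard library. The function name `kaplan_meier` and the six-patient cohort are hypothetical illustrations, not code from the post. At each time where at least one event occurs, the running survival estimate is multiplied by (1 - deaths / patients still at risk). Censored patients (for example, still alive at last contact) add no step to the curve, but they do leave the risk set.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed at that time,
              0 if the patient was censored (e.g. still alive at last contact)
    Returns a list of (time, S(time)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time
    leaving = Counter(times)   # patients leaving the risk set at each time
    at_risk = len(times)       # everyone is at risk at the start
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # events and censorings at time t both leave the risk set after t
        at_risk -= leaving[t]
    return curve

# Hypothetical toy cohort: six patients, two of them censored.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

A patient censored at the same time as an event is still counted as at risk at that time, which is the usual convention.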

February 19, 2015 (updated 2017-04-29) | Categories: Data Analysis, Statistics

Tweaking Fisher’s exact test for biology

Fisher's exact test is widely applied in bioinformatics (it is the core computation in gene-set or pathway enrichment analysis). I won't introduce the test itself, as others have done so several times (here), but will instead point to a disconnect between what it does and what is often needed. In Fisher's exact test, the null hypothesis is that there is no association (i.e., no enrichment) between the two variables studied. When using this test with large numbers (such as the number of genes [...]
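
For readers who want to see the computation, here is a minimal sketch of the one-sided (enrichment) form of Fisher's exact test, written directly from the hypergeometric distribution with Python's `math.comb`. The function name and the 2x2 counts are hypothetical, and this is not presented as the post's own code. The usage lines keep the same proportions while multiplying every cell by ten. The odds ratio, a measure of effect size, stays unchanged, yet the p-value drops sharply, which is one way large gene counts interact with this null hypothesis.

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment on a 2x2 table:

                     in gene set   not in gene set
        in hit list       a               b
        not in list       c               d

    Returns P(X >= a), where X is hypergeometric under the null
    hypothesis of no association between the two variables.
    """
    n_set = a + c            # genes in the gene set
    n_hit = a + b            # genes in the hit list
    total = a + b + c + d    # all genes considered
    upper = min(n_set, n_hit)
    numerator = sum(comb(n_set, k) * comb(total - n_set, n_hit - k)
                    for k in range(a, upper + 1))
    return numerator / comb(total, n_hit)

# Same proportions (odds ratio a*d / (b*c) is about 2.45 in both tables),
# but ten times more genes: the p-value becomes far smaller.
print(fisher_enrichment_p(12, 88, 100, 1800))
print(fisher_enrichment_p(120, 880, 1000, 18000))
```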

December 8, 2014 (updated 2017-05-01) | Categories: Bioinformatics, Biology, Statistics

Venn diagrams: a visualization nightmare!

I was recently reading a paper (a very inspiring read mapper for RNA-Seq!). At some point the authors wanted to present the overlap between splice junctions detected by four RNA-Seq read mappers, and chose to do so using the ubiquitous Venn diagram (see Fig. 1). I spent a few minutes staring at this colorful mosaic... without gaining much insight.

Fig. 1: Example of a four-way Venn diagram. Reproduced from figure 4b of Genome Biology, 14(3):R30, 2013.

Which mappers overlapped most in [...]
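
The question "which mappers overlapped most?" can also be answered numerically, without a four-way Venn diagram, by building a table of pairwise overlaps. The sketch below reports the number of shared junctions for each pair, along with the Jaccard index (shared / union). The mapper names and junction IDs are hypothetical, and this is only one possible alternative, not necessarily the one the post goes on to propose.

```python
from itertools import combinations

def pairwise_overlaps(sets):
    """Return {(name1, name2): (shared, jaccard)} for every pair of sets."""
    result = {}
    for (n1, s1), (n2, s2) in combinations(sorted(sets.items()), 2):
        shared = len(s1 & s2)
        union = len(s1 | s2)
        result[(n1, n2)] = (shared, shared / union if union else 0.0)
    return result

# Hypothetical splice junctions reported by three mappers.
junctions = {
    "mapperA": {"j1", "j2", "j3", "j4"},
    "mapperB": {"j1", "j2", "j3"},
    "mapperC": {"j4", "j5"},
}
# Print the pairs from most to least similar.
ranked = sorted(pairwise_overlaps(junctions).items(), key=lambda kv: -kv[1][1])
for pair, (shared, jaccard) in ranked:
    print(pair, shared, round(jaccard, 2))
```

The table has one row per pair (six rows for four mappers), so it stays readable where the Venn diagram's fifteen regions do not.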

October 20, 2014 (updated 2017-04-29) | Categories: Data Visualization, Statistics

Teach me how to box-plot!

Boxplots are everywhere! Publishers like boxplots. But ask around, and most people don't even know what a boxplot represents! Recently I wanted to compare the expression of a certain gene between two samples. The gold standard for looking at it would be *drum roll*... a boxplot! Interesting fact #1: did you know a boxplot is also called a "box and whiskers plot"? Let's take a look! A boxplot is easily generated in the analysis software R, and its interpretation [...]
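
Here is a minimal Python sketch (standard library only) of the numbers behind a boxplot, assuming Tukey's common convention. The box runs from the first to the third quartile with a line at the median. The whiskers reach the most extreme data points within 1.5 × IQR of the box, and points beyond that are drawn individually as outliers. The function name and data are hypothetical. R's `boxplot()` applies the same 1.5 × IQR rule by default (its `range` argument), but it computes its hinges slightly differently, so the box edges may not match exactly on small samples.

```python
import statistics

def box_stats(values, coef=1.5):
    """Numbers behind a Tukey-style boxplot.

    The box spans the first to third quartile, with a line at the median;
    whiskers reach the most extreme points within coef * IQR of the box;
    anything beyond is drawn individually as an outlier.
    """
    data = sorted(values)
    # "inclusive" interpolates like R's default quantile() (type 7)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical expression values for one gene, with one extreme value.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```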

September 21, 2014 (updated 2017-04-29) | Categories: Data Visualization, R, Statistics

Is a p-value needed?

Much has been written on the need for statistics in genome-scale molecular biology. Very clever analytical approaches have been devised, taking the form of carefully crafted and freely downloadable software packages. But still, every month or so, I meet students and researchers facing the same dilemma: they need to decide whether to report the strength of an effect (e.g., gene X is over-expressed 4.5-fold in condition A vs. B) or the significance of such an effect (e.g., gene X is overexpressed [...]
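
The distinction between strength and significance can be illustrated with a small sketch. Two hypothetical genes, each with three replicates per condition, share the same 4.5-fold change, yet their p-values are very different. Significance is computed here with an exact two-sided permutation test on the difference in means. That test was chosen only because it needs nothing beyond Python's standard library; it is not necessarily the test the post discusses. Note also that with three replicates per condition, this test cannot return a p-value below 2/20 = 0.1, however large the effect.

```python
from itertools import combinations
from statistics import mean

def fold_change(a, b):
    """Strength of the effect: ratio of mean expression, condition A over B."""
    return mean(a) / mean(b)

def permutation_p(a, b):
    """Significance: exact two-sided permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and counts splits at least as extreme as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        # small tolerance guards against float rounding in the means
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Two hypothetical genes, three replicates per condition, same 4.5-fold change.
steady = ([9, 9, 9], [2, 2, 2])
noisy = ([1, 2, 24], [1, 2, 3])
for name, (cond_a, cond_b) in [("steady", steady), ("noisy", noisy)]:
    print(name, fold_change(cond_a, cond_b), permutation_p(cond_a, cond_b))
```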

September 12, 2014 (updated 2016-11-08) | Categories: Statistics