Bioinformatics

Kaplan-Meier plot

When working with cancer datasets, one of the goals is sometimes to find features (mutations, clinical information, gene expression, ...) associated with prognosis, i.e. features related to the probable outcome of the disease. If that's one of your goals, you'll have to do a survival analysis. Survival analysis involves a set of methods to model the time at which an event of interest occurs, that event often being death. But really, any event for which the time of occurrence is [...]
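As a rough illustration of the estimator behind a Kaplan-Meier plot, here is a minimal Python sketch of the product-limit calculation. The function name `kaplan_meier` and the toy follow-up times are made up for this example and are not taken from the post. Subjects censored at a given time are counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time of each subject
    observed:  True if the event occurred at that time, False if censored
    """
    # Number of events at each time (censored subjects are not counted here)
    events = Counter(t for t, e in zip(durations, observed) if e)
    # Number of subjects leaving the risk set at each time (event or censoring)
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy data, made up for illustration: times in months, False = censored
times = [2, 3, 3, 5, 8, 8, 12]
event = [True, True, False, True, False, True, False]
for t, s in kaplan_meier(times, event):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

The curve only steps down at times where an event was observed; censored subjects simply shrink the risk set for later steps.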

By | February 19, 2015|Categories: Bioinformatics, Statistics|0 Comments

Table-reading: loading data into R without a hassle

The first thing I learned in R was how to load a table. Usually, when you start your R journey, someone more knowledgeable will tell you how to do this very first action. It will typically be: data <- read.table("~/SomeFolder/datafile.txt") You will probably add various parameters inside the parentheses, such as "row.names=0", "header=TRUE" or 'sep="\t"', to make sure you are reading your file correctly. This is perfectly fine as a way of loading small datasets. However, to maximize [...]

By | February 5, 2015|Categories: Bioinformatics, Biology, Performance, R|1 Comment

One task, three ways

Usually, there is more than one way to accomplish a task. Some are better, some are worse and others are just as good. Which one to use often depends on computing time, ease of use, and personal preferences and abilities. Say I have a matrix of thousands of chromosomal features with the following column names: Feature, Start, End. All the positions are found on the same chromosome and the widths of my features are variable. [...]

By | January 15, 2015|Categories: Bioinformatics, R|0 Comments

Tweaking Fisher’s exact test for biology

Fisher's exact test is widely applied in bioinformatics (it is the core computation in gene-set or pathway enrichment analysis). I won't introduce the test itself, as others have done so several times (here), but will rather point to a disconnect between what it does and what is often needed. In Fisher's exact test, the null hypothesis is that there is no association (i.e. no enrichment) between the two variables studied. When using this test with large numbers (such as the number of genes [...]
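For reference, the standard one-sided test used in enrichment analysis can be sketched in Python as a hypergeometric tail sum. This is the unmodified test, not the tweak discussed in the post. The function name, parameter names and toy numbers below are hypothetical and chosen only for illustration.

```python
from math import comb

def enrichment_pvalue(overlap, set_size, hits, universe):
    """One-sided Fisher's exact test p-value for enrichment.

    universe: total number of genes considered
    set_size: number of genes in the gene set (or pathway)
    hits:     number of genes selected (e.g. differentially expressed)
    overlap:  number of selected genes that belong to the gene set

    Returns P(X >= overlap), where X follows the hypergeometric
    distribution expected when selection is independent of the set.
    """
    upper = min(set_size, hits)
    total = comb(universe, hits)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers, made up for illustration:
# 20 genes in total, a set of 5, 4 genes selected, 3 of them in the set
p = enrichment_pvalue(overlap=3, set_size=5, hits=4, universe=20)
print(f"p = {p:.4f}")
```

Using exact integer binomial coefficients keeps the sketch free of external dependencies; for real analyses an established implementation from a statistics library is probably the safer choice.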

By | December 8, 2014|Categories: Bioinformatics, Biology, Statistics|0 Comments

Venn diagrams: a visualization nightmare!

I was recently reading a paper (a very inspiring read mapper for RNA-Seq!). At some point the authors wanted to present the overlap between splice junctions detected by four RNA-Seq read mappers and chose to do so using the ubiquitous Venn diagram (see Fig. 1). I spent a few minutes staring at this colorful mosaic... without gaining much insight. Fig. 1: Example of a four-way Venn diagram. Reproduced from figure 4b of Genome Biology, 14(3):R30, 2013. Which mappers overlapped most in [...]

By | October 20, 2014|Categories: Bioinformatics, Data Visualisation, Statistics|0 Comments