Bioinformatics

Table-reading: loading data into R without a hassle

The first thing I learned in R was how to load a table. Usually, when you start your R journey, someone more knowledgeable will show you how to perform this very first action. It will typically be: data <- read.table("~/SomeFolder/datafile.txt"). You will probably add various parameters inside the parentheses, such as row.names=1, header=TRUE or sep="\t", to make sure you are reading your file correctly. And this is perfectly fine as a method for loading small datasets. However, to maximize [...]

February 5, 2015 | Categories: Bioinformatics, R | 1 Comment
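
As a minimal sketch of the read.table call described in the table-reading excerpt above, the snippet below writes a tiny tab-delimited file and reads it back. The file contents and column names (gene, sampleA, sampleB) are made up for illustration, and row.names = 1 assumes that the first column holds the row labels.

```r
# Write a small tab-delimited file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\tsampleA\tsampleB",
             "G1\t1.5\t2.0",
             "G2\t0.3\t4.1"), tmp)

# header = TRUE  : the first line holds the column names
# sep = "\t"     : fields are separated by tabs
# row.names = 1  : use the first column as row names (an assumption about the file)
data <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

print(data)

# Clean up the temporary file
unlink(tmp)
```

Spelling out header, sep and row.names, rather than relying on the defaults, makes it easier to spot a file that was parsed differently from what you expected.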

One task, three ways

Usually, there is more than one way to accomplish a task. Some are better, some are worse and others are just as good. Choosing which one to use often comes down to computing time, ease of use and personal preferences and abilities. Say I have a matrix of thousands of chromosomal features with the following column names: Feature, Start, End. All the positions are found on the same chromosome and the widths of my features are variable. [...]

January 15, 2015 | Categories: Bioinformatics, R | 0 Comments

Tweaking Fisher’s exact test for biology

Fisher's exact test is widely applied in bioinformatics (it is the core computation in gene-set or pathway enrichment analysis). I won't introduce the test itself, as others have already done so several times, but will rather point to a disconnect between what it does and what is often needed. In Fisher's exact test, the null hypothesis is that there is no association between the two variables studied, that is, no enrichment. When using this test with large numbers (such as the number of genes [...]

December 8, 2014 | Categories: Bioinformatics, Biology, Statistics | 0 Comments

Gene symbols: the challenge

Almost certainly, one day, you'll find yourself holding a list of outdated gene symbols. And you'll probably think that updating them is a straightforward task, but it's not that simple! Because there's the word 'bio' in bioinformatician, updating gene symbols reminds me of the futile cycle. According to Wikipedia's definition, a futile cycle occurs when two metabolic pathways run simultaneously in opposite directions and have no overall effect other than to dissipate energy in the form of heat. Updating the [...]

September 29, 2014 | Categories: Bioinformatics, Biology | 0 Comments

Assessing enrichment

While working on a set of RNA-seq data from AML patient samples, I stumbled on gene X. When its expression is high, 50% of the samples carry a mutation in gene Y, a mutation that has a prevalence of only 20% in the rest of the dataset. Is there a link between these two observations? Let's put some numbers on this: among the 131 samples of the dataset, 28 show mutations in gene Y, 6 have high expression of X and 3 have both "features". The table below is [...]

May 21, 2014 | Categories: Bioinformatics, Statistics | 0 Comments
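
The counts quoted in the enrichment excerpt above (131 samples, 28 mutated in gene Y, 6 with high expression of X, 3 with both) are enough to rebuild the 2x2 table. The sketch below does that and runs a one-sided Fisher's exact test on it. The excerpt is cut off before naming the test the post goes on to use, so the choice of fisher.test, and of a one-sided alternative, is an assumption made here for illustration; the row and column labels are also made up.

```r
# Counts taken from the excerpt
n_total <- 131  # samples in the dataset
n_mut   <- 28   # samples with a mutation in gene Y
n_high  <- 6    # samples with high expression of gene X
n_both  <- 3    # samples with both features

# 2x2 contingency table; matrix() fills column by column,
# so the first two values form the "high" column
tab <- matrix(
  c(n_both,         n_high - n_both,
    n_mut - n_both, n_total - n_mut - n_high + n_both),
  nrow = 2,
  dimnames = list(Y = c("mutated", "not_mutated"),
                  X = c("high", "other"))
)
print(tab)

# One-sided Fisher's exact test: is the mutation over-represented
# among the samples with high expression of X?
res <- fisher.test(tab, alternative = "greater")
print(res$p.value)

# The same p-value written as a hypergeometric upper tail:
# probability of drawing at least n_both mutated samples among n_high
p_hyper <- phyper(n_both - 1, n_mut, n_total - n_mut, n_high,
                  lower.tail = FALSE)
print(p_hyper)
```

Rebuilding the table also confirms the percentages in the text: 3 of the 6 high-expression samples are mutated (50%), against 25 of the remaining 125 (20%).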