Is a p-value needed?

Much has been written on the need for statistics in genome-scale molecular biology. Very clever analytical approaches have been devised, taking the form of carefully crafted and freely downloadable software packages. But still, every month or so, I meet students and researchers facing a similar dilemma: they need to decide whether to report the strength of an effect (e.g. gene X is over-expressed 4.5-fold in condition A vs. B) or the significance of that effect (e.g. gene X is over-expressed [...]

September 12, 2014 | Categories: Statistics | 2 Comments

RStudio and version control

Version control is just a way to keep track of changes made to files over time. It allows you to return to previous versions later. I bet you are already using it without even knowing it! When you copy a file or a script before modifying it, you're using version control. However, manual version control may become hard to manage at some point. That's why it's worth investing time early on in a project to use a [...]

June 10, 2014 | Categories: R | 0 Comments

Assessing enrichment

While working on a set of RNA-seq data from AML patient samples, I stumbled on gene X. When its expression is high, 50% of the samples carry a mutation in gene Y, a mutation that has a prevalence of only 20% in the rest of the dataset. Is there a link between these two observations? Let's put some numbers on this: among the 131 samples of the dataset, 28 show mutations in gene Y, 6 have high expression of X and 3 have both "features". The table below is [...]
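
One way to put a number on the counts quoted in this excerpt is a one-sided hypergeometric test, which is the same calculation as the one-sided Fisher exact test. The sketch below is an illustration only and not necessarily the approach the full post takes. It uses the Python standard library, and the function name `hypergeom_tail` and the variable names are hypothetical.

```python
from math import comb


def hypergeom_tail(total, marked, drawn, observed):
    """Probability of seeing at least `observed` marked items when `drawn`
    items are sampled without replacement from `total` items, of which
    `marked` are marked (hypothetical helper, for illustration)."""
    upper = min(marked, drawn)
    favourable = sum(
        comb(marked, k) * comb(total - marked, drawn - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(total, drawn)


# Counts quoted in the excerpt.
total_samples = 131  # samples in the dataset
mutated_y = 28       # samples with a mutation in gene Y
high_x = 6           # samples with high expression of gene X
both = 3             # samples with both features

# 2x2 contingency table implied by those counts:
# rows = high X / not high X, columns = Y mutated / Y not mutated.
table = [
    [both, high_x - both],
    [mutated_y - both, total_samples - mutated_y - (high_x - both)],
]

# Chance of at least 3 Y-mutated samples among 6 samples picked at random.
p_value = hypergeom_tail(total_samples, mutated_y, high_x, both)

print(table)
print(f"one-sided p-value: {p_value:.4f}")
```

With these counts the tail probability comes out at roughly 0.11. The calculation assumes the 6 high-expression samples can be treated as a random draw from the 131, which is the null hypothesis being tested.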

May 21, 2014 | Categories: Bioinformatics, Statistics | 0 Comments

Python and pandas

R is undeniably a must-use language, especially for data visualization. But R can sometimes be a bit slow when dealing with big datasets. If you don't need to create awesome graphs or don't have time to wait, there's an alternative in Python that can be quite fast for data manipulation. The Python Data Analysis Library, pandas, provides an easy way to manipulate data in Python. Recently, I had to deal with a big gene expression file (21024 genes x [...]

April 17, 2014 | Categories: Data Analysis, Python | 1 Comment

What’s the fastest?

Often, we rely on our old habits. We get comfortable and tend to do things the same old way. The same thing happens when you're programming. But a day will come when you’ll ask yourself: is this the fastest way to perform this task? And when this happens to you (and if the given task is in Python), you’ll be glad that a package like timeit exists. Sure, there are other ways to organize a timing contest in Python. [...]
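
As a minimal sketch of the kind of timing contest this excerpt describes, the standard-library `timeit` module can compare two ways of doing the same job. The two functions below (`with_loop` and `with_comprehension`) are hypothetical examples chosen for illustration, not taken from the full post.

```python
import timeit


def with_loop(n=1000):
    """Build a list of squares with an explicit loop and append()."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return squares


def with_comprehension(n=1000):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# timeit.timeit accepts a callable and runs it `number` times,
# returning the total elapsed time in seconds.
loop_time = timeit.timeit(with_loop, number=2000)
comp_time = timeit.timeit(with_comprehension, number=2000)

print(f"loop:          {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

The absolute numbers depend on the machine and the Python version, so it is the comparison between the two timings on the same system that matters.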

April 2, 2014 | Categories: Performance, Python | 0 Comments