About Geneviève

I started out in biochemistry, but it is as a bioinformatician that I have been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

A word about unwanted variability

Experiments are influenced by various variables: the one we are interested in, and many others. Variability in the data can be related to differences in technical or biological variables, such as the instrument used, genetic background, age, gender, etc. Therefore, batch effects are frequently observed in gene expression datasets. They can affect genes independently of the variable of interest (independent of cancer or normal states, for example) or not (the expression of a given gene might be influenced by the [...]

July 24, 2018 | Categories: Bioinformatics | 1 Comment

Understanding how kallisto works

In 2016, Bray et al. introduced a new k-mer based method to estimate isoform abundance from RNA-Seq data. Their method, called kallisto, provided a significant improvement in speed and memory usage compared to the previously used methods while yielding similar accuracy. In fact, kallisto is able to quantify expression in a matter of minutes instead of hours. Since it is so light and convenient, kallisto is now often used to quantify expression in the form of TPM. But how does [...]

March 28, 2018 | Categories: Bioinformatics, Data Analysis | 1 Comment
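
To give a feel for what "k-mer based" means in the kallisto excerpt above, here is a minimal toy sketch in Python. It is not kallisto's implementation: it only splits sequences into k-mers, indexes which transcripts contain each k-mer, and intersects those sets for a read. The transcript names and sequences, the value of k, and the function names (`kmers`, `build_index`, `compatible_transcripts`) are all made up for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every substring of length k in seq, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all k-mers found in the read."""
    compatible = None
    for kmer in kmers(read, k):
        # .get() avoids inserting unknown k-mers into the defaultdict
        hits = index.get(kmer, set())
        compatible = set(hits) if compatible is None else compatible & hits
    # A read shorter than k has no k-mers, so nothing is compatible
    return compatible or set()


# Hypothetical toy transcripts that differ by a single base
transcripts = {
    "tx1": "ACGTACGTGA",
    "tx2": "ACGTACCTGA",
}
index = build_index(transcripts, k=4)

shared = compatible_transcripts("ACGTAC", index, k=4)   # region common to both
unique = compatible_transcripts("TACGTG", index, k=4)   # covers the differing base
```

In this toy example, a read taken from the region shared by both transcripts stays compatible with both, while a read covering the differing base is compatible with only one. Turning such compatibility sets into abundance estimates is a separate step that this sketch does not attempt.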

Think like a computer

Let's say all your results for a given project are stored in Excel files named exp1.xlsx, exp2_20170708.xlsx, exp_prolif_072017.xlsx... Inside file exp1.xlsx, you have this: This might be a user-friendly result file, but it is not a "computer-friendly" file. Let's suppose that you (or your boss) decide that you now need a database instead of the twenty-six different Excel files you have been using to store results. If all your files are similar to exp1.xlsx, you will have to put a [...]

February 8, 2018 | Categories: Bioinformatics, Biology | 1 Comment

A multiprocessing example and more

Recently, I had to search for a given chemical structure in a list of structures. Using the Python chemoinformatics packages pybel and rdkit, I was easily able to do so, but the operation took a little too much time for my liking. Wondering how I could search faster, I immediately thought about Jean-Philippe's previous blog post titled Put Those CPUs to Good Use. I decided to follow his instructions and give it a try. Goal: Look for a molecule (a given [...]

December 11, 2017 | Categories: Bioinformatics, Computer science, Performance | 0 Comments

Big data, big challenge – part 2

This post follows my previous post on big data. Even though the latter did not result in a big virtual discussion, I was pleased to read some comments regarding the situation in other areas of bioinformatics. Proteomics: Mathieu Courcelles, bioinformatician at the proteomics platform, explained that mass spectrometry-driven proteomics has always generated 'big data', so this expression is not used in the field. As he said, mass spectrometers are indeed instruments that generate a large volume of data 24/7. Early on [...]

August 18, 2017 | Categories: Data Analysis | 0 Comments