boucherg

About Geneviève

I started out in biochemistry, but it is as a bioinformatician that I have been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

Understanding how kallisto works

In 2016, Bray et al. introduced a new k-mer based method to estimate isoform abundance from RNA-Seq data. Their method, called kallisto, provided a significant improvement in speed and memory usage compared to previously used methods while yielding similar accuracy. In fact, kallisto is able to quantify expression in a matter of minutes instead of hours. Since it is so light and convenient, kallisto is now often used to quantify expression in the form of TPM. But how does [...]

March 28, 2018 (last modified April 8, 2018) | Categories: Bioinformatics, Data Analysis | 1 Comment
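
To make the term "k-mer" from the excerpt above concrete, here is a minimal sketch that lists and counts the overlapping substrings of length k in a sequence. It only illustrates what a k-mer is; it is not kallisto's algorithm, and the function names `kmers` and `kmer_counts` are hypothetical.

```python
from collections import Counter


def kmers(sequence, k):
    """Return every overlapping substring of length k, in order.

    Illustrative helper only (hypothetical name, not part of kallisto).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n contains n - k + 1 k-mers (none if k > n).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k):
    """Count how many times each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))
```

For example, `kmers("ATGCA", 3)` gives `["ATG", "TGC", "GCA"]`: each k-mer overlaps the next by k - 1 bases.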

Think like a computer

Let's say all your results for a given project are stored in Excel files named exp1.xlsx, exp2_20170708.xlsx, exp_prolif_072017.xlsx... Inside the file exp1.xlsx, you have this: This might be a user-friendly result file, but it is not a "computer-friendly" file. Let's suppose that you (or your boss) decide that you now need a database instead of the twenty-six different Excel files you have been using to store results. If all your files are similar to exp1.xlsx, you will have to put a [...]

February 8, 2018 | Categories: Bioinformatics, Biology | 1 Comment

A multiprocessing example and more

Recently, I had to search for a given chemical structure in a list of structures. Using the Python chemoinformatics packages pybel and rdkit, I was easily able to do so, but the operation took a little too much time for my liking. Wondering how I could search faster, I immediately thought about Jean-Philippe's previous blog post titled Put Those CPUs to Good Use. I decided to follow his instructions and give it a try. Goal: Look for a molecule (a given [...]

December 11, 2017 | Categories: Bioinformatics, Computer science, Performance | 0 Comments
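
The idea in the excerpt above, splitting a search over a list of structures across several CPUs, can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the post: the function names are hypothetical, structures are treated as plain strings (for instance SMILES), and the match test is simple string equality standing in for whatever pybel or rdkit comparison the post actually uses.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def search_chunk(args):
    """Return the entries of one chunk that match the query.

    Placeholder comparison (assumption): exact string equality. A real
    structure search would call a chemoinformatics toolkit here instead.
    """
    query, chunk = args
    return [entry for entry in chunk if entry == query]


def parallel_search(query, library, workers=4):
    """Search the library in parallel, one chunk per worker process."""
    chunks = split_into_chunks(library, workers)
    with Pool(processes=workers) as pool:
        # Each worker receives the query together with its own chunk.
        partial_hits = pool.map(search_chunk, [(query, c) for c in chunks])
    # Flatten the per-chunk results back into a single list.
    return [hit for part in partial_hits for hit in part]
```

When run as a script, `parallel_search` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Whether the parallel version is actually faster depends on how expensive each comparison is relative to the cost of sending chunks to the workers.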

Big data, big challenge – part 2

This post follows my previous post on big data. Even though the latter did not result in a big virtual discussion, I was pleased to read some comments regarding the situation in other areas of bioinformatics. Proteomics: Mathieu Courcelles, bioinformatician at the proteomics platform, explained that mass spectrometry-driven proteomics has always generated 'big data', so this expression is not used in the field. As he said: Mass spectrometers are indeed instruments that generate a large volume of data 24/7. Early on [...]

August 18, 2017 | Categories: Data Analysis | 0 Comments

R or Python, you choose!

I have already briefly introduced pandas, a Python library, by comparing some of its functions to their equivalents in R. Pandas is a library that makes Python almost as convenient as R when doing data visualization and exploration from matrices and data frames (it is built on top of numpy). It has evolved a lot these past few years, as has its community of users. Although pandas is being integrated into a number of specialized packages, such as rdkit for chemoinformatics, [...]

June 26, 2017 | Categories: Data Analysis, Python, R | 0 Comments