Document your work by adding parameters to your shell scripts

At some point during your bioinformatics career, you're going to start writing shell scripts; it's kind of inevitable! So let us discuss a strategy to add parameters to your scripts in order to make them more easily reusable, while also keeping a trace of the settings you used to generate a set of results. (Disclaimer: the procedures described below have been tested using Bash; some modifications might be necessary if you use a different shell.) Modifying your scripts in order [...]

By | 2018-08-14T14:29:57+00:00 August 14, 2018|Categories: Bioinformatics, Shell scripting|0 Comments

A word about unwanted variability

Experiments are influenced by various variables: the one we are interested in, and many others. Variability in the data can be related to differences in technical or biological variables, such as the instrument used, genetic background, age, gender, etc. Therefore, batch effects are frequently observed in gene expression datasets. They can affect genes independently of the variable of interest (independent of cancer or normal states, for example) or not (the expression of a given gene might be influenced by the [...]

By | 2018-07-25T08:26:13+00:00 July 24, 2018|Categories: Bioinformatics|1 Comment

Understanding how kallisto works

In 2016, Bray et al. introduced a new k-mer based method to estimate isoform abundance from RNA-Seq data. Their method, called kallisto, provided a significant improvement in speed and memory usage compared to the previously used methods while yielding similar accuracy. In fact, kallisto is able to quantify expression in a matter of minutes instead of hours. Since it is so light and convenient, kallisto is now often used to quantify expression in the form of TPM. But how does [...]

By | 2018-04-08T15:01:03+00:00 March 28, 2018|Categories: Bioinformatics, Data Analysis|1 Comment
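
For readers unfamiliar with the TPM (transcripts per million) unit mentioned in the kallisto excerpt above, here is a minimal sketch of how such values are usually derived from estimated counts and effective lengths. This is not kallisto's code; the function name `tpm` and the toy numbers are made up for illustration.

```python
def tpm(est_counts, eff_lengths):
    """Convert estimated counts to TPM.

    Each count is divided by the transcript's effective length, then the
    resulting rates are rescaled so that they sum to one million.
    """
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads assigned anywhere: return zeros instead of dividing by zero.
        return [0.0 for _ in rates]
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    # Toy values (hypothetical): three transcripts with different lengths.
    counts = [100, 200, 50]
    lengths = [1000.0, 4000.0, 500.0]
    for value in tpm(counts, lengths):
        print(round(value, 1))
```

Because the values are rescaled to a fixed total, TPM describes relative abundance within one sample rather than absolute expression.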

Think like a computer

Let's say all your results for a given project are stored in Excel files named exp1.xlsx, exp2_20170708.xlsx, exp_prolif_072017.xlsx... Inside file exp1.xlsx, you have this: This might be a user-friendly result file, but it is not a "computer-friendly" file. Let's suppose that you (or your boss) decide that you now need a database instead of the twenty-six different Excel files you have been using to store results. If all your files are similar to exp1.xlsx, you will have to put a [...]

By | 2018-02-08T13:32:14+00:00 February 8, 2018|Categories: Bioinformatics, Biology|1 Comment

A multiprocessing example and more

Recently, I had to search for a given chemical structure in a list of structures. Using the Python chemoinformatics packages pybel and rdkit, I was easily able to do so, but the operation took a little too much time for my liking. Wondering how I could search faster, I immediately thought about Jean-Philippe's previous blog post titled Put Those CPUs to Good Use. I decided to follow his instructions and give it a try. Goal: Look for a molecule (a given [...]

By | 2017-12-11T12:55:55+00:00 December 11, 2017|Categories: Bioinformatics, Computer science, Performance|0 Comments
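
To give a rough idea of what the multiprocessing post above is about, here is a minimal sketch of a parallel search using Python's standard `multiprocessing.Pool`. It is not the code from the post: a plain string comparison stands in for the real structure matching done with pybel and rdkit, and every function name below is hypothetical.

```python
from functools import partial
from multiprocessing import Pool


def is_match(target, candidate):
    """Placeholder comparison (assumption): the real post compares chemical
    structures with pybel/rdkit; here we only compare trimmed strings."""
    return candidate.strip() == target.strip()


def search_chunk(target, chunk):
    """Return the candidates of one chunk that match the target."""
    return [candidate for candidate in chunk if is_match(target, candidate)]


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks consecutive slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_search(target, candidates, workers=2):
    """Search each chunk in a separate worker process and merge the hits."""
    chunks = split_into_chunks(candidates, workers)
    with Pool(processes=workers) as pool:
        results = pool.map(partial(search_chunk, target), chunks)
    return [hit for chunk_hits in results for hit in chunk_hits]


if __name__ == "__main__":
    # Toy library of SMILES-like strings, made up for this example.
    library = ["CCO", "c1ccccc1", " CCO ", "CC(=O)O"]
    print(parallel_search("CCO", library, workers=2))
```

Splitting the list into one chunk per worker keeps the inter-process communication small; whether this actually beats a serial loop depends on how expensive each comparison is.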