boucherg

About Geneviève

I started in biochemistry, but it is as a bioinformatician that I’ve been having fun for several years now: whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript or exploring machine learning in Python.

One task, three ways

Usually, there is more than one way to accomplish a task. Some are better, some are worse and others are just as good. Choosing which one to use often depends on computing time, ease of use and/or personal preferences and abilities. Say I have a matrix of thousands of chromosomal features with the following column names: Feature, Start, End. All the positions are on the same chromosome and the widths of my features vary. [...]
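The actual task of the post is cut off in this excerpt, so here is only a hypothetical illustration of "more than one way" on the data layout it describes: computing feature widths from the Start and End columns three different ways. The feature names and positions are made up, and the width formula assumes 1-based, inclusive coordinates (End - Start + 1).

```python
import numpy as np

# Hypothetical features on a single chromosome (made-up names and positions).
features = [
    ("feat1", 100, 250),
    ("feat2", 400, 410),
    ("feat3", 1000, 1999),
]

# Assumption: 1-based, inclusive coordinates, so width = End - Start + 1.

# Way 1: an explicit loop.
widths_loop = []
for name, start, end in features:
    widths_loop.append(end - start + 1)

# Way 2: a list comprehension.
widths_comp = [end - start + 1 for _, start, end in features]

# Way 3: vectorized arithmetic on NumPy arrays.
starts = np.array([f[1] for f in features])
ends = np.array([f[2] for f in features])
widths_vec = ends - starts + 1

print(widths_loop)  # [151, 11, 1000]
```

All three give the same widths. On thousands of rows the vectorized version is typically the fastest, while the explicit loop may be the easiest to read, which is exactly the kind of trade-off described above.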

January 15, 2015 | Categories: Bioinformatics, R | 0 Comments

Best practices in data visualization

Sébastien's last post presented a hard-to-understand graph. The Venn diagram with four sets is a good example of visualization gone wrong. Good practices in data visualization are a hot topic right now, not just in science but in many other areas such as journalism and business intelligence. Indeed, the crowd was quite heterogeneous at the first Visualisation Montréal meeting in August, where more than 100 people showed up! And the free ebook launched at the meeting targets beginners from all fields. [...]

October 31, 2014 | Categories: Data Visualization | 0 Comments

Gene symbols: the challenge

Almost certainly, one day you'll find yourself with a list of outdated gene symbols. You'll probably think that updating them is a straightforward task, but it's not that simple! Since there's the word 'bio' in bioinformatician, updating gene symbols reminds me of the futile cycle. According to Wikipedia's definition, a futile cycle occurs when two metabolic pathways run simultaneously in opposite directions and have no overall effect other than to dissipate energy in the form of heat. Updating the [...]
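To make the "not that simple" part concrete, here is a minimal sketch, assuming a lookup table from outdated symbols to current ones. All symbols below are hypothetical placeholders, and `previous_to_current`, `current_symbols` and `update_symbols` are illustrative names, not a real resource or API. The sketch shows two traps: one outdated symbol can now map to several genes, and a symbol can be current for one gene while being an outdated symbol of another.

```python
# Hypothetical mapping from outdated symbols to current ones (made-up names).
previous_to_current = {
    "OLD1": ["NEW1"],
    "OLD2": ["NEW2A", "NEW2B"],  # one old symbol now maps to two genes
    "NEW3": ["NEW4"],            # an old symbol that is also a current symbol
}
current_symbols = {"NEW1", "NEW2A", "NEW2B", "NEW3", "NEW4"}


def update_symbols(symbols):
    """Return (updated, flagged): safe replacements and cases needing review."""
    updated, flagged = {}, {}
    for sym in symbols:
        targets = previous_to_current.get(sym, [])
        if sym in current_symbols and targets:
            # Current symbol that is also an outdated symbol of another gene.
            flagged[sym] = [sym] + targets
        elif sym in current_symbols or not targets:
            # Already up to date, or unknown: leave unchanged.
            updated[sym] = sym
        elif len(targets) == 1:
            # Unambiguous one-to-one update.
            updated[sym] = targets[0]
        else:
            # One outdated symbol, several current candidates.
            flagged[sym] = targets
    return updated, flagged


updated, flagged = update_symbols(["OLD1", "OLD2", "NEW3", "XYZ"])
print(updated)  # {'OLD1': 'NEW1', 'XYZ': 'XYZ'}
print(flagged)  # {'OLD2': ['NEW2A', 'NEW2B'], 'NEW3': ['NEW3', 'NEW4']}
```

Only the one-to-one cases are updated automatically; everything in `flagged` still needs a human decision, which is why a blind search-and-replace is risky.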

September 29, 2014 | Categories: Bioinformatics, Biology | 0 Comments

RStudio and version control

Version control is simply a way to keep track of changes made to files over time. It allows you to return to previous versions later. I bet you are already using it without even knowing it! When you copy a file or a script before modifying it, you're doing version control. However, manual version control may become hard to manage at some point. That's why it's worth investing time early on in a project and use a [...]
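The manual approach described above can be sketched as a small helper that copies a script to the next free numbered backup before you edit it. The `snapshot` function and its `_v1`, `_v2` naming scheme are hypothetical, and the R script is only a stand-in file written to a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(path):
    """Copy `path` to the next free numbered backup, e.g. analysis_v1.R.

    Hypothetical helper illustrating manual version control.
    """
    src = Path(path)
    n = 1
    while True:
        backup = src.with_name(f"{src.stem}_v{n}{src.suffix}")
        if not backup.exists():
            break
        n += 1
    shutil.copy2(src, backup)  # copy contents and file metadata
    return backup


with tempfile.TemporaryDirectory() as d:
    script = Path(d) / "analysis.R"
    script.write_text("x <- 1\n")
    first = snapshot(script)      # save version 1, then edit
    script.write_text("x <- 2\n")
    second = snapshot(script)     # save version 2
    names = (first.name, second.name)
    contents = (first.read_text(), second.read_text())

print(names)  # ('analysis_v1.R', 'analysis_v2.R')
```

Each copy is a full duplicate with no record of what changed or why, and the pile of numbered files keeps growing, which is where a dedicated tool starts to pay off.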

June 10, 2014 | Categories: R | 0 Comments

Python and pandas

R is undeniably a must-use language, especially for data visualization. But R can sometimes be a little slow when dealing with big datasets. If you don't need to create awesome graphs or don't have time to wait, Python offers an alternative that can be quite fast for data manipulation. The Python Data Analysis Library, pandas, provides an easy way to manipulate data in Python. Recently, I had to deal with a big gene expression file (21024 genes x [...]
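As a minimal sketch of that kind of manipulation, assuming a tab-separated expression table with gene names in the first column and one column per sample, here is how pandas can load it and filter genes by mean expression. The gene names, values and the threshold of 1.0 are all made up; for a real file you would pass its path to `pd.read_csv` instead of the in-memory `StringIO` buffer.

```python
import io

import pandas as pd

# Tiny made-up expression table: genes as rows, samples as columns.
raw = io.StringIO(
    "gene\tsample1\tsample2\tsample3\n"
    "GENE_A\t5.0\t7.0\t6.0\n"
    "GENE_B\t0.0\t0.5\t1.0\n"
    "GENE_C\t10.0\t12.0\t11.0\n"
)

# index_col=0 uses the gene column as the row index.
expr = pd.read_csv(raw, sep="\t", index_col=0)

# Vectorized per-gene mean across all sample columns.
means = expr.mean(axis=1)

# Keep genes whose mean expression is above a hypothetical threshold.
expressed = expr[means > 1.0]
print(expressed.index.tolist())  # ['GENE_A', 'GENE_C']
```

Because `mean(axis=1)` and the boolean filter operate on whole columns at once, the same two lines scale to tables with tens of thousands of genes without an explicit loop.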

April 17, 2014 | Categories: Data Analysis, Python | 1 Comment