boucherg

About Geneviève

I started in biochemistry, but it is as a bioinformatician that I’ve been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript or exploring machine learning in Python.

Best practices in data visualization

Sébastien's last post presented a hard-to-understand graph: a Venn diagram with four sets, a good example of visualization gone wrong. Good practices in data visualization are a hot topic right now, not just in science but in many other areas such as journalism and business intelligence. Indeed, the crowd was quite heterogeneous at the first Visualisation Montréal meeting in August, where more than 100 people showed up! And the free ebook launched at the meeting targets beginners from all fields. [...]

October 31, 2014 | Categories: Data Visualization

Gene symbols: the challenge

Sooner or later, you will almost certainly find yourself with a list of outdated gene symbols. You'll probably think that updating them is a straightforward task, but it's not that simple! Since there's 'bio' in bioinformatician, updating gene symbols reminds me of the futile cycle. According to Wikipedia's definition, a futile cycle occurs when two metabolic pathways run simultaneously in opposite directions and have no overall effect other than to dissipate energy in the form of heat. Updating the [...]
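Part of what makes the task tricky is that an alias table is not a one-to-one mapping: one old symbol can point to several current genes. The sketch below is only an illustration, assuming a hypothetical alias table (`ALIASES`) and set of approved symbols (`CURRENT`) with invented gene names; real tables would come from a nomenclature authority such as HGNC for human genes. Instead of silently picking a match, the hypothetical `update_symbols` function sorts each input symbol into resolved, ambiguous or unknown.

```python
# Hypothetical alias table: outdated symbol -> list of current symbols.
# All gene names here are invented for illustration.
ALIASES = {
    "OLD1": ["NEW1"],            # simple rename
    "OLD2": ["NEW2A", "NEW2B"],  # ambiguous: one old symbol, two current genes
}
CURRENT = {"NEW1", "NEW2A", "NEW2B", "KEEP1"}


def update_symbols(symbols, aliases, current):
    """Split symbols into resolved, ambiguous and unknown groups."""
    resolved, ambiguous, unknown = {}, {}, []
    for sym in symbols:
        if sym in current:
            # Already an approved symbol: keep it, even if it also
            # happens to be listed as an alias of another gene
            resolved[sym] = sym
            continue
        candidates = aliases.get(sym, [])
        if len(candidates) == 1:
            resolved[sym] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[sym] = candidates  # needs a human decision
        else:
            unknown.append(sym)
    return resolved, ambiguous, unknown


if __name__ == "__main__":
    print(update_symbols(["OLD1", "OLD2", "KEEP1", "XYZ"], ALIASES, CURRENT))
```

Keeping the ambiguous and unknown symbols separate makes it explicit which part of the list still needs manual curation.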

September 29, 2014 | Categories: Bioinformatics, Biology

RStudio and version control

Version control is simply a way to keep track of changes made to files over time. It lets you return to previous versions later. I bet you are already using it without even knowing it! When you copy a file or a script before modifying it, you're doing version control. However, this manual version control can become hard to manage at some point. That's why it's worth investing time early on in a project to use a [...]
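To make the "copy before modifying" habit concrete, here is a minimal sketch of that manual approach, assuming a hypothetical `backup_copy` helper that saves `analysis.R` as `analysis_v1.R`, `analysis_v2.R`, and so on. It shows what the habit amounts to, and why it gets unwieldy: the copies pile up with no record of what changed or why, which is exactly what a proper version control tool keeps for you.

```python
import shutil
import tempfile
from pathlib import Path


def backup_copy(path):
    """Copy `path` to the next free name: name_v1.ext, name_v2.ext, ..."""
    path = Path(path)
    version = 1
    while True:
        candidate = path.with_name(f"{path.stem}_v{version}{path.suffix}")
        if not candidate.exists():
            break
        version += 1
    # copy2 also preserves file metadata such as the modification time
    shutil.copy2(path, candidate)
    return candidate


if __name__ == "__main__":
    # Demo in a throwaway directory with a hypothetical R script
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "analysis.R"
        script.write_text("x <- 1\n")
        print(backup_copy(script).name)
        print(backup_copy(script).name)
```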

June 10, 2014 | Categories: R

Python and pandas

R is undeniably a must-use language, especially for data visualization. But R can sometimes be a bit slow when dealing with big datasets. If you don't need to create awesome graphs or don't have time to wait, Python offers an alternative that can be quite fast for data manipulation. The Python Data Analysis Library, pandas, provides an easy way to manipulate data in Python. Recently, I had to deal with a big gene expression file (21024 genes x [...]
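A minimal sketch of this kind of pandas manipulation, assuming a tab-separated expression table with one row per gene and one column per sample. The tiny inline table and the threshold of 1.0 are invented for illustration; with a real file you would pass its path to `read_csv` instead of the `StringIO` buffer. The per-gene mean is computed in a single vectorized call rather than a Python loop over rows.

```python
import io

import pandas as pd

# Hypothetical expression table: genes as rows, samples as columns
raw = """gene\tsample1\tsample2\tsample3
GENE1\t5.0\t7.0\t6.0
GENE2\t0.0\t0.5\t0.1
GENE3\t12.0\t10.0\t11.0
"""

# index_col=0 turns the gene names into the row index
expr = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

# Mean expression per gene (across columns), computed without an explicit loop
means = expr.mean(axis=1)

# Keep only genes whose mean expression is above the (arbitrary) threshold
expressed = expr[means > 1.0]

if __name__ == "__main__":
    print(expressed)
```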

April 17, 2014 | Categories: Data Analysis, Python

What’s the fastest?

Often, we rely on our old habits. We get comfortable and tend to do things the same old way. The same thing happens when you're programming. But a day will come when you’ll ask yourself: is this the fastest way to perform this task? And when that happens (and if the task is in Python), you’ll be glad that a module like timeit exists. Sure, there are other ways to organize a timing contest in Python. [...]
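A small timing contest with the standard-library `timeit` module could look like the sketch below. The two candidate functions and the hypothetical `race` helper are only examples. `timeit.repeat` accepts a callable and returns a list of `repeat` total times, each for `number` consecutive calls. Keeping the lowest total follows the `timeit` documentation's advice that the minimum is the most meaningful figure, since the higher values are mostly noise from other processes.

```python
import timeit


def squares_loop(n=1000):
    # Candidate 1: explicit loop with append
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n=1000):
    # Candidate 2: list comprehension
    return [i * i for i in range(n)]


def race(candidates, number=1000, repeat=5):
    """Time each callable; return the best (lowest) seconds per call."""
    results = {}
    for name, func in candidates.items():
        totals = timeit.repeat(func, number=number, repeat=repeat)
        results[name] = min(totals) / number
    return results


if __name__ == "__main__":
    timings = race({"loop": squares_loop, "comprehension": squares_comprehension})
    # Print the fastest candidate first
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds * 1e6:.1f} microseconds per call")
```

Checking that the candidates return the same result before timing them avoids declaring a winner that is fast but wrong.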

April 2, 2014 | Categories: Performance, Python