boucherg

About Geneviève

I started out in biochemistry, but it is as a bioinformatician that I have been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript or exploring machine learning in Python.

A multiprocessing example and more

Recently, I had to search for a given chemical structure in a list of structures. Using the Python chemoinformatics packages pybel and rdkit, I was easily able to do so, but the operation took a little too much time for my liking. Wondering how I could search faster, I immediately thought of Jean-Philippe's previous blog post titled Put Those CPUs to Good Use. I decided to follow his instructions and give it a try. Goal: look for a molecule (a given [...]
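The post's own code is cut off in this excerpt, so what follows is only a minimal sketch of the general pattern, assuming Python's standard `multiprocessing.Pool`. The `is_match` and `parallel_search` functions are hypothetical names, and `is_match` is a stand-in that compares SMILES strings literally; a real search would replace it with a chemistry-aware test (for instance RDKit's `Mol.HasSubstructMatch` on molecules built with `Chem.MolFromSmiles`). The list of structures is made up for illustration.

```python
from functools import partial
from multiprocessing import Pool


def is_match(query, candidate):
    # Hypothetical stand-in: plain string equality on SMILES.
    # A real search would parse both structures (e.g. with rdkit or pybel)
    # and run a substructure or identity test instead.
    return candidate == query


def parallel_search(query, library, processes=4):
    """Return the indices of the structures in `library` that match `query`."""
    # partial() fixes the query so each worker only receives one candidate;
    # it stays picklable because is_match is a module-level function.
    check = partial(is_match, query)
    with Pool(processes=processes) as pool:
        # map() splits the library into chunks, spreads them over the workers
        # and returns the results in the original order.
        flags = pool.map(check, library)
    return [i for i, hit in enumerate(flags) if hit]


if __name__ == "__main__":
    structures = ["C", "CCO", "c1ccccc1", "CC(=O)O", "CCO"]  # illustrative SMILES
    print(parallel_search("CCO", structures, processes=2))  # prints [1, 4]
```

For a per-item test as cheap as this one, process start-up and pickling costs can outweigh the gain; spreading the work over several CPUs pays off when each comparison is expensive, as real substructure matching is. The `if __name__ == "__main__":` guard matters on platforms where workers are started with "spawn".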

December 11, 2017 | Categories: Bioinformatics, Computer science, Performance

Big data, big challenge – part 2

This post follows my previous post on big data. Even though the latter did not spark a big online discussion, I was pleased to read some comments about the situation in other areas of bioinformatics. Proteomics: Mathieu Courcelles, bioinformatician at the proteomics platform, explained that mass-spectrometry-driven proteomics has always generated 'big data', so the expression is not used in the field. As he put it, mass spectrometers are indeed instruments that generate a large volume of data 24/7. Early on [...]

August 18, 2017 | Categories: Data Analysis

R or Python, you choose!

I have already briefly introduced pandas, a Python library, by comparing some of its functions to their equivalents in R. Pandas makes Python almost as convenient as R for exploring and visualizing data stored in matrices and data frames (it is built on top of NumPy). It has evolved a lot over the past few years, as has its community of users. Although pandas is being integrated into a number of specialized packages, such as rdkit for chemoinformatics, [...]
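To give a flavour of that comparison, here is a small sketch of a few everyday pandas operations, with what I take to be the closest R equivalents in comments (`dcast` comes from the reshape2 package). The table and its column names (`gene`, `tissue`, `expression`) are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Made-up expression table, for illustration only.
# R: df <- data.frame(gene = c("A", "A", "B", "B"), ...)
df = pd.DataFrame({
    "gene": ["A", "A", "B", "B"],
    "tissue": ["liver", "brain", "liver", "brain"],
    "expression": [1.0, 3.0, 2.0, 6.0],
})

# Row filtering. R: df[df$expression > 1.5, ]
high = df[df["expression"] > 1.5]

# Group-wise mean. R: aggregate(expression ~ gene, data = df, FUN = mean)
mean_by_gene = df.groupby("gene")["expression"].mean()

# Long to wide: one row per gene, one column per tissue.
# R (reshape2): dcast(df, gene ~ tissue, value.var = "expression")
wide = df.pivot(index="gene", columns="tissue", values="expression")

# Each column is backed by a NumPy array, which is what "built on top of NumPy" means in practice.
values = df["expression"].to_numpy()

print(mean_by_gene)
print(wide)
```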

June 26, 2017 | Categories: Data Analysis, Python, R

Big data, big challenge

You've probably heard the expression "Big Data" before, particularly if you have read Simon Mathien's blog post on IRIC's website (if you haven't read it yet, you should do it now!). There are several definitions (or interpretations) of this expression, which are best summarized by the following two: "Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data" (Oxford English Dictionary). Technological field dedicated [...]

April 24, 2017 | Categories: Data Analysis

Logistic regression and GTEx

Working with all sorts of data, we sometimes want to predict the value of a variable that is not numerical. For those cases, logistic regression is appropriate. It is similar to linear regression, except that it accounts for the fact that the dependent variable is categorical. Here is the formula for linear regression, where we want to estimate the parameters beta (coefficients) that best fit our data: \begin{equation} Y_i = \beta_0 + \beta_1 X_i [...]
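For a binary outcome, logistic regression keeps the same linear predictor $\beta_0 + \beta_1 X_i$ but uses it to model the log-odds that $Y_i = 1$, rather than $Y_i$ itself:

\begin{equation}
\log \frac{P(Y_i = 1)}{1 - P(Y_i = 1)} = \beta_0 + \beta_1 X_i
\quad\Longleftrightarrow\quad
P(Y_i = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}
\end{equation}

Below is a from-scratch sketch on simulated data (not the GTEx analysis from the post) that estimates $\beta_0$ and $\beta_1$ by gradient ascent on the log-likelihood, using only NumPy. The function names, learning rate and iteration count are my own assumptions; in practice one would rather use scikit-learn's `LogisticRegression` or statsmodels' `Logit`.

```python
import numpy as np


def sigmoid(z):
    # Logistic function: maps the linear predictor to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Estimate (beta_0, beta_1) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(b0 + b1 * x)          # current estimate of P(Y_i = 1 | X_i)
        residual = y - p                  # observed minus predicted
        b0 += lr * np.mean(residual)      # gradient w.r.t. beta_0, averaged over samples
        b1 += lr * np.mean(residual * x)  # gradient w.r.t. beta_1, averaged over samples
    return b0, b1


# Simulated data with known parameters beta_0 = -0.5 and beta_1 = 2.0
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < sigmoid(-0.5 + 2.0 * x)).astype(float)

b0_hat, b1_hat = fit_logistic(x, y)
print(f"beta_0 = {b0_hat:.2f}, beta_1 = {b1_hat:.2f}")
```

With 500 simulated samples, the estimates should land near the true values, though not exactly on them because of sampling noise.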

January 27, 2017 | Categories: Biology, Data Analysis, Python