boucherg

About Geneviève

I started in biochemistry, but it is as a bioinformatician that I've been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

Big data, big challenge

You've probably heard the expression "Big Data" before, particularly if you read Simon Mathien's blog post on IRIC's website. (If you haven't read it yet, you should do it now!) There are several definitions (or interpretations) of this expression, best summarized by the following two: "Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data" (Oxford English Dictionary); "Technological field dedicated [...]

2017-05-02T21:05:43+00:00 April 24, 2017 | Categories: Data Analysis | 3 Comments

Logistic regression and GTEx

Working with all sorts of data, we sometimes want to predict the value of a variable that is not numerical. In those cases, a logistic regression is appropriate. It is similar to a linear regression, except that the dependent variable is categorical. Here is the formula for the linear regression, where we want to estimate the parameters beta (coefficients) that best fit our data: \begin{equation} Y_i = \beta_0 + \beta_1 X_i [...]
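As a rough illustration of the idea in this excerpt, the sketch below passes the linear predictor beta_0 + beta_1 * X through the logistic function so that the output is a probability for a binary category, and estimates the two coefficients by simple gradient ascent. The toy data, the learning rate, the iteration count, and the function names (`sigmoid`, `fit_logistic`, `predict`) are assumptions made for illustration; they do not come from the original post, which may well rely on a library instead.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Estimate beta_0 and beta_1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals between observed classes and predicted probabilities
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        # Gradient with respect to the intercept and the slope
        g0 = sum(residuals) / n
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Hypothetical toy data: one numeric predictor and a binary outcome (0 or 1)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)


def predict(x):
    """Predicted probability that the outcome is 1 for predictor value x."""
    return sigmoid(b0 + b1 * x)


print(f"beta_0 = {b0:.3f}, beta_1 = {b1:.3f}")
print(f"P(Y=1 | X=1.0) = {predict(1.0):.3f}")
print(f"P(Y=1 | X=3.5) = {predict(3.5):.3f}")
```

Unlike the linear formula, the fitted values here always stay between 0 and 1, which is what makes them usable as class probabilities.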

2017-04-29T17:44:14+00:00 January 27, 2017 | Categories: Biology, Data Analysis, Python | 0 Comments

Pivoting tables: from long to wide

As bioinformaticians, we often have to work with data that are not formatted the way we need them to be. One case we might encounter is receiving data in a "long" format instead of the more familiar "wide" format. Those of you familiar with the ggplot R package know the long format very well: it's the format ggplot requires to produce its nice graphs.

Long
  genes samples expression
1 BAD   S01     7.525395
2 [...]
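To make the long-to-wide idea concrete, here is a minimal sketch using only the Python standard library. It groups (gene, sample, expression) rows into one entry per gene with one value per sample. The rows and the helper name `long_to_wide` are hypothetical and the expression values are made up; in practice, a library call such as pandas' `DataFrame.pivot` would typically do the same job.

```python
from collections import defaultdict

# Long format: one row per (gene, sample) pair.
# Values are made up for illustration only.
long_rows = [
    ("BAD", "S01", 7.5),
    ("BAD", "S02", 6.9),
    ("TP53", "S01", 9.1),
    ("TP53", "S02", 8.7),
]


def long_to_wide(rows):
    """Return {gene: {sample: expression}}, i.e. one entry per gene."""
    wide = defaultdict(dict)
    for gene, sample, expression in rows:
        wide[gene][sample] = expression
    return dict(wide)


wide = long_to_wide(long_rows)

# Column order for the wide table: every sample seen in the long data
samples = sorted({sample for _, sample, _ in long_rows})

# Print as a tab-separated table: one row per gene, one column per sample
print("\t".join(["genes"] + samples))
for gene in sorted(wide):
    cells = [str(wide[gene].get(sample, "NA")) for sample in samples]
    print("\t".join([gene] + cells))
```

Missing (gene, sample) combinations are printed as `NA`, a choice made here so that every row of the wide table has the same number of columns.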

2017-04-29T18:11:56+00:00 November 14, 2016 | Categories: Data Analysis, Python, R | 0 Comments

Good resources to learn R

Since it's summer vacation, why not take some time to learn R? There are numerous free resources online for diving into this powerful language. For whoever wants to learn it, the challenge is more about finding the time than finding resources. Videos: Coursera is a must for online learning. It offers a few good video courses for R beginners that are more or less oriented toward genomics: https://www.coursera.org/learn/r-programming https://www.coursera.org/learn/exploratory-data-analysis https://www.coursera.org/learn/bioconductor (Bioconductor is a life science packages [...]

2017-04-29T16:57:17+00:00 July 11, 2016 | Categories: Bioinformatics, R | 0 Comments

Machine learning in life science

Machine learning's popularity is increasing among bioinformaticians and biologists, as it gives interesting results and has become more accessible than ever. A machine learning model can now easily be applied to a given dataset using R or Python packages. For example, the Python package Scikit-learn provides several algorithms (Random Forest, Support Vector Machine (SVM), regression models, and many more) along with good documentation. Even deep learning (neural networks with multiple layers, or convolutional networks, for example) is more accessible [...]

2016-11-08T09:30:05+00:00 May 18, 2016 | Categories: Machine learning | 0 Comments