About Geneviève

I started in biochemistry, but it is as a bioinformatician that I've been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

Logistic regression and GTEx

When working with all sorts of data, we sometimes want to predict the value of a variable that is not numerical. In those cases, a logistic regression is appropriate. It is similar to a linear regression, except that it handles a dependent variable that is categorical. Here is the formula for the linear regression, where we want to estimate the parameters beta (the coefficients) that best fit our data: \begin{equation} Y_i = \beta_0 + \beta_1 X_i [...]
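To make the contrast with linear regression concrete, here is a rough sketch of a one-variable logistic regression fitted by plain gradient descent. It is not the code from the post, which is truncated in this excerpt. The toy data, the `sigmoid`, `fit_logistic` and `predict_proba` names, the learning rate and the iteration count are all assumptions chosen for illustration. The model keeps the linear predictor beta_0 + beta_1 x but passes it through the logistic function, so the output is the probability of a category rather than a numerical value.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate beta_0 and beta_1 by gradient descent on the mean log-loss.

    Hypothetical helper for illustration only; a real analysis would
    normally rely on an established statistics or machine learning library.
    """
    beta_0, beta_1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals between predicted probabilities and observed 0/1 labels.
        errors = [sigmoid(beta_0 + beta_1 * x) - y for x, y in zip(xs, ys)]
        # Gradient of the mean log-loss with respect to each coefficient.
        grad_0 = sum(errors) / n
        grad_1 = sum(e * x for e, x in zip(errors, xs)) / n
        beta_0 -= learning_rate * grad_0
        beta_1 -= learning_rate * grad_1
    return beta_0, beta_1


# Made-up toy data: one numerical predictor and a binary (0/1) category.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

beta_0, beta_1 = fit_logistic(xs, ys)


def predict_proba(x):
    """Estimated probability that the category is 1 for a given x."""
    return sigmoid(beta_0 + beta_1 * x)


print(f"beta_0 = {beta_0:.3f}, beta_1 = {beta_1:.3f}")
print(f"P(y = 1 | x = 1.0) = {predict_proba(1.0):.3f}")
print(f"P(y = 1 | x = 3.5) = {predict_proba(3.5):.3f}")
```

With these toy data the fitted slope is positive, so the predicted probability of category 1 rises with x.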

By | January 27, 2017|Categories: Bioinformatics, Data Analysis, Python|0 Comments

Pivoting tables : from long to wide

As bioinformaticians, we often have to work with data that are not formatted the way we need them to be. One case we might encounter is receiving data in a "long" format instead of the more familiar "wide" format. Those of you familiar with the ggplot R package know the long format very well: it is the format ggplot requires to produce its nice graphs.

Long

  genes samples expression
1   BAD     S01   7.525395
2   BAD [...]
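A minimal sketch of the long-to-wide reshaping described here, written in plain Python so the mechanics are visible. Only the first record (gene BAD, sample S01, expression 7.525395) comes from the excerpt. The other records, the `long_to_wide` name and the output layout are made up for illustration, and the post's own approach is not shown in this excerpt.

```python
def long_to_wide(rows):
    """Pivot (gene, sample, expression) records into one row per gene.

    Returns a dict mapping each gene to a {sample: expression} dict.
    Hypothetical helper for illustration only.
    """
    wide = {}
    for gene, sample, expression in rows:
        # setdefault creates the gene's row the first time the gene is seen.
        wide.setdefault(gene, {})[sample] = expression
    return wide


# Long format: one record per (gene, sample) measurement.
# Only the first record appears in the excerpt; the rest are invented.
long_rows = [
    ("BAD", "S01", 7.525395),
    ("BAD", "S02", 7.1),
    ("TP53", "S01", 9.3),
    ("TP53", "S02", 8.8),
]

wide = long_to_wide(long_rows)

# Wide format: genes as rows, samples as columns.
samples = sorted({sample for _, sample, _ in long_rows})
print("genes", *samples, sep="\t")
for gene, values in wide.items():
    # Missing (gene, sample) combinations are shown as NA.
    print(gene, *(values.get(s, "NA") for s in samples), sep="\t")
```

In practice, a data-frame library usually does this in a single call (pandas, for instance, provides `DataFrame.pivot`), but the loop above is all the operation amounts to.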

By | November 14, 2016|Categories: Python, R|0 Comments

Good resources to learn R

Since it's summer vacation, why not take some time to learn R? There are numerous free resources online for diving into this powerful language. For whoever wants to learn it, the challenge is more about finding the time than finding resources.

Videos

Coursera is a must for online learning. It offers a few good video courses for R beginners that are more or less oriented toward genomics: (Bioconductor is a life science packages [...]

By | July 11, 2016|Categories: Bioinformatics, R|0 Comments

Machine learning in life science

Machine learning's popularity is increasing among bioinformaticians and biologists, as it gives interesting results and has become more accessible than ever. A machine learning model can now easily be applied to a given dataset using R or Python packages. For example, the Python package scikit-learn provides several algorithms (Random Forest, Support Vector Machine or SVM, regression models and many more) and good documentation. Even deep learning (neural networks with multiple layers or convolutional networks, for example) is more accessible [...]

By | May 18, 2016|Categories: Machine learning|0 Comments

What’s the fastest? – R edition

When I started using R, about ten years ago, the community was much smaller. There was no R-bloggers to draw inspiration from and no ggplot2 to make nice graphs. It was the beginning of another implementation of R (other than CRAN's), known as Revolution R, from Revolution Analytics. Their R targeted enterprises and was designed to be faster and more scalable. They also offered an open source version of their product, called RRO. In April 2015, the company was acquired by Microsoft! May [...]

By | February 12, 2016|Categories: R|0 Comments