About Geneviève

I started in biochemistry, but it is as a bioinformatician that I’ve been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

R or Python, you choose!

I have already briefly introduced pandas, a Python library, by comparing some of its functions to their equivalents in R. Pandas makes Python almost as convenient as R for visualizing and exploring data held in matrices and data frames (it is built on top of numpy). It has evolved a lot these past few years, as has its community of users. Although pandas is being integrated in a number of specialized packages, such as rdkit for chemoinformatics, [...]
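To make the comparison concrete, here is a minimal sketch pairing a few common pandas calls with their usual R counterparts. The toy data frame is hypothetical and is not taken from the original post; the R equivalents in the comments are the conventional ones, not necessarily the ones the post compares.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "gene": ["BAD", "BAX", "BCL2", "TP53"],
    "expression": [7.5, 6.1, 8.3, 9.0],
})

n_rows, n_cols = df.shape               # R: dim(df)
first_rows = df.head(2)                 # R: head(df, 2)
summary = df["expression"].describe()   # R: summary(df$expression)
high = df[df["expression"] > 7]         # R: df[df$expression > 7, ]

print(n_rows, n_cols)
print(first_rows)
print(summary)
print(high)
```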

June 26, 2017 | Categories: Data Analysis, Python, R | 0 Comments

Big data, big challenge

You've probably heard the expression "Big Data" before, particularly if you have read Simon Mathien's blog post on IRIC's website. (If you haven't read it yet, you should do it now!) There are several definitions (or interpretations) of this expression, which are best summarized by the following two: "Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data" (Oxford English Dictionary). Technological field dedicated [...]

April 24, 2017 | Categories: Data Analysis | 3 Comments

Logistic regression and GTEx

When working with all sorts of data, we sometimes want to predict the value of a variable that is not numerical. In those cases, a logistic regression is appropriate. It is similar to a linear regression, except that it handles a dependent variable that is categorical. Here is the formula for the linear regression, where we want to estimate the parameters beta (coefficients) that best fit our data: \begin{equation} Y_i = \beta_0 + \beta_1 X_i [...]
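As an illustration of the idea, here is a minimal sketch of a one-variable logistic regression. It passes the linear predictor beta_0 + beta_1 * x through the sigmoid function to obtain a probability, and estimates the two coefficients by plain gradient descent on the log-loss. The data, the function names (sigmoid, fit_logistic) and the learning-rate settings are all assumptions made for this example. It is not the method or the GTEx data used in the post, and in practice a library implementation would normally be preferred.

```python
import numpy as np

def sigmoid(z):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Estimate (b0, b1) by gradient descent on the mean log-loss.

    Hypothetical helper for illustration; lr and n_iter are arbitrary choices.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(b0 + b1 * x)          # predicted P(y = 1 | x)
        b0 -= lr * np.mean(p - y)         # gradient with respect to b0
        b1 -= lr * np.mean((p - y) * x)   # gradient with respect to b1
    return b0, b1

# Hypothetical toy data: a binary label that becomes more likely as x grows.
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

b0, b1 = fit_logistic(x, y)
probs = sigmoid(b0 + b1 * x)
print(b0, b1)
print(probs.round(3))
```

The fitted probabilities can be turned into class predictions by thresholding, for example at 0.5.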

January 27, 2017 | Categories: Biology, Data Analysis, Python | 0 Comments

Pivoting tables : from long to wide

As bioinformaticians, we often have to work with data that are not formatted the way we need them to be. One case we might encounter is receiving data in a "long" format instead of the more familiar "wide" format. Those of you familiar with the ggplot R package know the long format very well: it's the format ggplot requires to produce its nice graphs.

Long
  genes samples expression
1 BAD   S01     7.525395
2 [...]
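A long-to-wide pivot of this kind can be sketched in pandas as follows. The column names (genes, samples, expression) and the first value come from the excerpt above; every other row is hypothetical, and the use of DataFrame.pivot is one possible approach rather than necessarily the one shown in the full post.

```python
import pandas as pd

# Long format: one row per (gene, sample) pair.
# Only the first row comes from the excerpt; the rest is made up.
long_df = pd.DataFrame({
    "genes": ["BAD", "BAD", "TP53", "TP53"],
    "samples": ["S01", "S02", "S01", "S02"],
    "expression": [7.525395, 6.9, 9.1, 8.7],
})

# Wide format: one row per gene, one column per sample.
wide_df = long_df.pivot(index="genes", columns="samples", values="expression")
print(wide_df)
```

Note that pivot raises an error if a (gene, sample) pair appears more than once; pivot_table with an aggregation function is the usual alternative in that case.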

November 14, 2016 | Categories: Data Analysis, Python, R | 0 Comments

Good resources to learn R

Since it's the summer vacation, why not take some time to learn R? There are numerous free resources online to dive into this powerful language. For whoever wants to learn it, the challenge is more about finding the time than finding resources.

Videos

Coursera is a must for online learning. There are a few good video courses offered for R beginners that are more or less oriented toward genomics : (Bioconductor is a life science packages [...]

July 11, 2016 | Categories: Bioinformatics, R | 0 Comments