boucherg

About Geneviève

I started in biochemistry, but it is as a bioinformatician that I’ve been having fun for several years now, whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

Machine learning in life science

Machine learning's popularity is increasing among bioinformaticians and biologists, as it gives interesting results and has become more accessible than ever. A machine learning model can now be easily applied to a given dataset using R or Python packages. For example, the Python package scikit-learn provides several algorithms (Random Forest, Support Vector Machine (SVM), regression models and many more) and good documentation. Even deep learning (neural networks with multiple layers or convolutional networks, for example) is more accessible [...]
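To make "easily applied" concrete, here is a minimal sketch of fitting a Random Forest with scikit-learn. The dataset is synthetic (generated with `make_classification`) and the sizes, split ratio and seeds are arbitrary assumptions chosen for illustration; they do not come from the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (assumption): 200 samples, 20 features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out a quarter of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training part only, then predict the held-out part.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Swapping in another algorithm, such as `sklearn.svm.SVC`, only changes the line that creates `model`; the `fit` and `predict` calls stay the same.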

May 18, 2016 | Categories: Machine learning

What’s the fastest? – R edition

When I started using R, about ten years ago, the community was much smaller. There was no R-bloggers to get inspired by and no ggplot2 to make nice graphs. It was the beginning of another implementation of R (other than CRAN's) known as Revolution R, from Revolution Analytics. Their R targeted enterprises and was designed to be faster and more scalable. They also offer an open-source version of their product called RRO. In April 2015, the company was acquired by Microsoft! May [...]

February 12, 2016 | Categories: Performance, R

Permutations

Say we have the following two groups:

    g1 <- c(55, 65, 58)
    g2 <- c(12, 18, 32)

We want to see whether the two groups belong to the same distribution or can be considered different groups. We might be tempted to try a Student’s t-test.

    t.test(g1, g2)
    ## Welch Two Sample t-test
    ##
    ## data: g1 and g2
    ## t = 5.8366, df = 2.9412, p-value = 0.01059
    ## alternative hypothesis: true difference in means is not equal [...]
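The excerpt stops before the permutation approach itself, so the following is only a minimal sketch of one common variant, not necessarily the method used in the full post. It assumes a two-sided exact test on the difference in means, enumerating every way to split the six pooled values into two groups of three. It is written in Python rather than R, and the function names are hypothetical.

```python
from itertools import combinations

g1 = [55, 65, 58]
g2 = [12, 18, 32]


def mean(values):
    return sum(values) / len(values)


def exact_permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    n_extreme = 0
    n_total = 0
    # Enumerate every way of choosing which pooled values form the first group.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        first = [pooled[i] for i in idx]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        # Count splits at least as extreme as the observed one
        # (small tolerance guards against floating-point rounding).
        if abs(mean(first) - mean(second)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


p_value = exact_permutation_p_value(g1, g2)
print(p_value)
```

With three values per group there are only 20 possible splits, and only two of them (the observed split and its mirror image) are as extreme as the observed difference, so the p-value here is 2/20 = 0.1. That is also the smallest two-sided p-value this test can return with groups this small.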

October 14, 2015 | Categories: Data Analysis, R, Statistics

Don’t ignore the warnings!

I'm sure that all of you R users have noticed by now that R sometimes talks to you. When you do something wrong, R replies with a message written in red in the console. How many of you actually read those error messages? If you take the time to read them carefully, you'll get a hint about what was wrong with your command. Let's look at an example:

    > sum(c('1','3','4','4'))
    Error in sum(c("1", "3", "4", "4")) : invalid 'type' (character) [...]

September 3, 2015 | Categories: R, Statistics

Kaplan-Meier plot

When working with cancer datasets, one of the goals is sometimes to find features (mutation, clinical information, gene expression, ...) associated with prognosis, i.e. features related to the probable outcome of the disease. If that's one of your goals, you'll have to do a survival analysis. Survival analysis involves a set of methods to model the time at which an event of interest occurs, that event often being death. But really, any event for which the time of occurrence is [...]
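As a rough illustration of what sits behind a Kaplan-Meier plot, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. The function name and the toy data are hypothetical and not taken from the post. An event flag of 1 means the event (for example, death) was observed at that time, and 0 means the subject was censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed event."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the fraction of at-risk subjects surviving past t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up times (e.g. months) and event indicators.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

Censored subjects never lower the curve directly; they only shrink the number at risk for later time points, which is why the drop at each subsequent event gets larger.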

February 19, 2015 | Categories: Data Analysis, Statistics