boucherg

About Geneviève

I started out in biochemistry, but it is as a bioinformatician that I’ve been having fun for several years now: whether doing data analysis and visualization in R, building interactive web interfaces in JavaScript, or exploring machine learning in Python.

What’s the fastest? – R edition

When I started using R, about ten years ago, the community was much smaller. There was no R-bloggers to get inspired by and no ggplot2 to make nice graphs. It was also the beginning of another distribution of R (other than the one on CRAN), known as Revolution R, from Revolution Analytics. Their R targeted enterprises and was designed to be faster and more scalable. They also offered an open source version of their product called RRO. In April 2015, the company was acquired by Microsoft! May [...]

By | 2017-04-29T15:32:29+00:00 February 12, 2016|Categories: Performance, R|0 Comments

Permutations

Say we have the following two groups:

g1 <- c(55, 65, 58)
g2 <- c(12, 18, 32)

We want to see whether the two groups come from the same distribution or can be considered different groups. We might be tempted to try a Student’s t-test.

t.test(g1, g2)
## Welch Two Sample t-test
##
## data: g1 and g2
## t = 5.8366, df = 2.9412, p-value = 0.01059
## alternative hypothesis: true difference in means is not equal [...]

By | 2017-04-30T10:15:37+00:00 October 14, 2015|Categories: Data Analysis, R, Statistics|0 Comments

Don’t ignore the warnings!

I'm sure that all of you R users have noticed by now that R sometimes talks to you. When you do something wrong, R replies with a message written in red in the console. How many of you actually read those error messages? If you take the time to read them carefully, you'll get a hint about what was wrong with your command. Let's look at an example:

> sum(c('1','3','4','4'))
Error in sum(c("1", "3", "4", "4")) : invalid 'type' (character) [...]

By | 2017-04-30T16:25:19+00:00 September 3, 2015|Categories: R, Statistics|0 Comments

Kaplan-Meier plot

When working with cancer datasets, one of the goals is sometimes to find features (mutations, clinical information, gene expression, ...) associated with prognosis, i.e. features related to the probable outcome of the disease. If that's one of your goals, you'll have to do a survival analysis. Survival analysis involves a set of methods to model the time at which an event of interest occurs, that event often being death. But really, any event for which the time of occurrence is [...]

By | 2017-04-29T17:14:26+00:00 February 19, 2015|Categories: Data Analysis, Statistics|0 Comments

One task, three ways

Usually, there is more than one way to accomplish a task. Some are better, some are worse and others are just as good. Choosing which one to use often depends on computing time, ease of use and/or personal preferences and abilities. Say I have a matrix of thousands of chromosomal features with the following column names: Feature, Start, End. All the positions are found on the same chromosome and the widths of my features are variable. [...]

By | 2017-05-01T10:25:02+00:00 January 15, 2015|Categories: Bioinformatics, R|0 Comments