Data Analysis

Kaplan-Meier plot

When working with cancer datasets, one of the goals is sometimes to find features (mutations, clinical information, gene expression, ...) associated with prognosis, i.e. features related to the probable outcome of the disease. If that's one of your goals, you'll have to do a survival analysis. Survival analysis involves a set of methods to model the time at which an event of interest occurs, that event often being death. But really, any event for which the time of occurrence is [...]
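To make the idea of modelling time-to-event data a bit more concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not taken from the post: the function name `kaplan_meier` and the toy data are hypothetical, and it assumes each subject has a follow-up time and a flag saying whether the event was observed (1) or the subject was censored (0).

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time of each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each time point
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Number of subjects leaving the risk set (event or censoring) at each time
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t drops out of the risk set
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six subjects, two of them censored
durations = [5, 6, 6, 2, 4, 4]
events = [1, 0, 1, 1, 1, 0]
curve = kaplan_meier(durations, events)

for time, prob in curve:
    print(f"t={time}: S(t)={prob:.3f}")
```

Plotting these points as a step function gives the Kaplan-Meier curve. Censored subjects never trigger a drop in the curve, but they still count in the risk set up to the time they leave it.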

February 19, 2015 | Categories: Data Analysis, Statistics

python and pandas

R is undeniably a must-use language, especially for data visualization. But R can sometimes be a little slow when dealing with big datasets. If you don't need to create awesome graphs or don't have time to wait, there's an alternative in Python that can be quite fast for data manipulation. The Python Data Analysis Library, pandas, provides an easy way to manipulate data in Python. Recently, I had to deal with a big gene expression file (21024 genes x [...]

April 17, 2014 | Categories: Data Analysis, Python

lifelines (or doing survival analysis in Python)

Lately, I've been doing survival analysis. I'm not an expert, but we had a self-learning group based on David G. Kleinbaum and Mitchel Klein's book, "Survival Analysis: A Self-Learning Text". At the end of this book, there's code provided to help you get started in SAS, Stata, SPSS and... R! I've played with the R package survival, which is quite good! My problem was that I wanted to do survival analysis in Python. I started by doing it with [...]

March 24, 2014 | Categories: Data Analysis, Python, Statistics