Pivoting tables: from long to wide

As bioinformaticians, we often have to work with data that are not formatted the way we need them to be. One case we might encounter is receiving data in a "long" format rather than the more familiar "wide" format. If you are familiar with the ggplot R package, you know the long format very well: it is the format ggplot requires to produce its nice graphs.

Long

  genes samples expression
1 BAD   S01     7.525395
2 [...]
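To make the long-to-wide idea concrete, here is a minimal plain-Python sketch. It is not the method used in the post, which is truncated above. Only the first row (BAD, S01, 7.525395) comes from the excerpt; the other rows, the second gene name and the function name `long_to_wide` are made up for illustration.

```python
from collections import defaultdict

# Long format: one (gene, sample, expression) record per row.
# Only the first record appears in the excerpt; the rest are hypothetical.
long_rows = [
    ("BAD", "S01", 7.525395),
    ("BAD", "S02", 6.1),
    ("TP53", "S01", 3.2),
    ("TP53", "S02", 4.8),
]


def long_to_wide(rows):
    """Pivot (gene, sample, value) records into one row per gene.

    Returns the sorted sample names (the wide column headers) and a dict
    mapping each gene to its values in that column order. A missing
    gene/sample combination is filled with None.
    """
    samples = sorted({sample for _, sample, _ in rows})
    by_gene = defaultdict(dict)
    for gene, sample, value in rows:
        by_gene[gene][sample] = value
    wide = {
        gene: [values.get(sample) for sample in samples]
        for gene, values in by_gene.items()
    }
    return samples, wide


samples, wide = long_to_wide(long_rows)

if __name__ == "__main__":
    print("genes", *samples)
    for gene, values in wide.items():
        print(gene, *values)
```

If pandas is available, `DataFrame.pivot(index="genes", columns="samples", values="expression")` performs the same reshaping on a data frame with those three columns.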

November 14, 2016 | Categories: Data Analysis, Python, R

Implementing a “Siamese” Neural Network with Mariana 1.0

Mariana was previously introduced on this blog by Geneviève in her May post "Machine learning in life science". The Mariana codebase on GitHub is currently at its third release candidate, ahead of the stable 1.0 release. This new version incorporates a large refactoring effort as well as many new features (the complete list of changes in version 1.0 can be found in the changelog). I am taking this opportunity to present here a small tutorial on extending the [...]

November 7, 2016 | Categories: Machine learning, Python

Fastest method to compute an AUC

Context: AUC is an acronym for "Area Under the (ROC) Curve". If you are not familiar with the ROC curve and AUC, I suggest reading this blog post before continuing. For several projects, I needed to compute a large number of AUCs. It started with 25,000, increased to 230,000, and now I need to compute 1,500,000 AUCs. With so many AUCs, the time needed to compute each one becomes critical. On the web, I could not find much information about this specific [...]
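For readers who want to see what a single AUC computation involves, here is a minimal sketch of the rank-based (Mann-Whitney U) formulation in plain Python. It is not a claim about which method the post found fastest, since the excerpt is truncated. The function name `auc_from_scores` and the demo data are illustrative assumptions, and labels are assumed to be 0 or 1.

```python
def auc_from_scores(labels, scores):
    """AUC via the Mann-Whitney U statistic.

    labels: sequence of 0/1 class labels (1 = positive).
    scores: sequence of predicted scores, higher meaning "more positive".
    Tied scores receive their average rank, so ties count as half.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n

    # Assign 1-based ranks, averaging over runs of tied scores.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The cost is dominated by the sort, so each AUC takes O(n log n) time for n scored items.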

August 18, 2016 | Categories: Performance, Python, R, Statistics

SciPy and Logistic Regressions

Given a set of data points, we often want to see whether a satisfactory relationship exists between them. Linear regressions can easily be visualized with Seaborn, a Python library meant for exploration and visualization rather than statistical analysis. As for logistic regressions, SciPy is a good tool when one does not have one's own analysis script. Let's look at the optimize package:

from scipy.optimize import [...]
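The excerpt cuts off before showing which function is imported from `scipy.optimize`, so the following is only one possible sketch, not the post's code. It assumes `scipy.optimize.minimize` and fits a one-variable logistic regression by minimizing the negative log-likelihood. The helper names and the small data set are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def negative_log_likelihood(params, x, y):
    """Negative log-likelihood of a one-variable logistic model."""
    intercept, slope = params
    z = intercept + slope * x
    # log(1 + exp(z)) evaluated stably with logaddexp
    return np.sum(np.logaddexp(0.0, z) - y * z)


def fit_logistic(x, y):
    """Return the fitted [intercept, slope] (assumption: default solver)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = minimize(negative_log_likelihood, x0=np.zeros(2), args=(x, y))
    return result.x


def predict_proba(params, x):
    """Predicted probability of class 1 for each value in x."""
    intercept, slope = params
    z = intercept + slope * np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical, non-separable toy data: the outcome tends to be 1 for larger x.
x_demo = [0, 1, 2, 3, 4, 5, 6, 7]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
params = fit_logistic(x_demo, y_demo)

if __name__ == "__main__":
    print("intercept, slope:", params)
    print("P(y=1 | x=7):", predict_proba(params, [7])[0])
```

The toy data are deliberately not perfectly separable: with perfectly separated classes the likelihood has no finite maximum and the fitted slope grows without bound.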

June 9, 2016 | Categories: Data Analysis, Python

Parallelize your Python!

This article will teach you what multithreading and multicore processing are, and in what circumstances each can be used. Does your nerd friend keep telling you about his occupational habit of wanting to parallelize and optimize his time? Do you wish to understand it as well, and save time by parallelizing your Python programs? Then this article is what you need! You will be able to save a great deal of time, thanks to a small dose of parallelism [...]
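As a rough illustration of the thread-versus-process choice, here is a minimal sketch built on the standard library's `concurrent.futures` module. The post itself may use different tools, since the excerpt is truncated. The helper `parallel_map`, its `use_processes` flag and the `square` function are hypothetical names chosen for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(n):
    """Stand-in for a real unit of work."""
    return n * n


def parallel_map(func, items, use_processes=False, max_workers=4):
    """Apply func to every item using a pool of workers.

    Threads (the default here) suit I/O-bound work such as downloads or
    file reads. Processes suit CPU-bound work, because each process has
    its own interpreter and can run on a separate core.
    Results come back in the same order as the input items.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


if __name__ == "__main__":
    # Thread-based run; pass use_processes=True for CPU-bound functions.
    print(parallel_map(square, range(5)))  # [0, 1, 4, 9, 16]
```

When `use_processes=True`, the function and its arguments must be picklable, and the call should stay under the `if __name__ == "__main__":` guard so that worker processes can import the module safely.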

March 29, 2016 | Categories: Performance, Python