eaudemard

About Éric

I’ve started as a computer scientist, then I have quickly realised that bioinformatics are saturated by puzzles to solve. As in the "The Summit of the Gods" (Jirō Taniguchi), there are always a new mountain to climb or a path more straightforward.

Introduction to cowplot to combine several plots in one with R

Hi everyone, Today I will introduce cowplot, an extension of ggplot2 library. Some helpful extensions and modifications to the 'ggplot2' package. In particular, this package makes it easy to combine multiple 'ggplot2' plots into one and label them with letters, e.g. A, B, C, etc., as is often required for scientific publications. As you can see, this library can be useful to easily create a figure containing multiple plots. But we will see how we can use it to create [...]

By | November 28, 2016|Categories: Bioinformatics, Biology, Data Analysis, Data Visualisation, R|0 Comments

Fastest method to compute an AUC

Context: AUC is an acronym for "Area Under the (ROC) Curve". If you are not familiar with the ROC curve and AUC, I suggest reading this blog post before to continuing further. For several projects, I needed to compute a large number of AUC. It started with 25,000, increased to 230,000 and now I need to compute 1,500,000 AUC. With so many AUC, the time to compute each one becomes critical. On the web, I don't find much information about this specific [...]

By | August 18, 2016|Categories: Data Analysis, Performance, Python, R, Statistics|0 Comments

Factorial and Log Factorial

Factorial: When you need to calculate n!, you have several solutions.  The "rush" solution: using a loop or a recursive function:  def factorial_for(n): r = 1 for i in range(2, n + 1): r *= i return(r) def factorial_rec(n): if n > 1: return(n * factorial_rec(n - 1)) else: return(1) Here, the multiplication of the numbers sequentially will create a huge number very quickly. This is good, but computers are faster when 2 small numbers (120x30240) are involved in a multiplication versus the [...]

By | February 22, 2016|Categories: Bioinformatics, Data Analysis, Performance, Python|0 Comments

[Python] Iterators vs Generators

In Python, there are iterators and generators. You probably already use iterators without even knowing that you do so. But understanding the difference between those two concepts is really important since choosing one over the other has a huge impact on memory usage. If you are working with small datasets, memory usage might not be your first concern. However, with big datasets, it is another story. So what are they exactly, iterators and generators? Iterators The process of going through [...]

By | September 18, 2015|Categories: Bioinformatics, Performance, Python, Uncategorized|0 Comments

Be better at programming with static program analysis

- What is static program analysis ? Static program analysis allows the gathering of informations about the execution behaviour of your code without actually executing it. It is the opposite of dynamic program analysis (like debugging) which required the code to be executed. - Ok! But why should I use this in practice ? To save time by suppressing the save/execute cycles induced by syntax errors (missing ";", function or variable not initialized, typos, ...). Correcting these errors at the [...]

By | May 8, 2015|Categories: Performance, Python, R, Web development|0 Comments