IRIC's Bioinformatics Platform

Overfitting and Regularization

This series of articles on machine learning wouldn't be complete without dipping our toes in overfitting and regularization.

Overfitting

The Achilles' heel of machine learning is overfitting. As machine learning techniques become more powerful (that is, as the number of parameters grows), exposure to overfitting increases. An overfit model violates Occam's razor: it becomes so complex that it begins to memorise small, unimportant details of the training set that have no true link to the target. [...]

By Jonathan | October 30, 2017 | Categories: Data Analysis, Machine learning, Uncategorized | 0 Comments
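The excerpt above does not say which form of regularization the article covers. As a minimal sketch, assuming an L2 (ridge) penalty and a one-parameter model with no intercept, the following shows how penalizing large weights pulls the fitted slope toward zero. The function name `ridge_slope` and the toy data are hypothetical, chosen only for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w that minimizes sum((y - w * x)**2) + penalty * w**2.

    Assumes a model with a single weight and no intercept, for which
    the minimizer has the closed form sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical toy data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_slope(xs, ys, penalty=0.0)   # ordinary least squares
regularized = ridge_slope(xs, ys, penalty=14.0)    # penalty shrinks the slope
```

With no penalty the slope matches the data exactly; a larger penalty trades some fit on the training set for a smaller, simpler weight.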

Let it roam free! Releasing your code into the wild…

Today, I thought I'd do something a little different and talk about what one might expect from publicly releasing some code. I figured it might be nice to interview someone from our group who has lots of experience doing so, Tariq Daouda, to gain some of his insights. So without further ado, here we go!

JP: Hi Tariq, glad to have you with us. I thought I might ask you a few questions regarding what happens when one decides [...]

By Jean-Philippe | October 16, 2017 | Categories: Computer science | Tags: GitHub, public licences | 0 Comments

A Week of Deep Learning

From August 21 to 25, IVADO and the MILA held the first edition of their École d'été francophone en apprentissage profond (French-language summer school on deep learning). The aim of this summer school was to "give [the participants] the theoretical and practical basis for understanding [deep learning]". A few members of the platform and I took part in these five days of training. I must be honest, I was a little afraid of deep learning the first time it was presented to me. I found the concept [...]

By Caroline | September 22, 2017 | Categories: Computer science | 0 Comments

Big data, big challenge – part 2

This post follows my previous post on big data. Even though the latter did not spark a big virtual discussion, I was pleased to read some comments regarding the situation in other areas of bioinformatics.

Proteomics

Mathieu Courcelles, bioinformatician at the proteomics platform, explained that mass-spectrometry-driven proteomics has always generated 'big data', so this expression is not used in the field. As he said, mass spectrometers are indeed instruments that generate a large volume of data 24/7. Early on [...]

By Geneviève | August 18, 2017 | Categories: Data Analysis | Tags: big data, data integration | 0 Comments

Gradient Descent

Gradient descent is an iterative algorithm that searches for the parameter values of a function of interest that minimize the output of a cost function with respect to a given dataset. Gradient descent is often used in machine learning to quickly find an approximate solution to complex, multi-variable problems. In my last article, Introduction to Linear Regression, I mentioned gradient descent as a possible solution to simple linear regression. While there exists an optimal analytical solution to simple [...]

By Jonathan | August 3, 2017 | Categories: Data Analysis, Machine learning, Python, Uncategorized | 0 Comments
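To make the description in the excerpt above concrete, here is a minimal sketch of gradient descent applied to simple linear regression. It is not the code from the article itself: the mean squared error cost, the function name `gradient_descent`, the learning rate, the iteration count and the toy data are all assumptions made for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.01, n_iterations=5000):
    """Fit y ~ slope * x + intercept by minimizing the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        # Prediction error of the current line on every data point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        # with respect to each parameter.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # Move each parameter a small step against its gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = gradient_descent(xs, ys)
```

On this toy data the loop should settle close to a slope of 2 and an intercept of 1. A learning rate that is too large makes the updates diverge, while one that is too small needs many more iterations, so both values usually have to be tuned for the dataset at hand.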