A Week of Deep Learning

From August 21 to 25, IVADO and MILA held the first edition of the École d'été francophone en apprentissage profond. The aim of this summer school was to "give [the participants] the theoretical and practical basis for understanding [deep learning]". A few members of the platform and I took part in these five days of training. I must be honest: I was a little afraid of deep learning the first time it was presented to me. I found the concept [...]

2017-09-22T13:46:35+00:00 | September 22, 2017 | Categories: Computer science | 0 Comments

Big data, big challenge – part 2

This post follows up on my previous post on big data. Even though that post did not spark a big virtual discussion, I was pleased to read some comments regarding the situation in other areas of bioinformatics. Proteomics: Mathieu Courcelles, bioinformatician at the proteomics platform, explained that mass spectrometry-driven proteomics has always generated 'big data', so this expression is not used in the field. As he said, mass spectrometers are indeed instruments that generate a large volume of data 24/7. Early on [...]

2017-08-18T13:24:34+00:00 | August 18, 2017 | Categories: Data Analysis | 0 Comments

Gradient Descent

Gradient descent is an iterative algorithm that aims to find the parameter values of a function of interest that minimize the output of a cost function with respect to a given dataset. Gradient descent is often used in machine learning to quickly find an approximate solution to complex, multi-variable problems. In my last article, Introduction to Linear Regression, I mentioned gradient descent as a possible solution to simple linear regression. While there exists an optimal analytical solution to simple [...]

2017-08-03T16:23:44+00:00 | August 3, 2017 | Categories: Data Analysis, Machine learning, Python, Uncategorized | 0 Comments
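
To make the idea in the gradient descent excerpt above concrete, here is a minimal sketch in Python for simple linear regression. It is not the code from the original article: the function name, the mean squared error cost, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by minimizing the mean squared error.

    Hypothetical helper for illustration; the cost function and the
    hyperparameters are assumptions, not taken from the article.
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = gradient_descent(xs, ys)
print(slope, intercept)
```

On this noise-free toy dataset the estimates should approach a slope of 2 and an intercept of 1. A learning rate that is too large would make the updates diverge, so the value used here only suits data on this scale.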

R or Python, you choose!

I have already briefly introduced pandas, a Python library, by comparing some of its functions to their equivalents in R. Pandas is a library that makes Python almost as convenient as R for data visualization and exploration with matrices and data frames (it is built on top of numpy). It has evolved a lot these past few years, as has its community of users. Although pandas is being integrated into a number of specialized packages, such as rdkit for chemoinformatics, [...]

2017-06-26T13:49:42+00:00 | June 26, 2017 | Categories: Data Analysis, Python, R | 0 Comments

Dimensionality Reduction Tutorials: 1- Principal Components Analysis

Understanding dimensionality reduction: If you use large datasets (transcriptomes, whole genome sequencing, proteomes), sooner or later you will stumble across something called Principal Components Analysis (PCA). PCA is a dimensionality reduction technique, one of a family of techniques that do just that: reduce the dimensionality. But what does that mean? What are dimensions and why would we want to reduce their number? How about we deal with these questions through an example? The problem: Say we have a hypothetical transcriptome, of a [...]

2017-06-26T13:36:29+00:00 | June 1, 2017 | Categories: Data Analysis, Data Visualization | 0 Comments
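
As a rough illustration of what "reducing the dimensionality" in the PCA excerpt above can mean, the sketch below finds the first principal component of a small two-column dataset and projects each sample onto it, turning two values per sample into one. It is not taken from the tutorial: the function name, the toy data, and the use of power iteration on the covariance matrix are assumptions made to keep the example dependency-free.

```python
def first_principal_component(rows, iterations=200):
    """Return the first principal axis and the 1-D scores of each row.

    Hypothetical helper for illustration. It assumes the data has non-zero
    variance; a real analysis would typically rely on a numerical library.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and normalize,
    # which converges toward the direction of largest variance.
    axis = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        axis = [x / norm for x in w]
    # Project every centered sample onto the axis: d values become 1.
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores


# Toy data: the second column is exactly twice the first (made up).
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
axis, scores = first_principal_component(samples)
print(axis, scores)
```

Because the two columns in the toy data are perfectly correlated, a single score per sample retains all of the variation, which is the kind of redundancy PCA exploits.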