Machine learning – IRIC's Bioinformatics Platform

Overfitting and Regularization

This series of articles on machine learning wouldn't be complete without dipping our toes into overfitting and regularization.

Overfitting

The Achilles' heel of machine learning is overfitting. As machine learning techniques become more powerful (that is, as they gain more parameters), the risk of overfitting increases. An overfit model violates the principle of Occam's razor. It is so complex that it begins to memorise small, unimportant details of the training set that have no true link to our target. [...]
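To make the idea concrete, here is a minimal pure-Python sketch that is not taken from the original article. It fits two models to the same noisy samples of an assumed linear relationship, y = 2x + 1. The first is a two-parameter straight line. The second is a Lagrange interpolating polynomial with one parameter per training point, which is complex enough to memorise the noise. The data, noise level and function names are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Low-capacity model: ordinary least-squares line y = a*x + b (2 parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_interpolant(xs, ys):
    """High-capacity model: Lagrange polynomial of degree n-1 through every training point.

    With as many parameters as samples, it reproduces the training set exactly,
    noise included.
    """
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_signal(x):
    return 2.0 * x + 1.0  # assumed underlying relationship (illustrative)


rng = random.Random(42)
n_train = 12
# Evenly spaced training inputs in [-1, 1], plus Gaussian noise on the targets.
train_x = [-1.0 + 2.0 * i / (n_train - 1) for i in range(n_train)]
train_y = [true_signal(x) + rng.gauss(0.0, 0.3) for x in train_x]
# Fresh samples from the same process, never seen during fitting.
test_x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
test_y = [true_signal(x) + rng.gauss(0.0, 0.3) for x in test_x]

line_model = fit_line(train_x, train_y)
poly_model = fit_interpolant(train_x, train_y)

for name, model in [("line", line_model), ("degree-11 polynomial", poly_model)]:
    print(f"{name}: train MSE={mse(model, train_x, train_y):.4f}, "
          f"test MSE={mse(model, test_x, test_y):.4f}")
```

The interpolant has a training error of zero because it passes through every noisy point. That perfect score says nothing about how it behaves between and around those points. On the held-out samples, its error is expected to be noticeably worse than the line's. This gap between training and test performance is what overfitting refers to.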

By Jonathan | October 30, 2017 | Categories: Data Analysis, Machine learning, Uncategorized

Gradient Descent

Gradient descent is an iterative algorithm that searches for the parameter values of a function of interest that minimize a cost function computed over a given dataset. It is often used in machine learning to quickly find an approximate solution to complex, multivariable problems. In my last article, Introduction to Linear Regression, I mentioned gradient descent as a possible way to solve simple linear regression. While there exists an optimal analytical solution to simple [...]

By Jonathan | August 3, 2017 | Categories: Data Analysis, Machine learning, Python, Uncategorized

Implementing a “Siamese” Neural Network with Mariana 1.0

Mariana was previously introduced on this blog by Geneviève in her May post, Machine learning in life science. The Mariana codebase is currently on GitHub at its third release candidate, ahead of the stable 1.0 release. This new version incorporates a large refactoring effort as well as many new features (the changelog lists all of the changes in version 1.0). I am taking this opportunity to present a small tutorial on extending the [...]

By Jonathan | November 7, 2016 (updated April 29, 2017) | Categories: Machine learning, Python | Tags: computer science, data analysis, machine learning framework, mariana

Machine learning in life science

Machine learning is becoming increasingly popular among bioinformaticians and biologists, as it gives interesting results and has become more accessible than ever. A machine learning model can now be easily applied to a given dataset using R or Python packages. For example, the Python package scikit-learn provides several algorithms (random forests, support vector machines (SVM), regression models and much more) along with good documentation. Even deep learning (for example, neural networks with multiple layers or convolutional networks) is more accessible [...]
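As an illustration of how little code this takes, here is a minimal sketch using scikit-learn's common estimator interface (`fit`, then `score`) on the iris toy dataset bundled with the library. The choice of dataset, models and hyperparameters is an assumption made for the example, not something taken from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small built-in toy dataset: 150 flowers, 4 features, 3 species.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same interface for every estimator
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the held-out set
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Because every scikit-learn estimator exposes the same `fit`/`score` methods, switching between a random forest and an SVM only changes the line that constructs the model.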

By Geneviève | May 18, 2016 (updated November 8, 2016) | Categories: Machine learning