seguinj

About Jonathan

The platform’s rookie. I spend my days honing my machine learning skills, coding in Python and rock climbing (in no particular order).

Overfitting and Regularization

This series of articles on machine learning wouldn't be complete without dipping our toes into overfitting and regularization.

Overfitting

The Achilles' heel of machine learning is overfitting. As machine learning techniques become more powerful (that is, as their number of parameters grows), so does their exposure to overfitting. An overfit model violates Occam's razor: it becomes so complex that it begins to memorise small, unimportant details of the training set that have no true link to the target. [...]
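To make the idea of memorising unimportant details concrete, here is a minimal sketch. It is not taken from the article: the data, the noise level and the `memorising_model` helper are all made up for illustration. The "model" simply returns the target of the closest training point, so it reproduces the training noise perfectly and then carries that noise over to points it has never seen.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def noisy_line(x):
    """Hypothetical ground truth: y = 2x plus Gaussian noise."""
    return 2.0 * x + rng.gauss(0.0, 1.0)


# Ten training points, and ten unseen points halfway between them.
train = [(float(x), noisy_line(float(x))) for x in range(10)]
test = [(x + 0.5, noisy_line(x + 0.5)) for x in range(10)]


def memorising_model(x):
    """Return the target of the closest training point (1-nearest neighbour)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def mean_squared_error(model, data):
    """Average squared gap between the model's predictions and the targets."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


train_mse = mean_squared_error(memorising_model, train)
test_mse = mean_squared_error(memorising_model, test)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The training error is exactly zero because every training point is its own nearest neighbour, while the error on the unseen points is not. That gap between training and test error is the usual symptom of overfitting.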

October 30, 2017 | Categories: Data Analysis, Machine learning, Uncategorized

Gradient Descent

Gradient descent is an iterative algorithm that searches for the parameter values of a function of interest that minimize a cost function over a given dataset. Gradient descent is often used in machine learning to quickly find an approximate solution to complex, multi-variable problems. In my last article, Introduction to Linear Regression, I mentioned gradient descent as a possible solution to simple linear regression. While there exists an optimal analytical solution to simple [...]
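As a rough illustration of the loop described above, here is a minimal sketch of gradient descent applied to simple linear regression. It is not the article's code: the mean squared error cost, the toy data, and the `learning_rate` and `n_steps` values are assumptions chosen to keep the example short.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = slope * x + intercept by minimising the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction errors of the current line on every data point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE cost with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Move each parameter a small step against its gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical, noise-free data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = gradient_descent(xs, ys)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The learning rate matters: too large a step makes the updates diverge, while too small a step needs many more iterations to get close to the minimum.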

August 3, 2017 | Categories: Data Analysis, Machine learning, Python, Uncategorized

Introduction to Linear Regression

A data scientist's first goal is to find underlying relations among the variables of a dataset. Several statistical and machine learning methods can be used to discover such relations. Once uncovered, this information can be applied to everyday problems. For example, in clinical medicine, a predictive model based on clinical data can help clinicians guide a patient's treatment by offering insights that might not otherwise have been taken into account.

Simple linear regression

One of the most basic methods available to [...]
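For readers who want something concrete before opening the article, here is a minimal sketch of simple linear regression using the standard ordinary least squares formulas. The excerpt stops before describing a method, so the choice of ordinary least squares, the function name and the toy data are assumptions made for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Once the slope and intercept are known, predicting a new value is a single multiplication and addition, which is part of what makes this method a common first step.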

Implementing a “Siamese” Neural Network with Mariana 1.0

Mariana was previously introduced on this blog by Geneviève in her May post, Machine learning in life science. The Mariana codebase on GitHub is currently at its third release candidate ahead of the stable 1.0 release. This new version incorporates a large refactoring effort as well as many new features (a complete list of the changes in version 1.0 can be found in the changelog). I am taking this opportunity to present a small tutorial on extending the [...]

November 7, 2016 | Categories: Machine learning, Python

Realize your Bash potential

A bioinformatician's best tool is his shell. While some have already mastered the dark arts of the bash shell, I often see beginners (and even catch myself at times!) unknowingly repeating key sequences when they could be getting the same result with a few simple built-in keybindings or programmatic shortcuts. Let's have a look at some of the most useful bash shortcuts that no self-respecting bioinformatician should be without. This is by no means an exhaustive list of what Bash has to offer but will hopefully serve to save [...]

May 26, 2016 | Categories: Computer science, Shell scripting