Python

Gradient Descent

Gradient descent is an iterative algorithm that aims to find the values of a function's parameters that minimize a cost function over a given dataset. Gradient descent is often used in machine learning to quickly find an approximate solution to complex, multi-variable problems. In my last article, Introduction to Linear Regression, I mentioned gradient descent as a possible solution to simple linear regression. While there exists an optimal analytical solution to simple [...]

August 3, 2017 | Categories: Data Analysis, Machine learning, Python, Uncategorized
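To make the gradient descent idea concrete, here is a minimal sketch applied to simple linear regression. It assumes a model y = b0 + b1*x and a mean-squared-error cost; the function name `gradient_descent`, the learning rate and the iteration count are illustrative choices, not taken from the original article. At each step the code computes the partial derivatives of the cost with respect to b0 and b1 and moves both parameters a small step against them.

```python
def gradient_descent(xs, ys, learning_rate=0.01, n_iters=5000):
    """Fit y ≈ b0 + b1 * x by minimizing the mean squared error with batch gradient descent."""
    n = len(xs)
    b0, b1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(n_iters):
        # residuals of the current model on every data point
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # partial derivatives of MSE = (1/n) * sum(error^2)
        grad_b0 = 2.0 * sum(errors) / n
        grad_b1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        # take a small step against the gradient
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


b0, b1 = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # points on y = 1 + 2x
print(round(b0, 4), round(b1, 4))  # → 1.0 2.0
```

The learning rate is the main tuning knob: if it is too large the updates overshoot and can diverge, and if it is too small convergence becomes needlessly slow.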

R or Python, you choose!

I have already briefly introduced pandas, a Python library, by comparing some of its functions to their equivalents in R. Pandas, which is built on top of NumPy, makes Python almost as convenient as R for exploring and visualizing data stored in matrices and data frames. Both the library and its community of users have grown considerably over the past few years. Although pandas is being integrated into a number of specialized packages, such as RDKit for chemoinformatics, [...]

June 26, 2017 | Categories: Data Analysis, Python, R

Introduction to Linear Regression

A data scientist's first goal is to find underlying relations among the variables of a dataset. Several statistical and machine learning methods can be used to discover such relations. Once uncovered, this information can be applied to everyday problems. For example, in clinical medicine, a predictive model built on clinical data can help clinicians guide a patient's treatment by offering insights that might otherwise have been overlooked. Simple linear regression: one of the most basic methods available to [...]
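The excerpt stops before the model itself. As a minimal sketch, assuming the usual ordinary-least-squares setting Y_i = b0 + b1*X_i + error, simple linear regression has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the fitted line passes through the point of means. The function name below is hypothetical and the data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares estimates for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = covariance(x, y) / variance(x); the 1/n factors cancel out
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx
    # the least-squares line always passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_simple_linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(b0, b1)  # → 1.0 2.0
```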

Logistic regression and GTEx

When working with all sorts of data, we sometimes want to predict the value of a variable that is not numerical. In such cases, logistic regression is appropriate. It is similar to linear regression, except that it accounts for the dependent variable being categorical. Here is the formula for linear regression, where we want to estimate the parameters beta (the coefficients) that best fit our data: \begin{equation} Y_i = \beta_0 + \beta_1 X_i [...]

January 27, 2017 | Categories: Biology, Data Analysis, Python
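In the standard formulation of logistic regression for a binary outcome, the linear predictor b0 + b1*x is passed through the logistic (sigmoid) function so that the output is a probability P(y = 1 | x). The sketch below fits that model by gradient descent on the mean log-loss. It is an illustration under these assumptions, not the method used in the original post; the function names, the learning rate, the iteration count and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(xs, ys, learning_rate=0.5, n_iters=20000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient descent on the mean log-loss."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(n_iters):
        # difference between the predicted probability and the observed 0/1 label
        diffs = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # gradient of the mean log-loss with respect to b0 and b1
        grad_b0 = sum(diffs) / n
        grad_b1 = sum(d * x for d, x in zip(diffs, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Made-up data: the two classes overlap, so the fitted coefficients stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_regression(xs, ys)
print(round(-b0 / b1, 2))  # decision boundary where P(y = 1) = 0.5 → 2.25
```

If the classes were perfectly separable, the log-loss would keep decreasing as the coefficients grow without bound; in practice that is usually handled with regularization.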

SNP Filtering with pyGeno

Looking over the contents of our growing blog (good job, guys!), it occurred to me that we had not yet posted an article about the fantastic (and homegrown!) bioinformatics resource that is pyGeno. It turns out I need to use pyGeno to generate data, and it's also my turn to write a blog post. How convenient! I'll focus the article on writing a SNP filter, which can be a bit surprising the first time you try [...]

December 9, 2016 | Categories: Bioinformatics, Python