Data Visualization

Create a nice-looking table using R

Hi everyone, today I will introduce formattable. This package applies formatting to vectors and data frames to make data presentation easier, richer and more flexible, and hopefully more informative. We will see how to use it to interpret your data at a glance with just a few lines of code (you can follow along below and check all the code in my Git repository). Before going further, I will specify that this package is generally used [...]

Introduction to Linear Regression

A data scientist's first goal is to find underlying relations among the variables of a dataset. Several statistical and machine learning methods can be used to discover such relations. Once uncovered, this information can be applied to everyday problems. For example, in clinical medicine, a predictive model based on clinical data can help clinicians guide a patient's treatment by offering insights that might not otherwise have been taken into account.

Simple linear regression

One of the most basic methods available to [...]

Bootstraps and Confidence Intervals

When analyzing data, you might want or need to fit a specific curve to a particular dataset. This type of analysis can yield instructive results about the relationship between two (or more) quantifiable parameters. The main aim of this post is not to show how to implement such a fit, but rather how to display its goodness, i.e. how to calculate a confidence interval around a fitted curve. That being said, I will show how to do curve fitting in [...]
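To make the idea of a confidence interval around a fitted curve concrete, here is a minimal Python sketch of a percentile bootstrap. It is not the post's own code: the straight-line model, the synthetic data, and the helper names `fit_line` and `bootstrap_band` are all assumptions chosen for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_band(xs, ys, grid, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap band for the fitted line, evaluated on `grid`."""
    rng = random.Random(seed)
    n = len(xs)
    preds = [[] for _ in grid]
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and refit the line.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: a line cannot be fitted
        slope, intercept = fit_line(bx, by)
        for j, g in enumerate(grid):
            preds[j].append(slope * g + intercept)
    band = []
    for p in preds:
        p.sort()
        lower = p[int((alpha / 2) * len(p))]
        upper = p[int((1 - alpha / 2) * len(p)) - 1]
        band.append((lower, upper))
    return band


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
data_rng = random.Random(42)
xs = [i * 0.5 for i in range(30)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 1) for x in xs]

grid = [0.0, 5.0, 10.0, 14.5]
band = bootstrap_band(xs, ys, grid)
for g, (lower, upper) in zip(grid, band):
    print(f"x = {g:5.1f}: 95% band = [{lower:.2f}, {upper:.2f}]")
```

The same resample-and-refit loop should carry over to other curve models by swapping `fit_line` for the fitting routine of choice; the band then reflects how much the fitted curve moves when the data are resampled.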

September 29, 2016 | Categories: Data Analysis, Data Visualization, R | 1 Comment

SciPy and Logistic Regressions

Given a set of data points, we often want to see if there exists a satisfactory relationship between them. Linear regressions can easily be visualized with Seaborn, a Python library that is meant for exploration and visualization rather than statistical analysis. As for logistic regressions, SciPy is a good tool when one does not have their own analysis script. Let's look at the optimize package: from scipy.optimize import [...]
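The excerpt is cut off before naming what it imports from scipy.optimize, so the following is only a sketch of one plausible approach: fitting a logistic (sigmoid) curve with `curve_fit`. The three-parameter form, the synthetic data, and the starting guess `p0` are assumptions, not the post's own code.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, plateau, steepness, midpoint):
    """Three-parameter logistic curve."""
    return plateau / (1.0 + np.exp(-steepness * (x - midpoint)))


# Synthetic data (an assumption for illustration): a noisy sigmoid.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = logistic(x, 1.0, 1.5, 5.0) + rng.normal(0, 0.02, x.size)

# p0 is a rough starting guess; curve_fit refines it by least squares.
params, cov = curve_fit(logistic, x, y, p0=[1.0, 1.0, 4.0])

# One-standard-deviation uncertainties from the covariance diagonal.
perr = np.sqrt(np.diag(cov))

for name, value, err in zip(("plateau", "steepness", "midpoint"), params, perr):
    print(f"{name}: {value:.3f} +/- {err:.3f}")
```

A reasonable `p0` matters for sigmoid fits, since a starting point far from the data can leave the optimizer on a flat part of the curve.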

June 9, 2016 | Categories: Bioinformatics, Data Analysis, Data Visualization, Python | 0 Comments

Standard deviation on a correlation scatter plot

I was recently asked by a colleague to provide a visualization of differential gene expression computed using RPKM values (two samples, no replicates) and to highlight genes that were outside the distribution by 2 standard deviations or more. As a first draft, I quickly obliged by calculating the fold change distribution, computing its standard deviation and drawing lines on either side of the diagonal. This turns out to be equivalent to computing the standard deviation of the residuals of a linear [...]
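The first-draft procedure described above (fold change distribution, standard deviation, 2 SD cutoff) can be sketched in Python as follows. The RPKM values are made up, and both the pseudocount and the choice to centre the cutoff on the mean log fold change are assumptions rather than details taken from the post.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical RPKM values for two samples (no replicates), made up here.
genes = {}
for i in range(200):
    base = random.uniform(1, 1000)
    genes[f"gene{i}"] = (base, base * 2 ** random.gauss(0, 0.3))
genes["up_gene"] = (10.0, 80.0)    # about 8-fold higher in sample B
genes["down_gene"] = (80.0, 10.0)  # about 8-fold lower in sample B

PSEUDOCOUNT = 1.0  # assumption: avoids taking the log of zero

# Log2 fold change of sample B over sample A for every gene.
log_fc = {
    name: math.log2((b + PSEUDOCOUNT) / (a + PSEUDOCOUNT))
    for name, (a, b) in genes.items()
}

values = list(log_fc.values())
mean_fc = statistics.mean(values)
sd_fc = statistics.stdev(values)

# Genes lying 2 standard deviations or more from the mean log fold change.
outliers = sorted(
    name for name, fc in log_fc.items() if abs(fc - mean_fc) >= 2 * sd_fc
)

print(f"mean = {mean_fc:.3f}, sd = {sd_fc:.3f}")
print("flagged genes:", outliers)
```

On a log-log scatter plot of the two samples, the cutoffs `mean_fc - 2 * sd_fc` and `mean_fc + 2 * sd_fc` correspond to two lines parallel to the diagonal, which is where the highlighted genes fall outside.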

April 5, 2016 | Categories: Data Analysis, Data Visualization, R, Statistics | 0 Comments