Data Analysis

Dimensionality Reduction Tutorials: 1- Principal Components Analysis

Understanding dimensionality reduction: If you work with large datasets (transcriptomes, whole genome sequencing, proteomes), sooner or later you will stumble across something called Principal Components Analysis (PCA). PCA is a dimensionality reduction technique, one of a family of methods that do just that: reduce the number of dimensions. But what does that mean? What are dimensions, and why would we want to reduce their number? How about we deal with these questions through an example? The problem: Say we have a hypothetical transcriptome, of a [...]
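To make "reducing the dimensionality" a little more concrete, here is a minimal sketch in plain Python that collapses two measurements per sample into a single score along the first principal component. The function name `pca_2d_to_1d` and the toy expression values are assumptions made for illustration; they do not come from the tutorial, and a real analysis would normally rely on a numerical library rather than this closed-form 2x2 case.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (unit direction, 1-D scores, fraction of variance explained).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Center the data so the component passes through the mean.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    largest = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))

    # Eigenvector that goes with the largest eigenvalue.
    if abs(sxy) > 1e-12:
        vx, vy = largest - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # One score per sample: two dimensions reduced to one.
    scores = [x * vx + y * vy for x, y in centered]
    explained = largest / trace if trace > 0 else 0.0
    return (vx, vy), scores, explained


# Hypothetical expression levels of two genes in four samples.
samples = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, scores, explained = pca_2d_to_1d(samples)
```

Because the second toy gene is always exactly twice the first, a single component carries all of the variance here; real data would leave some variance unexplained.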

By | 2017-06-01T12:38:13+00:00 June 1, 2017|Categories: Data Analysis, Data Visualization|0 Comments

ggplot2 101 : Easy Visualization for Easier Analysis

Biological data are often easier to interpret and analyse when we can visualize them as plots. A good way of doing so is by exploiting the different options of ggplot2, an R plotting system. In the following post, I will present some of my go-to tricks to visualize data: nothing too fancy or too hard, perfect for both the R masters and the R beginners! The sample codes are in R and the ggplot2 library must be installed [...]

By | 2017-05-19T15:08:52+00:00 May 18, 2017|Categories: Data Analysis, Data Visualization, R, Uncategorized|0 Comments

Let Your Data Flow: Streams and Reactive Programming

What's all this about? ReactiveX is a combination of the best ideas from the Observer pattern, the Iterator pattern, and functional programming. Using Rx, you can easily:

- Create event or data emitting streams from sources such as a file or a web service
- Compose and transform streams with query-like operators
- Subscribe to any observable stream and "react" to its emissions to perform side effects

Reactive programming has been gaining traction these past few years. Maybe you've [...]
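The three capabilities listed in the excerpt (create a stream, compose it with operators, subscribe to react) can be sketched with a tiny hand-rolled class. This is not the ReactiveX API: the `Observable` class, its `from_iterable`, `map`, and `filter` methods, and the sample sequence lines are all hypothetical names and data chosen only to illustrate the idea.

```python
class Observable:
    """A tiny push-based stream; a hypothetical stand-in for an Rx observable."""

    def __init__(self, subscribe_fn):
        # subscribe_fn(on_next) pushes every value of the stream to on_next.
        self._subscribe_fn = subscribe_fn

    @classmethod
    def from_iterable(cls, items):
        """Create a stream that emits each item of an iterable in order."""
        def subscribe_fn(on_next):
            for item in items:
                on_next(item)
        return cls(subscribe_fn)

    def map(self, transform):
        """Return a new stream whose values are transform(value)."""
        def subscribe_fn(on_next):
            self.subscribe(lambda value: on_next(transform(value)))
        return Observable(subscribe_fn)

    def filter(self, predicate):
        """Return a new stream that only forwards values passing predicate."""
        def subscribe_fn(on_next):
            def forward(value):
                if predicate(value):
                    on_next(value)
            self.subscribe(forward)
        return Observable(subscribe_fn)

    def subscribe(self, on_next):
        """Start the stream and call on_next for each emitted value."""
        self._subscribe_fn(on_next)


# Hypothetical source: lines read from a sequence file, some of them empty.
lines = ["ACGT", "", "GGCC", "AT"]
received = []

(Observable.from_iterable(lines)
    .filter(lambda line: line != "")   # compose: drop empty lines
    .map(len)                          # transform: keep only the length
    .subscribe(received.append))       # react: the side effect happens here
```

Nothing is emitted until `subscribe` is called, which is the main difference from simply looping over the list: the pipeline is described first and executed only when someone listens.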

By | 2017-05-03T09:19:14+00:00 May 2, 2017|Categories: Bioinformatics, Computer science, Data Analysis|2 Comments

Big data, big challenge

You've probably heard the expression "Big Data" before, particularly if you read Simon Mathien's blog post on IRIC's website. (If you haven't read it yet, you should do it now!) There exist several definitions (or interpretations) of this expression, which are best summarized by the following two: "Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data" (Oxford English Dictionary). Technological field dedicated [...]

By | 2017-05-02T21:05:43+00:00 April 24, 2017|Categories: Data Analysis|3 Comments

Introduction to Linear Regression

A data scientist's first goal is to find underlying relations among the variables of a dataset. Several statistical and machine learning methods can be used to discover such relations. Once uncovered, this information can be applied to everyday problems. For example, in clinical medicine, a predictive model based on clinical data can help clinicians guide a patient's treatment by offering insights that might not have otherwise been taken into account. Simple linear regression: One of the most basic methods available to [...]
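As a rough illustration of what simple linear regression computes, the sketch below fits a line y = slope * x + intercept by ordinary least squares using only the standard library. The function name `fit_simple_linear_regression` and the toy numbers are assumptions for this example and are not taken from the post itself.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: the response grows by 2 units per unit of the predictor.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for a new x value of 5
```

With noisy real data the points will not fall exactly on the line, and the same formulas give the line that minimizes the sum of squared vertical distances.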

By | 2017-04-29T16:22:09+00:00 March 23, 2017|Categories: Data Analysis, Python|0 Comments