Data Analysis

SciPy and Logistic Regressions

Given a set of data points, we often want to see whether a satisfactory relationship exists between them. Linear regressions can easily be visualized with Seaborn, a Python library meant for exploration and visualization rather than statistical analysis. As for logistic regressions, SciPy is a good tool when you do not have your own analysis script. Let's look at the optimize package: from scipy.optimize import [...]

By Caroline | June 9, 2016 | Categories: Data Analysis, Python | Tags: chemical screen, curve fitting
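The excerpt above stops before showing which function is imported from scipy.optimize, so the following is only a minimal sketch of one common way to fit a logistic (dose-response) curve, assuming `curve_fit` is the tool in question. The four-parameter model, the parameter names and the synthetic data are all assumptions made for illustration; none of them come from the original post.

```python
import numpy as np
from scipy.optimize import curve_fit


def four_param_logistic(log_dose, bottom, top, log_ic50, hill):
    """Hypothetical four-parameter logistic: response falls from top to bottom."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_dose - log_ic50) * hill))


# Synthetic data (made up for this sketch): 12 log10 doses with a little noise.
rng = np.random.default_rng(42)
log_dose = np.linspace(-9.0, -4.0, 12)
true_params = (5.0, 100.0, -6.5, 1.0)
response = four_param_logistic(log_dose, *true_params) + rng.normal(0.0, 1.0, log_dose.size)

# Rough starting guesses taken from the data; curve_fit refines them.
p0 = [response.min(), response.max(), np.median(log_dose), 1.0]
popt, pcov = curve_fit(four_param_logistic, log_dose, response, p0=p0)

# One standard deviation on each fitted parameter, from the covariance diagonal.
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip(("bottom", "top", "log_ic50", "hill"), popt, perr):
    print(f"{name}: {value:.3f} +/- {err:.3f}")
```

Starting guesses matter for this kind of fit: a poor `p0` can leave the optimizer stuck far from a sensible curve, which is why the sketch derives them from the data rather than relying on the defaults.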

What to consider when interpreting proteomic data

**Special collaboration from the proteomic platform**

Following the analysis of your sample by mass spectrometry, you will usually receive your results as a list of proteins. During data processing, several factors inevitably influence which proteins appear in the final list.

Fig. 1: Overview of bottom-up proteomics. Figure modified from Angel et al. (2011).

Let's begin by briefly explaining how this protein list is generated by the commonly used bottom-up approach (see Figure 1). In this [...]

By Mathieu | December 7, 2015 | Categories: Data Analysis | Tags: protéomique

Grep parameters every bioinformatician should know

Your shell, along with the myriad command-line programs it exposes, is clearly a great friend when it comes to file manipulation. And let's face it, file manipulation is a big part of a bioinformatician's daily workload. Now, since we rarely have the time to review all the options offered by the different programs, I thought I'd list some really useful ones from grep. I expect everyone to know what grep is and what it does, so let's just get [...]

By Jean-Philippe | November 27, 2015 | Categories: Data Analysis, Shell scripting | Tags: bash, unix

Applying PCA to Leucegene data

GEO offers an extremely rich source of transcriptional profile data, but downloading and preparing a dataset is often an obstacle to aspiring bioinformaticians. I'll walk you through one way to do it using the Leucegene dataset as an example. Once this data is loaded and ready to use in R, I'll then present a very simplified and practical perspective on the use of PCA for exploratory analysis.

Loading data

A dataset of 285 transcriptional profiles of acute myeloid leukemia (AML) [...]

By Sébastien | November 17, 2015 | Categories: Data Analysis, R | Tags: gene expression, PCA
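The post itself works in R on the Leucegene data; as a rough, language-neutral illustration of what PCA does in exploratory analysis, here is a minimal Python sketch. It assumes a small random samples-by-genes matrix in place of the real dataset (the matrix, its dimensions and the two sample groups are all hypothetical) and computes the principal components through a singular value decomposition of the centered data.

```python
import numpy as np

# Hypothetical expression matrix: 20 samples (rows) x 50 genes (columns).
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 50
expr = rng.normal(size=(n_samples, n_genes))

# Made-up structure: the first ten samples over-express the first five genes.
expr[:10, :5] += 4.0

# PCA via SVD: center each gene, then decompose the centered matrix.
centered = expr - expr.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

scores = u * s                        # sample coordinates on each component
loadings = vt                         # gene weights, one component per row
explained = s ** 2 / np.sum(s ** 2)   # fraction of variance per component

print("Variance explained by PC1 and PC2:", explained[:2])
print("PC1 scores, first group: ", np.round(scores[:10, 0], 2))
print("PC1 scores, second group:", np.round(scores[10:, 0], 2))
```

In a setup like this, plotting the first two columns of `scores` is usually the first exploratory step: samples that share a strong expression pattern tend to end up on the same side of the first component.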

Permutations

Say we have the following two groups:

    g1 <- c(55, 65, 58)
    g2 <- c(12, 18, 32)

We want to see if the two groups belong to the same distribution or can be considered as different groups. We might be tempted to try a Student’s t-test.

    t.test(g1, g2)

    ## Welch Two Sample t-test
    ##
    ## data: g1 and g2
    ## t = 5.8366, df = 2.9412, p-value = 0.01059
    ## alternative hypothesis: true difference in means is not equal [...]

By Geneviève | October 14, 2015 | Categories: Data Analysis, R, Statistics
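The excerpt is cut off before the permutation procedure itself, so what follows is only a sketch of one standard variant, written in Python rather than the post's R. It reuses the two groups from the excerpt and enumerates every way of splitting the six pooled values into two groups of three, an exact permutation test on the absolute difference in means. The function name and the choice of test statistic are assumptions, not necessarily what the original post does.

```python
from itertools import combinations

# The two groups from the excerpt above.
g1 = [55, 65, 58]
g2 = [12, 18, 32]


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)

    extreme = 0
    n_splits = 0
    # Every choice of n_a pooled values is one possible relabelling.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


p_value = exact_permutation_pvalue(g1, g2)
print(p_value)  # 2 of the 20 splits are as extreme as the observed one -> 0.1
```

With only three values per group there are just 20 possible splits, so the smallest two-sided p-value this test can produce is 2/20 = 0.1, even though the two groups do not overlap at all. That is noticeably more conservative than the 0.01059 reported by the t-test.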