Data Analysis

Bootstraps and Confidence Intervals

When analyzing data, you may want or need to fit a specific curve to a dataset. Such an analysis can reveal the relationship between two (or more) quantifiable parameters. The main aim of this post is not to show how to implement such a fit, but rather how to display its goodness, i.e. how to calculate a confidence interval around a fitted curve. That being said, I will show how to do curve fitting in [...]

September 29, 2016 | Categories: Data Analysis, R, Statistics | 1 Comment
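To make the idea in the excerpt above concrete, here is a minimal sketch of a percentile bootstrap band around a fitted curve. It is not the code from the post (which is filed under R): it uses plain Python, assumes a straight-line fit for simplicity, and the names `fit_line` and `bootstrap_band`, the synthetic data, and the defaults (1000 resamples, a 95% interval) are all illustrative choices.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_band(xs, ys, grid, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap band for the fitted line, evaluated on `grid`.

    Resamples (x, y) pairs with replacement, refits the line each time,
    and takes the alpha/2 and 1 - alpha/2 percentiles at each grid point.
    """
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    while len(curves) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples where the slope is undefined
        slope, intercept = fit_line(bx, by)
        curves.append([slope * g + intercept for g in grid])

    lower, upper = [], []
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


# Synthetic example data (illustrative only): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(42)
xs = [i / 2 for i in range(30)]
ys = [2 * x + 1 + data_rng.gauss(0, 1) for x in xs]
grid = [0.0, 5.0, 10.0, 15.0]
lower, upper = bootstrap_band(xs, ys, grid)
```

Plotting `lower` and `upper` against `grid` around the fitted line gives the confidence band. Resampling whole (x, y) pairs is only one bootstrap variant; resampling residuals is a common alternative.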

SciPy and Logistic Regressions

Given a set of data points, we often want to see whether a meaningful relationship exists between them. Linear regressions can easily be visualized with Seaborn, a Python library that is meant for exploration and visualization rather than statistical analysis. As for logistic regressions, SciPy is a good tool when you do not have your own analysis script. Let's look at the optimize package: from scipy.optimize import [...]

June 9, 2016 | Categories: Data Analysis, Python | 0 Comments
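The excerpt above is cut off at the import, so what the post actually imports from `scipy.optimize` is not shown here. As one possibility, the sketch below assumes `curve_fit` and fits a three-parameter logistic (sigmoid) curve by nonlinear least squares. That is a curve fit, not a classification-style logistic regression, and the `logistic` function, its parameter names, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, L, k, x0):
    """Logistic curve: L is the plateau, k the steepness, x0 the midpoint."""
    return L / (1.0 + np.exp(-k * (x - x0)))


# Synthetic data (illustrative only): a known logistic curve plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-6, 6, 50)
y = logistic(x, 1.0, 1.5, 0.5) + rng.normal(0, 0.02, x.size)

# p0 is the initial guess; curve_fit returns the best-fit parameters
# and their estimated covariance matrix.
params, cov = curve_fit(logistic, x, y, p0=[1.0, 1.0, 0.0])
L_fit, k_fit, x0_fit = params
```

The square roots of the diagonal of `cov` give approximate standard errors for the fitted parameters. A reasonable `p0` matters here, since a poor starting point can keep the optimizer from converging.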

What to consider when interpreting proteomic data

**Special collaboration from the proteomics platform** Following the analysis of your sample by mass spectrometry, you will usually receive your results as a list of proteins. During data processing, some factors inevitably influence which proteins appear in the final list. Let's begin by briefly explaining how this protein list is generated by the commonly used bottom-up approach (see Figure 1). In this [...]

Fig. 1: Overview of bottom-up proteomics. Figure modified from Angel et al. (2011).

December 7, 2015 | Categories: Data Analysis | 0 Comments

Grep parameters every bioinformatician should know

Your shell, along with the myriad command-line programs it exposes, is clearly a great friend when it comes to file manipulation. And let's face it, file manipulation is a big part of a bioinformatician's daily workload. Now, since we rarely have the time to review all the options offered by the different programs, I thought I'd list some really useful ones from grep. I expect everyone to know what grep is and what it does, so let's just get [...]

November 27, 2015 | Categories: Data Analysis, Shell scripting | 0 Comments

Applying PCA to Leucegene data

GEO offers an extremely rich source of transcriptional profile data, but downloading and preparing a dataset is often an obstacle for aspiring bioinformaticians. I'll walk you through one way to do it, using the Leucegene dataset as an example. Once this data is loaded and ready to use in R, I'll then present a very simplified and practical perspective on the use of PCA for exploratory analysis. Loading data: A dataset of 285 transcriptional profiles of acute myeloid leukemia (AML) [...]

November 17, 2015 | Categories: Data Analysis, R | 0 Comments
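The post above works in R on the Leucegene data. For readers who just want to see the mechanics of PCA, here is a rough NumPy sketch under stated assumptions: the `pca` helper is hypothetical, the input is a random matrix standing in for a samples-by-genes expression table (not real Leucegene values), and the decomposition is done by SVD on the mean-centered matrix.

```python
import numpy as np


def pca(matrix, n_components=2):
    """PCA via SVD on a samples x features matrix.

    Returns the sample scores, the component loadings, and the fraction
    of total variance explained by each kept component.
    """
    centered = matrix - matrix.mean(axis=0)  # center each feature (gene)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]


# Synthetic stand-in: 20 samples x 100 genes of random values (illustrative only).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(20, 100))
scores, components, explained = pca(profiles)
```

A scatter plot of the two columns of `scores`, colored by sample annotation, is the usual first exploratory view. With real expression data, values are typically log-transformed before centering.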