Data Analysis

Generating Synthetic Genomic Data

Applying statistical methods is a large part of the work of a bioinformatician. Apart from some more classical techniques, machine learning algorithms are also regularly applied to clinical and biological data (notably, clustering techniques such as k-means). Some techniques such as artificial neural networks have recently found great success in areas such as image recognition and natural language processing. However, these techniques do not perform as well on small datasets with high dimensionality, a problem known as "the curse of dimensionality". [...]

January 7, 2016 | Categories: Bioinformatics, Data Analysis, Python

What to consider when interpreting proteomic data

** Special collaboration from the proteomic platform **

Following your sample's analysis by mass spectrometry, you will usually receive your results as a list of proteins. During data processing, some factors inevitably influence which proteins appear in the final list.

Fig. 1: Overview of bottom-up proteomics. Figure modified from Angel et al. (2011).

Let's begin by briefly explaining how this protein list is generated by the commonly used bottom-up approach (see Figure 1). In this [...]

December 7, 2015 | Categories: Data Analysis, Proteomic

Grep parameters every bioinformatician should know

Your shell, along with the myriad command-line programs it exposes, is clearly a great friend when it comes to file manipulation. And let's face it, file manipulation is a big part of a bioinformatician's daily workload. Now, since we rarely have the time to review all the options offered by the different programs, I thought I'd list some really useful ones from grep. I expect everyone to know what grep is and what it does, so let's just get [...]

November 27, 2015 | Categories: Bioinformatics, Data Analysis, Shell scripting

Applying PCA to Leucegene data

GEO offers an extremely rich source of transcriptional profile data, but downloading and preparing a dataset is often an obstacle to aspiring bioinformaticians. I'll walk you through one way to do it using the Leucegene dataset as an example. Once this data is loaded and ready to use in R, I'll then present a very simplified and practical perspective on the use of PCA for exploratory analysis.

Loading data

A dataset of 285 transcriptional profiles of acute myeloid leukemia (AML) [...]

November 17, 2015 | Categories: Data Analysis, R

Permutations

Say we have the two following groups:

    g1 <- c(55, 65, 58)
    g2 <- c(12, 18, 32)

We want to see if the two groups belong to the same distribution or can be considered as different groups. We might be tempted to try a Student’s t-test.

    t.test(g1, g2)
    ## Welch Two Sample t-test
    ##
    ## data: g1 and g2
    ## t = 5.8366, df = 2.9412, p-value = 0.01059
    ## alternative hypothesis: true difference in means is not equal [...]
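The excerpt stops before the permutation approach named in the title, so what follows is only a rough sketch of what an exact two-sided permutation test on these two groups could look like. It is written in Python rather than the post's R, and it is not the author's code. The function name `exact_permutation_test` and the choice of the absolute difference in means as the test statistic are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(g1, g2):
    """Two-sided exact permutation p-value for a difference in means.

    Hypothetical helper: enumerates every way of splitting the pooled
    values into groups of the original sizes and counts how often the
    absolute difference in means is at least as large as the observed one.
    """
    pooled = list(g1) + list(g2)
    n1 = len(g1)
    observed = abs(mean(g1) - mean(g2))
    indices = range(len(pooled))

    extreme = 0
    total = 0
    for chosen in combinations(indices, n1):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point round-off in ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


g1 = [55, 65, 58]
g2 = [12, 18, 32]
p_value = exact_permutation_test(g1, g2)
print(p_value)
```

With three values per group there are only 20 possible splits. The observed split and its mirror image are the two most extreme, so the smallest two-sided p-value this test can return here is 2/20 = 0.1, noticeably larger than the t-test's 0.01059.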

October 14, 2015 | Categories: Data Analysis, R, Statistics