trofimov

About Assya

A biochemist turned bioinformatician (Master's in progress), I like statistics and machine learning, efficient programs and good coffee.

Dimensionality Reduction Tutorials: 1- Principal Components Analysis

Understanding dimensionality reduction: If you use large datasets (transcriptomes, whole genome sequencing, proteomes), sooner or later you will stumble across something called Principal Components Analysis (PCA). PCA is a dimensionality reduction technique, one of a family of methods that do just that: reduce the dimensionality. But what does that mean? What are dimensions, and why would we want to reduce their number? Let's work through these questions with an example. The problem: Say we have a hypothetical transcriptome, of a [...]

June 1, 2017 | Categories: Data Analysis, Data Visualization | 1 Comment

Beginner R: functions that make your life easier

Let’s get to know my top 10 neat little R functions and tricks that make life easier when manipulating data. Sequences: Want to make long sequences of numbers or letters but don’t feel like writing them all out into a vector? R lets you make a sequence of numbers with “:”. You can also use seq() if you are looking for a regular sequence that is not incremented by one. letters[] lets you make continuous letter sequences, [...]

January 28, 2016 | Categories: Bioinformatics, R | 0 Comments
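
The sequence helpers named in the excerpt above can be sketched in a few lines. This is only a minimal illustration, not code from the full post; the variable names and the specific ranges are assumptions chosen for the example.

```r
# Integer sequence with the colon operator
one_to_ten <- 1:10

# Regular sequence with a step other than one
evens <- seq(from = 2, to = 20, by = 2)

# A fixed number of evenly spaced values between two bounds
quarters <- seq(0, 1, length.out = 5)

# Consecutive lowercase letters, indexed from the built-in `letters` constant
first_five <- letters[1:5]

print(one_to_ten)
print(evens)
print(quarters)
print(first_five)
```

Indexing `letters` with a numeric sequence is what makes "continuous" letter runs possible; the built-in `LETTERS` constant works the same way for uppercase.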

Table-reading: loading data into R without a hassle

The first thing I learned in R was how to load a table. Usually, when you start your R journey, someone more knowledgeable will show you this very first step. It will typically be: data <- read.table("~/SomeFolder/datafile.txt"). You will probably add various parameters inside the parentheses, such as row.names=1, header=TRUE or sep="\t", to make sure you are reading your file correctly. And this is perfectly fine as a method for loading small datasets. However, to maximize [...]

February 5, 2015 | Categories: Bioinformatics, R | 1 Comment
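
To see what the read.table() parameters mentioned in the excerpt above actually do, here is a small self-contained sketch. The file contents, gene names and sample names are made up for illustration, and the file is written to a temporary location so the example runs anywhere.

```r
# Write a tiny tab-separated file to a temporary path (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\tsampleA\tsampleB",
             "geneX\t10\t12",
             "geneY\t5\t7"), tmp)

# header = TRUE : the first line holds the column names
# sep = "\t"    : fields are separated by tabs
# row.names = 1 : the first column is used as row names
data <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# Inspect the result: two numeric columns, rows named by gene
str(data)
print(data)
```

Calling str() right after loading is a quick way to check that the columns were parsed with the types you expect.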

Teach me how to box-plot!

Boxplots are everywhere! Publishers like boxplots. But ask around and you will find that most people don't even know what a boxplot represents! Recently I wanted to compare gene expression data between two samples for a certain gene. The gold standard for looking at it would be *drum roll*... A boxplot! Interesting fact #1: Did you know a boxplot is also called a “box and whiskers plot”? Let's take a look! A boxplot is easily generated in the analysis software R and its interpretation [...]

September 21, 2014 | Categories: Data Visualization, R, Statistics | 0 Comments
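
As a rough sketch of how a boxplot like the one described in the excerpt above might be generated, the example below simulates expression values for two samples and plots them with base R. The data are random numbers, not real measurements, and the names `expr`, `sample` and `expression` are assumptions made for the example.

```r
# Simulated expression values for one gene in two samples (not real data)
set.seed(1)
expr <- data.frame(
  sample = rep(c("A", "B"), each = 20),
  expression = c(rnorm(20, mean = 5), rnorm(20, mean = 7))
)

# Draw one box per sample; boxplot() also returns the plotted statistics
bp <- boxplot(expression ~ sample, data = expr,
              ylab = "Expression (arbitrary units)",
              main = "Hypothetical gene")

# One column per sample; rows are lower whisker, lower hinge,
# median, upper hinge and upper whisker
print(bp$stats)
```

Looking at bp$stats next to the plot is a handy way to connect each part of the "box and whiskers" to the number it represents.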