The first thing I have learned in R is how to load a table. Usually, when you start your R journey, someone more knowledgeable will tell you how to do this very first action. It will typically be:




You probably will be adding various parameters into the brackets such as “row.names=0” or “header=TRUE” or, “sep="\t"“, to make sure you are reading your file correctly.

And this is perfectly fine, as a loading method of small datasets.
However, to maximize your time, let’s go over the common problems and, more importantly, their solutions new users can encounter when loading a dataset into R.

### Problem #1:  All the data is loaded into one column

Solution: This is fairly common. You need to specify the separator!
This can be achieved by adding sep=”XXX” into the read.table() function. In the place of XXX, you will put in your separator. Typically, it can be either the semi-column character (;) or a comma (,)

[for .csv files] or even a tabulation (\t) [for .txt file].

### Problem #2:  R didn’t understand I needed column names and rownames and put them into the data

Solution: This is rare. But you do need to specify if you table has a header and rownames!

### Problem #3:  My table loaded fine but the data is set to factors. I don’t understand why!


> class(data[,1])
[1] "factor"


Solution: Tell R you don’t want factors! The as.is parameter can help you do that!  Note about factors: Although sometimes very useful, this datatype can be very hard to deal with, especially if you want to do calculations on numeric values in your data frame.  When using the read.table function, R’s default is set to convert all variables into factors.

> data<-read.table("data.txt",as.is = TRUE)
> class(data[,1])
[1] "character" 

### Problem #4:  My table takes a long time to load!

R is efficient at loading data only when it knows in advance what you are loading. As seen in problem #3, R will automatically convert all string data to factors. This means that the conversion is part of the loading step.

Solution: Exactly as seen in problem #3, tell R you don’t want factors!

Not convinced? Here is a performance review:

data.txt contains a RNASeq dataset: 21024 genes for 3080 patients.


> elapsed
771.274 seconds


> elapsed
184.764 seconds




A side note about read.delim(): this function is a wrapper for read.table(), with only the most common features in it. Use at your own risk 🙂