Formatting data for Circos with R

When generating a Circos plot, formatting the data to be represented is a crucial step. Here are some pointers on how to avoid the dreaded *** CIRCOS ERROR ***.

All data files must be in text format. For instance, using R, I would generate a myData.txt file that I would then call within a specific plot block (<plot>…</plot>). Data files are used for 2-dimensional graphical representations (histogram, scatter plot, heatmap, tiles), labels (which are technically also a type of graphical representation) and links. To know how to format your file, you must first determine how you want this data to be illustrated.

Type of data representation   Columns needed                           Example of data
Graphs                        chr  start  end  val                     chr1  1000   1199   1.00
                                                                       chr1  1200   1399   15.00
                                                                       chr1  1400   1599   -2.00
Labels                        chr  start  end  label                   chr1  11873  14409  DDX11L1
                                                                       chr1  14361  29370  WASH7P
                                                                       chr1  17368  17436  MIR6859-1
Links                         chr1  start1  end1  chr2  start2  end2   chr1  486    769    chr15  10026  10033
                                                                       chr1  3426   3938   chr15  10021  10026
                                                                       chr1  5763   6268   chr15  10021  10026

Other parameters, such as color, can be added after the last column (after the val, label and end2 columns for graph, label and link files respectively), but for now we will work with the basic formatting. Note that the process is very similar with or without additional parameters.
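As a rough sketch of what an additional parameter could look like, the snippet below appends a fifth column to a graph table. The names graph_df and options, as well as the red/blue rule, are made up for illustration. Which option names apply depends on the plot type, so check them against the Circos documentation before relying on this.

```r
# Hypothetical graph data in the basic four-column format
graph_df <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(1000L, 1200L, 1400L),
  end   = c(1199L, 1399L, 1599L),
  val   = c(1.00, 15.00, -2.00),
  stringsAsFactors = FALSE
)

# Assumed rule for illustration: negative values in blue, all others in red.
# The extra parameter goes in a fifth column, after val.
graph_df$options <- ifelse(graph_df$val < 0, "color=blue", "color=red")
```

The extra column is exported with the rest of the table, so the export step does not change.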

Now that we know how we want to represent our data, we can start to format it. Import your raw data in R as a new data frame.

> data_df <- read.table("myRawData.txt", header = TRUE, sep = "\t", as.is = TRUE)
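To see what this import produces, here is a small self-contained sketch that reads a few inline lines instead of a file. The column names and values are invented for the example; your raw data will most likely look different.

```r
# Hypothetical raw data, inlined so the example runs without a file
raw_text <- paste(
  "chr\tstart\tend\tvalue",
  "chr1\t1000\t1199\t1.00",
  "chr1\t1200\t1399\t15.00",
  "chr1\t1400\t1599\t-2.00",
  sep = "\n"
)

# Same arguments as the read.table call above, reading from text instead of a file
data_df <- read.table(text = raw_text, header = TRUE, sep = "\t", as.is = TRUE)

# as.is = TRUE keeps the chromosome names as plain character strings
col_types <- sapply(data_df, class)
print(col_types)
```

Checking the column types right after import is a cheap way to catch a position column that was read as text.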

Then, work with your data as you usually would. For example, you could compute means or standard deviations, or you could run a statistical test and keep only the values that are significant. This step is completely up to you!
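As one possible illustration of this step, the sketch below averages three replicate columns and keeps only the rows that pass a cutoff. The column names (rep1, rep2, rep3), the values and the cutoff of 1.5 are all assumptions made for the example. A real statistical test would slot in at the same place as the cutoff.

```r
# Hypothetical data: one row per genomic window, three replicate measurements
data_df <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(1000L, 1200L, 1400L),
  end   = c(1199L, 1399L, 1599L),
  rep1  = c(0.9, 14.8, -2.1),
  rep2  = c(1.1, 15.3, -1.8),
  rep3  = c(1.0, 14.9, -2.1),
  stringsAsFactors = FALSE
)

rep_cols <- c("rep1", "rep2", "rep3")

# Mean and standard deviation across replicates, one value per row
data_df$val <- rowMeans(data_df[, rep_cols])
data_df$sd  <- apply(data_df[, rep_cols], 1, sd)

# Assumed selection rule: keep windows whose absolute mean exceeds 1.5
selected_df <- data_df[abs(data_df$val) > 1.5, ]
```

Because chr, start and end travel with every row, the selected values can still be placed on the plot afterwards.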

It is important that you keep track of the chromosome and position of each value. A good way to do so is to put all the data in a table with one column for the chromosome and two more for the start and end positions. All the other columns are at your discretion.

When you are ready to save your data to a new file, create a new table with all the columns required by your desired representation (graphs, labels or links), as illustrated in the table above. If you were already working with a table, just make sure that the columns are in the right order. To export your data, you can use this single line of code:

> write.table(myDataTable, file = "myData.txt", row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE)

myDataTable is the name of the table you want to export, while myData.txt is the name of the resulting text file. The Circos format does not allow row labels, hence row.names = FALSE. Column labels are accepted to some extent, but they must be worded exactly as expected. To avoid any possible errors, I suggest that you also export your table without its column names. The sep argument ensures that the entries on every line are separated by a tab, which is preferred by Circos. Finally, setting quote to FALSE removes the quotes (“”) around any string, i.e. chromosomes and labels. This last argument is very important to keep Circos from crashing.
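Putting the reordering and export together, a minimal sketch for a labels file could look like the following. The results_df table and its column order are hypothetical, and the file is written to a temporary directory only so that the example can run anywhere.

```r
# Hypothetical working table with the columns in an arbitrary order
results_df <- data.frame(
  label = c("DDX11L1", "WASH7P", "MIR6859-1"),
  end   = c(14409L, 29370L, 17436L),
  chr   = c("chr1", "chr1", "chr1"),
  start = c(11873L, 14361L, 17368L),
  stringsAsFactors = FALSE
)

# Put the columns in the order expected for labels: chr, start, end, label
myDataTable <- results_df[, c("chr", "start", "end", "label")]

# Temporary location, used here instead of the working directory
out_file <- file.path(tempdir(), "myData.txt")
write.table(myDataTable, file = out_file, row.names = FALSE,
            col.names = FALSE, sep = "\t", quote = FALSE)

# Read the raw lines back to see exactly what was written
file_lines <- readLines(out_file)
print(file_lines)
```

Positions are stored as integers here. As far as I know, this also keeps R from writing large round numbers in scientific notation (for instance 1e+05).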

To be safe, double-check your file before using it in a configuration file.
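This check can be partly automated. The helper below, check_circos_file, is a hypothetical function written for this post and not part of Circos or of any R package. It only covers the points discussed here (tab-separated columns, no quotes, numeric positions in columns 2 and 3), so treat it as a starting point, not a full validator.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_circos_file <- function(path, n_cols) {
  lines  <- readLines(path)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  problems <- character(0)

  # Every line should have the expected number of tab-separated columns
  if (any(lengths(fields) != n_cols)) {
    problems <- c(problems, "unexpected number of tab-separated columns")
  }
  # Quotes are left behind when quote = FALSE was forgotten
  if (any(grepl('"', lines, fixed = TRUE))) {
    problems <- c(problems, "quotes found")
  }
  # Columns 2 and 3 (start and end) must be numeric; a header line fails here
  positions <- unlist(lapply(fields, function(f) f[2:3]))
  if (any(is.na(suppressWarnings(as.numeric(positions))))) {
    problems <- c(problems, "non-numeric start or end position")
  }
  problems
}

# A well-formed graph file and a deliberately broken one (header line, quotes)
good_file <- tempfile(fileext = ".txt")
writeLines(c("chr1\t1000\t1199\t1.00", "chr1\t1200\t1399\t15.00"), good_file)
bad_file <- tempfile(fileext = ".txt")
writeLines(c("chr\tstart\tend\tval", "\"chr1\"\t1000\t1199\t1.00"), bad_file)

good_result <- check_circos_file(good_file, n_cols = 4)
bad_result  <- check_circos_file(bad_file, n_cols = 4)
print(bad_result)
```

For a links file, n_cols would be 6, and the second pair of positions (columns 5 and 6) would need the same numeric check.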

Many *** CIRCOS ERROR *** messages can be avoided when you know how to properly format your data files!

2016-11-08T09:30:09+00:00 October 29, 2015 | Categories: Circos, Data Visualization, R
