Generating Synthetic Genomic Data

Applying statistical methods is a large part of a bioinformatician's work. Alongside more classical techniques, machine learning algorithms (notably clustering techniques such as k-means) are regularly applied to clinical and biological data. Some techniques, such as artificial neural networks, have recently found great success in areas such as image recognition and natural language processing. However, these techniques tend to perform poorly on small datasets with high dimensionality, a problem known as the "curse of dimensionality". [...]

January 7, 2016 (timestamp: 2017-04-29T23:00:58+00:00) | Categories: Bioinformatics, Python

Formatting data for Circos with R

When generating a Circos plot, formatting the data to be represented is a crucial step. Here are some pointers on how to avoid the dreaded *** CIRCOS ERROR ***. All data files must be in plain-text format. For instance, using R, I would generate a myData.txt file and then reference it within a specific plot block (<plot>...</plot>). Data files are used for 2-dimensional graphical representations (histogram, scatter plot, heatmap, tiles), labels (which are technically also a type [...]
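As a rough illustration of the plain-text requirement, here is a minimal sketch in Python (rather than the R used in the post) that writes a data file for a 2-dimensional track. It assumes the whitespace-separated layout of chromosome, start, end, value that Circos data tracks generally use; the chromosome names (hs1, hs2), the values, and the helper names format_circos_rows and write_circos_file are hypothetical and not taken from the post.

```python
def format_circos_rows(records):
    """Turn (chromosome, start, end, value) tuples into text lines.

    Assumed layout: one record per line, fields separated by a single
    space, in the order chromosome, start, end, value.
    """
    lines = []
    for chrom, start, end, value in records:
        start, end = int(start), int(end)
        if start > end:
            # Catch reversed coordinates before they reach the data file.
            raise ValueError(f"start > end for {chrom}: {start} > {end}")
        lines.append(f"{chrom} {start} {end} {value}")
    return lines


def write_circos_file(records, path):
    """Write the formatted records to a plain-text file."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write("\n".join(format_circos_rows(records)) + "\n")


# Hypothetical example records; real data would come from your analysis.
example = [
    ("hs1", 0, 1000000, 0.25),
    ("hs1", 1000000, 2000000, 0.5),
    ("hs2", 0, 1000000, 0.75),
]

if __name__ == "__main__":
    write_circos_file(example, "myData.txt")
```

The resulting myData.txt would then be the file referenced from within the plot block. Checking coordinates before writing is a design choice here: it surfaces a malformed record in your own script instead of at plot time.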

October 29, 2015 (timestamp: 2017-04-29T15:36:21+00:00) | Categories: Data Visualization, R