Generating Synthetic Genomic Data

Applying statistical methods is a large part of a bioinformatician's work. Alongside more classical techniques, machine learning algorithms are regularly applied to clinical and biological data (notably clustering techniques such as k-means). Some techniques, such as artificial neural networks, have recently achieved great success in areas such as image recognition and natural language processing. However, these techniques do not perform as well on small datasets with many features (high dimensionality), a problem associated with the so-called "curse of dimensionality". [...]
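One concrete symptom of the curse of dimensionality is distance concentration: with a fixed, small number of samples, the nearest and farthest points become almost equally far from any query point as the number of features grows. This hurts distance-based methods such as k-means. The sketch below is an illustration added here, not code from the original post. It assumes uniformly distributed random data, and the function name relative_contrast is hypothetical.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min of the Euclidean distances from one random
    query point to n_points random points in the unit hypercube.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = [rng.random() for _ in range(n_dims)]
    distances = [math.dist(query, p) for p in points]  # requires Python 3.8+
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # Same small sample size (50 points), increasing number of features.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(50, dims), 3))
```

As the number of dimensions increases, the printed ratio shrinks towards zero, meaning "near" and "far" neighbours become hard to tell apart.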

January 7, 2016 | Categories: Bioinformatics, Data Analysis, Python

Grep parameters every bioinformatician should know

Your shell, along with the myriad command-line programs it exposes, is a great friend when it comes to file manipulation. And let's face it, file manipulation is a big part of a bioinformatician's daily workload. Since we rarely have the time to review all the options offered by the different programs, I thought I'd list some really useful ones from grep. I expect everyone to know what grep is and what it does, so let's just get [...]

November 27, 2015 | Categories: Bioinformatics, Data Analysis, Shell scripting

[Python] Iterators vs Generators

In Python, there are iterators and generators. You probably already use iterators without even knowing it. Understanding the difference between the two concepts is important, because choosing one over the other can have a large impact on memory usage. If you are working with small datasets, memory usage might not be your first concern; with big datasets, however, it is another story. So what exactly are iterators and generators?

Iterators: the process of going through [...]
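A minimal sketch of the two ideas, added here for illustration and not taken from the original post. It assumes that "iterators" in this excerpt refers to the common case of looping over a fully built container such as a list. Note that every generator is itself an iterator: the memory saving comes from producing values lazily, not from the iterator protocol alone. The names Squares and squares are hypothetical.

```python
import sys


class Squares:
    """A hand-written iterator (hypothetical example): implements __iter__ and __next__."""

    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration  # signals the end of the iteration
        value = self.i * self.i
        self.i += 1
        return value


def squares(n):
    """The same sequence as a generator function: 'yield' builds the iterator for us."""
    for i in range(n):
        yield i * i


n = 100_000
as_list = [i * i for i in range(n)]  # every value is stored in memory at once
as_generator = squares(n)            # values are computed one at a time, on demand

# getsizeof reports the container object itself (not the ints it refers to):
# the list grows with n, the generator object stays small regardless of n.
list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)

if __name__ == "__main__":
    print("list:", list_bytes, "bytes")
    print("generator:", generator_bytes, "bytes")
```

The trade-off is that a generator can only be consumed once; if you need to traverse the values several times or index into them, a list is the better choice.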

September 18, 2015 | Categories: Bioinformatics, Performance, Python, Uncategorized

Draw me a Circos

How pretty would that look in my article? Very pretty! And informative, too! You might want to use a Circos for your own personal analysis or as a figure in an article. In both cases, this kind of representation is useful for visualizing data in a more global or complete manner: you can display multiple types of data across various chromosomal sequences. However, as wonderful and exciting as the idea of having your own personal Circos might [...]

August 20, 2015 | Categories: Bioinformatics, Biology, Data Visualisation

Identifying a point in ggplot2

So you have spent a lot of time converting your simple R plot into a full-fledged ggplot2 graph with all the bells and whistles, only to find that you cannot identify a point on the graph to investigate it further. Indeed, the usual identify() function does not work with ggplot2 graphs. Fortunately, there is a solution: do the work yourself by going under the hood of ggplot2 to access the low-level graphics system on which it is [...]

March 11, 2015 | Categories: Bioinformatics, Data Visualisation, R