Performance

A multiprocessing example and more

Recently, I had to search for a given chemical structure in a list of structures. Using the Python chemoinformatics packages pybel and rdkit, I was easily able to do so, but the operation took a little too much time for my liking. Wondering how I could search faster, I immediately thought of Jean-Philippe's previous blog post titled Put Those CPUs to Good Use. I decided to follow his instructions and give it a try. Goal: look for a molecule (a given [...]
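The excerpt cuts off before the post's actual code; as a rough sketch of the pattern it describes, splitting a substructure search over worker processes with Python's multiprocessing.Pool (the matches() helper, the SMARTS query and the toy structure list are illustrative assumptions of mine, not taken from the post):

```python
# Hedged sketch: parallel substructure search over a list of SMILES strings.
from multiprocessing import Pool

from rdkit import Chem

QUERY = Chem.MolFromSmarts("c1ccccc1")  # hypothetical query: a benzene ring


def matches(smiles):
    """Return the SMILES string if it contains the query substructure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None and mol.HasSubstructMatch(QUERY):
        return smiles
    return None


if __name__ == "__main__":
    library = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]  # toy structure list
    with Pool() as pool:  # defaults to one worker per CPU core
        hits = [s for s in pool.map(matches, library) if s is not None]
    print(hits)
```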

December 11, 2017 | Categories: Bioinformatics, Computer science, Performance | 0 Comments

Fast network transfers?

Recently, everyone and their mother started using various tools to optimize large data transfers to, from and between supercomputers. Historically, we have seen tools like FDT and BBCP that tried to exceed the performance of other transfer methods such as scp, rsync and ftp. One tool in particular is now gaining traction and is being deployed on most supercomputers: GridFTP and its front-end, Globus. Before jumping on the bandwagon, I thought it would [...]

October 13, 2016 | Categories: Computer science, Performance | 0 Comments

Simple multiprocessing in R (2nd edition)

The last time I wrote about this subject, I presented a really simple way to change an lapply call into its multicore sibling, mclapply. While this is an extremely easy modification to make for substantial performance gains, it does require that you were using the lapply function in the first place. So let's look at another way to introduce multiprocessing into an existing codebase, using the foreach and doMC packages. [...]
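The post itself is about R's foreach and doMC; for Python-leaning readers, a loosely analogous pattern (a deliberate swap on my part, not the post's code) is joblib's Parallel/delayed, which similarly wraps an ordinary loop body for multicore execution:

```python
# Python analogue of the foreach + doMC pattern: joblib's Parallel/delayed
# wraps a loop body and fans it out over cores. Swapped-in illustration only;
# the post covers the R packages, not this library.
from joblib import Parallel, delayed


def slow_square(x):
    """Stand-in for a costly per-item computation."""
    return x * x


# n_jobs=-1 uses every available core, much like registerDoMC() does in R.
results = Parallel(n_jobs=-1)(delayed(slow_square)(x) for x in range(10))
print(results)  # [0, 1, 4, 9, ...]
```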

September 19, 2016 | Categories: Performance, R | 0 Comments

Fastest method to compute an AUC

Context: AUC is an acronym for "Area Under the (ROC) Curve". If you are not familiar with the ROC curve and AUC, I suggest reading this blog post before continuing. For several projects, I needed to compute a large number of AUCs. It started at 25,000, grew to 230,000, and now I need to compute 1,500,000 AUCs. With so many AUCs, the time to compute each one becomes critical. On the web, I didn't find much information about this specific [...]
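The excerpt ends before the benchmark's winner is revealed; one well-known fast approach, shown here purely as an illustration and not necessarily the post's answer, computes the AUC from the Mann-Whitney U statistic in a single ranking pass instead of building the full ROC curve:

```python
# Hedged sketch: AUC via the rank-based (Mann-Whitney U) formulation.
import numpy as np
from scipy.stats import rankdata


def fast_auc(labels, scores):
    """AUC from ranks: labels are 0/1 class labels, scores are predictions."""
    labels = np.asarray(labels, dtype=bool)
    ranks = rankdata(scores)  # average ranks handle ties correctly
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    # Sum of positive-class ranks, minus its minimum possible value,
    # normalised by the number of positive/negative pairs.
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: three of the four positive/negative pairs are ranked
# correctly, so the AUC is 0.75.
print(fast_auc([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.8]))
```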

August 18, 2016 | Categories: Performance, Python, R, Statistics | 2 Comments

Speed up random disk access

When working with software that accesses data on disk in a random fashion, it is common knowledge that the best performance will be reached with SSDs, with SAS disks being less efficient and SATA disks the worst. However, high-capacity SSDs are still relatively expensive, so when working with large datasets, one typically ends up with data stored on larger, more common SATA drives. I recently experimented with the Jellyfish software to analyze [...]
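The excerpt stops before the post's actual fix; as a minimal sketch of how one might measure the random-versus-sequential gap the paragraph alludes to (the file path, block size and read count are placeholders of mine, and the OS page cache should be dropped between runs for honest numbers):

```python
# Hedged sketch: time random reads against sequential reads of a large file.
import os
import random
import time

PATH = "large_data.bin"  # hypothetical large file on the disk under test
BLOCK = 4096             # read in 4 KiB blocks, one page at a time
N_READS = 10_000

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)

start = time.perf_counter()
for _ in range(N_READS):
    offset = random.randrange(0, size - BLOCK)
    os.pread(fd, BLOCK, offset)          # read at a random offset
random_secs = time.perf_counter() - start

start = time.perf_counter()
for i in range(N_READS):
    os.pread(fd, BLOCK, i * BLOCK)       # read sequentially from the start
sequential_secs = time.perf_counter() - start

os.close(fd)
print(f"random: {random_secs:.2f}s  sequential: {sequential_secs:.2f}s")
```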

August 4, 2016 | Categories: Computer science, Performance | 0 Comments