file manipulation

Fast network transfers?

Recently, everyone and their mother has started using various tools to optimize large data transfers to, from, and between supercomputers. Historically, tools like FDT and BBCP have tried to exceed the performance of other transfer methods such as scp, rsync, and ftp. One tool in particular is now gaining traction and is being deployed on most supercomputers: GridFTP and its front end, Globus. Before jumping on the bandwagon, I thought it would [...]

2017-04-29T17:04:17+00:00 | October 13, 2016 | Categories: Computer science, Performance | 0 Comments

Working with large files

When dealing with Next Generation Sequencing (NGS) data, I am routinely asked by clients how to open sequence files. The answer is that, given their huge size (often many millions of lines) and the resulting memory requirements, they should probably not be opened at all; they should only be processed. Most software designed to work with NGS data processes these files sequentially, as a stream, loading just the required amount of data from disk, processing it [...]

2017-04-30T10:19:35+00:00 | October 1, 2015 | Categories: Data Analysis, Shell scripting | 1 Comment
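
To make the streaming idea from "Working with large files" concrete, here is a minimal sketch in Python. It is not taken from the post: it assumes the sequence file is in the common four-line FASTQ layout (header, sequence, separator, quality), and the function names `open_maybe_gzip`, `stream_fastq`, and `summarize` are hypothetical helpers made up for illustration. Only one record is held in memory at a time, so the file is never "opened" in the sense of loading it whole.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def stream_fastq(path):
    """Yield (header, sequence, quality) one record at a time.

    Assumes the four-line FASTQ layout; an incomplete trailing record is ignored.
    """
    with open_maybe_gzip(path) as handle:
        while True:
            # Read exactly four lines from disk; nothing else is kept in memory.
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            header, sequence, _separator, quality = (
                line.rstrip("\n") for line in record
            )
            yield header, sequence, quality


def summarize(path):
    """Return (number of reads, total number of bases) by streaming the file."""
    reads = 0
    bases = 0
    for _header, sequence, _quality in stream_fastq(path):
        reads += 1
        bases += len(sequence)
    return reads, bases
```

Calling `summarize("reads.fastq.gz")` on a hypothetical file would walk through it once from start to finish, with memory use that stays roughly constant regardless of how many million lines the file contains.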