Document your work by adding parameters to your shell scripts

At some point during your bioinformatics career, you're going to start writing shell scripts; it's kind of inevitable! So let us discuss a strategy for adding parameters to your scripts to make them more easily reusable, while also keeping a trace of the settings you used to generate a set of results. (Disclaimer: the procedures described below have been tested using Bash; some modifications might be necessary if you use a different shell.) Modifying your scripts in order [...]

August 14, 2018 | Categories: Bioinformatics, Shell scripting | 0 Comments
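
As a rough illustration of the idea in the teaser above (parameters plus a trace of the settings used), here is a minimal Bash sketch. The function name `run_analysis`, the options `-i`, `-t` and `-o`, their defaults, and the `parameters.log` file name are all hypothetical choices made for this example; they are not taken from the post itself.

```shell
#!/usr/bin/env bash
# Hypothetical example: option names, defaults and the log file name
# are illustrative assumptions, not the original post's script.

run_analysis() {
    local OPTIND=1 opt
    local input="" threads=1 outdir="results"

    # Parse -i INPUT, -t THREADS and -o OUTDIR with the getopts builtin.
    while getopts "i:t:o:" opt; do
        case "$opt" in
            i) input="$OPTARG" ;;
            t) threads="$OPTARG" ;;
            o) outdir="$OPTARG" ;;
            *) echo "Usage: run_analysis -i INPUT [-t THREADS] [-o OUTDIR]" >&2
               return 1 ;;
        esac
    done

    # The input file is the only mandatory parameter in this sketch.
    if [ -z "$input" ]; then
        echo "Error: -i INPUT is required" >&2
        return 1
    fi

    mkdir -p "$outdir"

    # Keep a trace of the settings right next to the results.
    {
        echo "date=$(date)"
        echo "input=$input"
        echo "threads=$threads"
        echo "outdir=$outdir"
    } > "$outdir/parameters.log"

    # The actual processing commands would go here.
}
```

Calling `run_analysis -i reads.fastq -t 4 -o run1` would create `run1/parameters.log`, so the settings travel with the results they produced.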

Realize your Bash potential

A bioinformatician's best tool is their shell. While some have already mastered the dark arts of the Bash shell, I often see beginners (and even catch myself at times!) unknowingly repeating key sequences when they could get the same result with a few simple built-in keybindings or programmatic shortcuts. Let's have a look at some of the most useful Bash shortcuts that no self-respecting bioinformatician should be without. This is by no means an exhaustive list of what Bash has to offer, but it will hopefully serve to save [...]

May 26, 2016 | Categories: Computer science, Shell scripting | 0 Comments

Grep parameters every bioinformatician should know

Your shell, along with the myriad command-line programs it exposes, is clearly a great friend when it comes to file manipulation. And let's face it, file manipulation is a big part of a bioinformatician's daily workload. Now, since we rarely have the time to review all the options offered by the different programs, I thought I'd list some really useful ones from grep. I expect everyone to know what grep is and what it does, so let's just get [...]

November 27, 2015 | Categories: Data Analysis, Shell scripting | 0 Comments
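
The teaser above stops before naming any options, so the following is only a sketch of the kind of grep usage that tends to come up with sequence files, not the post's own list. The function names are hypothetical, and `header_with_sequence` assumes a FASTA file whose sequences sit on a single line. The `-c` and `-v` options are standard; `-A` is an extension available in GNU and BSD grep.

```shell
# Hypothetical helpers; the choice of options is an assumption for
# illustration and is not taken from the original post.

# -c prints the number of matching lines instead of the lines themselves,
# which gives the number of records in a FASTA file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# -v inverts the match: keep every line that does NOT start with '#'.
strip_comments() {
    grep -v '^#' "$1"
}

# -A 1 also prints the line following each match; -F treats the pattern
# as a fixed string. Assumes one sequence line per FASTA record.
header_with_sequence() {
    grep -A 1 -F -- "$2" "$1"
}
```

For example, `count_fasta_records genome.fa` reports how many sequences a file contains without printing any of them.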

Working with large files

When dealing with next-generation sequencing (NGS) data, I am routinely asked by clients how to open sequence files. The answer is that, given their huge size (often many millions of lines) and the consequent memory requirement, they should probably not be opened at all; they should only be processed. Most software designed to work with NGS data therefore processes these files sequentially, as a stream, loading just the required amount of data from disk, processing it [...]

October 1, 2015 | Categories: Data Analysis, Shell scripting | 1 Comment
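
To make the streaming idea from the last teaser concrete, here is a minimal sketch that inspects a gzip-compressed FASTQ file without ever loading it whole or writing a decompressed copy to disk. The function names are hypothetical, and the code assumes the common layout of exactly four lines per FASTQ record.

```shell
# Hypothetical helpers; both assume 4 lines per FASTQ record.

# Show the first N records (default: 1). head stops reading after the
# requested lines, so only the start of the file is decompressed.
peek_fastq() {
    gzip -dc "$1" | head -n "$(( ${2:-1} * 4 ))"
}

# Count reads by streaming the file through awk, which only keeps a
# line counter (NR) in memory rather than the data itself.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

Because each command in the pipeline handles one line at a time, memory use stays small regardless of how large the file is.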