Almost certainly, one day, you’ll find yourself with a list of outdated gene symbols. And you’ll probably think that updating them is a straightforward task, but it’s not that simple! Since there’s ‘bio’ in bioinformatician, let me borrow a biological analogy: updating gene symbols reminds me of a futile cycle. According to Wikipedia’s definition, a futile cycle occurs when two metabolic pathways run simultaneously in opposite directions and have no overall effect other than to dissipate energy in the form of heat. Updating gene symbols sometimes makes you feel like you’re dissipating a lot of energy for little overall effect. But it’s useful and necessary.
Updating the gene symbols themselves is not difficult. There are several ways to do it nowadays, and several online tools can help you.
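In a perfect world, every symbol would be unique, and an outdated symbol would always be replaced by one never used before. In practice, this is not the case: the same symbol can be the approved symbol of one gene and a synonym of another. And people usually work with the symbols, and with the symbols only, rather than with stable identifiers such as HGNC or Entrez Gene IDs.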
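To get a feel for how often this happens, you can list every human symbol that points to more than one gene. Here is a minimal sketch, assuming NCBI’s gene_info.gz layout (tab-separated, tax_id in column 1, GeneID in column 2, Symbol in column 3, and ‘|’-separated Synonyms in column 5, with ‘-’ when empty) and 9606 as the human taxonomy ID. The function name `ambiguous_symbols` is hypothetical, and only the Symbol and Synonyms columns are checked.

```shell
# Sketch: list human symbols (approved or synonym) shared by more than one gene.
# Assumes NCBI gene_info layout: $1 tax_id, $2 GeneID, $3 Symbol,
# $5 Synonyms ("|"-separated, "-" when empty). ambiguous_symbols is a hypothetical name.
ambiguous_symbols() {
  gzip -dc "$1" | awk -F'\t' '
    $1 == 9606 {                      # human genes only (also skips the "#tax_id" header)
      hits[$3, $2] = "Approved symbol\t" $3
      n = split($5, syn, "|")
      for (i = 1; i <= n; i++)
        if (syn[i] != "-" && !((syn[i], $2) in hits))
          hits[syn[i], $2] = "Synonym\t" $3
    }
    END {
      # Count how many distinct genes each symbol points to
      for (k in hits) { split(k, p, SUBSEP); count[p[1]]++ }
      # Print symbol, GeneID, match type and approved symbol for shared symbols only
      for (k in hits) {
        split(k, p, SUBSEP)
        if (count[p[1]] > 1) print p[1] "\t" p[2] "\t" hits[k]
      }
    }' | LC_ALL=C sort
}
```

For example, `ambiguous_symbols gene_info.gz > shared_symbols.tsv` writes one line per (symbol, gene) pair, so a symbol that is the approved symbol of one gene and a synonym of another appears on two lines. Matching on exact fields, rather than grepping the whole line, avoids substring hits such as PKD2L1 when looking for PKD2.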
Suppose you have to update the gene symbols of a dataset that contains the PKD2 gene. Because it’s a human gene, you can use the HUGO (HGNC) online tool, which gives you this:
| Input | Match type | Approved symbol | Approved name | HGNC ID | Location |
|---|---|---|---|---|---|
| PKD2 | Approved symbol | PKD2 | polycystic kidney disease 2 (autosomal dominant) | HGNC:9009 | 4q22.1 |
| PKD2 | Synonyms | PRKD2 | protein kinase D2 | HGNC:17293 | 19q13.2 |
The same results can be obtained from the NCBI Entrez Gene file gene_info.gz:
> zcat gene_info.gz | grep 9606 | cut -f1,2,3,5,8,9 |grep -e '