
pyGeno: A Python package for precision medicine and proteogenomics


pyGeno’s lair is on GitHub.

Citing pyGeno:

Please cite this paper.

A Quick Intro:

Even though more and more research focuses on Personalized/Precision Medicine, that is, treatments specifically tailored to the patient, pyGeno is (to our knowledge) the only tool available that will gladly build your specific genomes for you and give you easy access to them.

pyGeno allows you to create and work on Personalized Genomes: custom genomes that you create by combining a reference genome, sets of polymorphisms and an optional filter. pyGeno will take care of applying the filter and inserting the polymorphisms at the right positions, so you get direct access to the DNA and Protein sequences of your patients/subjects. To know more about how to create a Personalized Genome, have a look at the Quickstart section.
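
To make the idea concrete, here is a minimal, self-contained sketch of what "reference genome + polymorphisms + filter" means. It is not pyGeno's implementation and uses none of its API: the function build_personalized_sequence, the SNP dictionaries and the quality threshold are all hypothetical, and only single-nucleotide substitutions are handled.

```python
def build_personalized_sequence(reference, snps, keep):
    """Return a copy of `reference` with the accepted SNPs substituted in.

    reference: DNA string
    snps: list of dicts with a 0-based 'pos', plus 'ref', 'alt' and 'quality'
    keep: function that takes a SNP dict and returns True to apply it
    (all of these names are illustrative assumptions, not pyGeno's API)
    """
    sequence = list(reference)
    for snp in snps:
        # The filter decides which polymorphisms make it into the genome
        if not keep(snp):
            continue
        # Sanity check: the SNP must agree with the reference at that position
        if reference[snp["pos"]] != snp["ref"]:
            raise ValueError("SNP at %d does not match the reference" % snp["pos"])
        sequence[snp["pos"]] = snp["alt"]
    return "".join(sequence)


reference = "ATGGCCTAA"
snps = [
    {"pos": 3, "ref": "G", "alt": "A", "quality": 40},
    {"pos": 7, "ref": "A", "alt": "G", "quality": 5},
]

# Hypothetical filter: only keep SNPs with a quality of at least 20
personalized = build_personalized_sequence(
    reference, snps, lambda snp: snp["quality"] >= 20
)
print(personalized)  # ATGACCTAA
```

Only the first SNP passes the filter, so the low-quality one at position 7 is left out of the personalized sequence.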

pyGeno can also function as a personal bioinformatics database for Ensembl that runs directly in Python, on your laptop, making it faster and more reliable than any REST API. pyGeno makes extracting data such as gene sequences a breeze, and is designed to cope with huge queries.

from pyGeno.Genome import *

g = Genome(name = "GRCh37.75")
prot = g.get(Protein, id = 'ENSP00000438917')[0]
# print the protein sequence
print(prot.sequence)
# print the biotype of the protein's gene
print(prot.gene.biotype)
# print the sequence of the protein's transcript
print(prot.transcript.sequence)

# fancy queries (x1 and x2 are placeholders for your own coordinates)
for exon in g.get(Exon, {"CDS_start >": x1, "CDS_end <=" : x2, "chromosome.number" : "22"}) :
    # print the exon's coding sequence
    print(exon.CDS)
    # print the sequence of the exon's transcript
    print(exon.transcript.sequence)

# You can do the same for your subject-specific genomes
# by combining a reference genome with polymorphisms
# (MyFilter stands for a SNP filter class that you define yourself)
g = Genome(name = "GRCh37.75", SNPs = ["STY21_RNA"], SNPFilter = MyFilter())
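
The dictionary keys in the query above pair a field name with an optional comparison operator. As an illustration of that convention only, here is a small pure-Python sketch that applies such conditions to plain dictionaries. It is an assumption about how the keys can be read, not pyGeno's or rabaDB's actual code: parse_key, lookup, matches and the toy exon records are made-up names and data.

```python
import operator

# Comparison operators a key may end with; a key without one means equality
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}


def parse_key(key):
    """Split a key such as 'CDS_start >' into ('CDS_start', operator.gt)."""
    parts = key.split()
    if len(parts) == 2 and parts[1] in OPERATORS:
        return parts[0], OPERATORS[parts[1]]
    return key.strip(), operator.eq


def lookup(record, dotted_field):
    """Follow a dotted field such as 'chromosome.number' through nested dicts."""
    value = record
    for name in dotted_field.split("."):
        value = value[name]
    return value


def matches(record, conditions):
    """True when the record satisfies every condition (logical AND)."""
    for key, expected in conditions.items():
        field, compare = parse_key(key)
        if not compare(lookup(record, field), expected):
            return False
    return True


# Toy records standing in for exons (made-up data)
exons = [
    {"id": "exon_a", "CDS_start": 100, "CDS_end": 250, "chromosome": {"number": "22"}},
    {"id": "exon_b", "CDS_start": 50, "CDS_end": 400, "chromosome": {"number": "22"}},
    {"id": "exon_c", "CDS_start": 120, "CDS_end": 200, "chromosome": {"number": "1"}},
]
conditions = {"CDS_start >": 80, "CDS_end <=": 300, "chromosome.number": "22"}

selected = [exon["id"] for exon in exons if matches(exon, conditions)]
print(selected)  # ['exon_a']
```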

Verbose Introduction

pyGeno integrates:

  • Reference sequences and annotations from Ensembl
  • Genomic polymorphisms from the dbSNP database
  • SNPs from next-gen sequencing

pyGeno is a Python package that was designed to be:

  • Fast to install. It has no dependencies other than its own backend: rabaDB.
  • Fast to run and memory efficient, so you can use it on your laptop.
  • Fast to use. There are no queries to foreign APIs: all the data rests on your computer, so it is readily accessible when you need it.
  • Fast to learn. One single function, get(), can do the job of several other tools at once.

It also comes with:

  • Parsers for: FASTA, FASTQ, GTF, VCF, CSV.
  • Useful tools for translation, etc.
  • Optimised genome indexing with Segment Trees.
  • A funky Progress Bar.
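
To give a feel for what a FASTA parser and a translation tool do, here is a minimal standalone sketch. It does not use pyGeno's own parsers or tools, whose interfaces are not shown here: parse_fasta, translate and the toy records are hypothetical, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[index]
    for index, codon in enumerate(product(BASES, repeat=3))
}


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unknown codon."""
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable_length, 3)
    )


# Made-up sequences: a toy reference and a variant with one substitution
fasta_text = ">toy_reference\nATGGCC\nTAA\n>toy_patient\nATGACCTAA\n"
records = parse_fasta(fasta_text)
proteins = [(name, translate(sequence)) for name, sequence in records]
print(proteins)  # [('toy_reference', 'MA*'), ('toy_patient', 'MT*')]
```

The single substitution in the second record changes the second codon from GCC to ACC, so the translated protein differs by one amino acid.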

One of the coolest things about pyGeno is that it also allows you to quickly create personalized genomes: genomes that you design yourself by combining a reference genome and SNP sets derived from dbSNP or next-gen sequencing.

pyGeno is developed by Tariq Daouda at the Institute for Research in Immunology and Cancer (IRIC). Its logo is the work of the freelance designer Sawssan Kaddoura. For the latest news about pyGeno, you can follow me on Twitter: @tariqdaouda.
