[GENIX] Automated Bacterial Genome Annotation Pipeline

The tool

Genix is an online automated pipeline for bacterial genome annotation. The program takes a FASTA file containing a set of sequences, which can be complete chromosomes, contigs or scaffolds, and a tax_id identifier. First, the proteins associated with the tax_id are downloaded from UniProt to build a raw dataset, which may contain redundant sequences. CD-HIT (Li & Godzik 2006) then reduces it to a non-redundant dataset, which is used to generate the final BLASTp (Altschul et al. 1990; Camacho et al. 2009) database. For the genome annotation, Genix combines several bioinformatics tools and resources: Prodigal (Hyatt et al. 2010), BLASTp, tRNAscan-SE (Lowe & Eddy 1997), RNAmmer (Lagesen et al. 2007), Aragorn (Laslett 2004), HMMER (Eddy 2011), BLASTn, INFERNAL (Nawrocki et al. 2009), Rfam (Griffiths-Jones et al. 2003), AntiFam (Eberhardt et al. 2012) and the non-redundant dataset generated by CD-HIT. At the end, Genix generates a GenBank file containing all the features identified for each sequence and, if requested by the user, the GenBank submission file (.sqn) generated by tbl2asn.
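The database-building and gene-prediction steps described above can be sketched as a list of command lines. This is only an illustration and not the actual Genix code: the function names, file names, the 0.9 identity threshold and the 1e-5 e-value cutoff are assumptions chosen for the example. The external programs (cd-hit, makeblastdb, prodigal, blastp) are assumed to be installed and on the PATH if the commands are executed.

```python
import shlex


def build_database_commands(raw_fasta, workdir, identity=0.9):
    """Commands that turn the raw UniProt FASTA into a BLASTp database.

    The identity threshold is an assumed value, not the one Genix uses.
    """
    nr_fasta = f"{workdir}/proteins_nr.faa"
    db_prefix = f"{workdir}/proteins_nr"
    # CD-HIT clusters the proteins and keeps one representative per cluster.
    cd_hit = ["cd-hit", "-i", raw_fasta, "-o", nr_fasta, "-c", str(identity)]
    # makeblastdb indexes the non-redundant proteins for BLASTp searches.
    makeblastdb = ["makeblastdb", "-in", nr_fasta, "-dbtype", "prot",
                   "-out", db_prefix]
    return [cd_hit, makeblastdb]


def build_annotation_commands(genome_fasta, workdir, evalue=1e-5):
    """Commands for gene prediction and the protein similarity search.

    Only Prodigal and BLASTp are shown; the RNA-finding tools are omitted.
    """
    predicted = f"{workdir}/predicted.faa"
    # Prodigal predicts coding sequences and writes their translations (-a).
    prodigal = ["prodigal", "-i", genome_fasta, "-a", predicted,
                "-o", f"{workdir}/genes.gbk"]
    # BLASTp compares the predicted proteins with the non-redundant database.
    blastp = ["blastp", "-query", predicted, "-db", f"{workdir}/proteins_nr",
              "-evalue", str(evalue), "-outfmt", "6",
              "-out", f"{workdir}/hits.tsv"]
    return [prodigal, blastp]


if __name__ == "__main__":
    commands = (build_database_commands("raw.faa", "work")
                + build_annotation_commands("genome.fna", "work"))
    for cmd in commands:
        print(shlex.join(cmd))
```

Each command is returned as a list of arguments, so it can be printed for inspection as above or passed to subprocess.run(cmd, check=True) to run it.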

Usage

Genix is freely available, but requires registration. Our server processes annotations one at a time, so the time needed to annotate your genome depends not only on your sequence and the parameters you set to generate the protein database, but also on the current server load.

tax_id

The tax_id is the NCBI Taxonomy identifier, a numeric code used by several public biological databases, such as GenBank and UniProt, to identify taxa. Before the annotation, Genix retrieves from UniProt all the protein sequences linked to the tax_id provided by the user and generates a non-redundant protein database. For more information about the tax_id and to identify the best one for your annotation, please visit the NCBI Taxonomy Database.
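To illustrate how a tax_id maps to a set of UniProt proteins, the sketch below builds a download URL for a given identifier. It is not how Genix itself performs the retrieval: the endpoint and the taxonomy_id query field follow the current public UniProt REST interface and are assumptions here, and the function name is hypothetical. The example uses tax_id 83333 (Escherichia coli K-12).

```python
from urllib.parse import urlencode

# Assumed endpoint of the public UniProt REST interface.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_fasta_url(tax_id):
    """Build a URL requesting, in FASTA format, the proteins linked to tax_id."""
    tax_id = int(tax_id)  # raises ValueError for non-numeric input
    if tax_id <= 0:
        raise ValueError("tax_id must be a positive integer")
    params = urlencode({"query": f"taxonomy_id:{tax_id}", "format": "fasta"})
    return f"{UNIPROT_STREAM}?{params}"


if __name__ == "__main__":
    print(uniprot_fasta_url(83333))
```

A broad tax_id (for example a whole genus) yields a larger and more redundant raw dataset than a strain-level one, which is why choosing the tax_id affects both the database build time and the annotation.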

Download Source and Benchmarking

You can download the Genix annotation pipeline source code from our GitHub repository. Results from the comparison of Genix, RAST, BASys and Prokka for the genomes of Leptospira interrogans serovar Copenhageni strain Fiocruz L1-130 (GenBank: AE016823.1), Escherichia coli strain K12 (GenBank: NC_000913.3), Listeria monocytogenes strain EGD-e (GenBank: AL591824.1) and Mycobacterium tuberculosis strain H37Rv (GenBank: AL123456.3) are available on this page.

Citation

Have you used Genix in your research? Please cite the paper:

Kremer, FS; Eslabão, MR; Dellagostin, OA; Pinto, LS. Genix: A New Online Automated Pipeline for Bacterial Genome Annotation. FEMS Microbiology Letters, V. ? N. ?, 2016.
DOI:10.1093/femsle/fnw263. PubMed: 7856568.

References

Altschul, S.F. et al., 1990. Basic local alignment search tool. Journal of molecular biology, 215(3), pp.403–10.
Camacho, C. et al., 2009. BLAST+: architecture and applications. BMC bioinformatics, 10(1), p.421.
Eberhardt, R.Y. et al., 2012. AntiFam: a tool to help identify spurious ORFs in protein annotation. Database: the journal of biological databases and curation, 2012, p.bas003.
Eddy, S.R., 2011. Accelerated Profile HMM Searches. PLoS computational biology, 7(10), p.e1002195.
Griffiths-Jones, S. et al., 2003. Rfam: an RNA family database. Nucleic acids research, 31(1), pp.439–41.
Hyatt, D. et al., 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics, 11, p.119.
Lagesen, K. et al., 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids research, 35(9), pp.3100–8.
Laslett, D., 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Research, 32(1), pp.11–16.
Li, W. & Godzik, A., 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England), 22(13), pp.1658–9.
Lowe, T.M. & Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic acids research, 25(5), pp.955–64.
Nawrocki, E.P., Kolbe, D.L. & Eddy, S.R., 2009. Infernal 1.0: inference of RNA alignments. Bioinformatics (Oxford, England), 25(10), pp.1335–7.