Genix is an online automated pipeline for bacterial genome annotation. The program takes a FASTA file containing a
set of sequences, which can be complete chromosomes, contigs or scaffolds, and a taxonomy identifier (tax_id).
First, the proteins associated with the tax_id are downloaded from UniProt and used to build a raw dataset, which
may contain redundant sequences. CD-HIT (Li & Godzik 2006) is then used to build a non-redundant dataset, from which
the final BLASTp (Altschul et al. 1990; Camacho et al. 2009) database is generated. For the genome annotation, Genix
uses a combination of several bioinformatics tools, including Prodigal (Hyatt et al. 2010), BLASTp, tRNAscan-SE
(Lowe & Eddy 1997), RNAmmer (Lagesen et al. 2007), Aragorn (Laslett 2004), HMMER (Eddy 2011), BLASTn and INFERNAL
(Nawrocki et al. 2009), together with Rfam (Griffiths-Jones et al. 2003), AntiFam (Eberhardt et al. 2012) and the
non-redundant dataset generated by CD-HIT. At the end, Genix generates a GenBank file containing all the features
identified for each sequence and, if requested by the user, the GenBank submission file (.sqn) generated by tbl2asn.
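The database-building and protein-annotation steps described above can be sketched as a series of command lines. The following Python sketch is illustrative only and is not the Genix source code: the function names, file names and the 0.9 identity threshold are assumptions, and only the command-line flags (`cd-hit -i/-o/-c`, `makeblastdb -in/-dbtype/-out`, `prodigal -i/-a/-o`, `blastp -query/-db/-outfmt/-out`) come from the tools themselves. The commands are built as argument lists and not executed, so the sketch runs without the tools installed.

```python
from pathlib import Path


def protein_db_commands(raw_fasta, workdir, identity=0.9):
    """Build the commands that turn a raw protein FASTA into a BLASTp database.

    `identity` is a hypothetical default; the threshold used by Genix is not
    stated in the text.
    """
    workdir = Path(workdir)
    nr_fasta = workdir / "proteins_nr.fasta"   # non-redundant dataset (assumed name)
    db_prefix = workdir / "proteins_nr"        # BLAST database prefix (assumed name)
    # CD-HIT clusters the raw proteins and keeps one representative per cluster.
    cd_hit = ["cd-hit", "-i", str(raw_fasta), "-o", str(nr_fasta), "-c", str(identity)]
    # makeblastdb indexes the non-redundant FASTA as a protein database.
    makeblastdb = ["makeblastdb", "-in", str(nr_fasta), "-dbtype", "prot", "-out", str(db_prefix)]
    return [cd_hit, makeblastdb]


def annotation_commands(genome_fasta, workdir):
    """Build the gene-prediction and protein-search commands for one genome."""
    workdir = Path(workdir)
    proteins = workdir / "predicted_proteins.faa"
    # Prodigal predicts coding sequences and writes their translations with -a.
    prodigal = ["prodigal", "-i", str(genome_fasta), "-a", str(proteins),
                "-o", str(workdir / "genes.gbk")]
    # BLASTp searches the predicted proteins against the non-redundant database.
    blastp = ["blastp", "-query", str(proteins), "-db", str(workdir / "proteins_nr"),
              "-outfmt", "6", "-out", str(workdir / "blastp_hits.tsv")]
    return [prodigal, blastp]


if __name__ == "__main__":
    for cmd in protein_db_commands("raw_proteins.fasta", "work") + annotation_commands("genome.fna", "work"):
        print(" ".join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` in order. The RNA-related tools (tRNAscan-SE, RNAmmer, Aragorn, INFERNAL) are left out of the sketch.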
Genix is freely available, but requires registration. Our server runs the annotations one at a time, so the time needed to
annotate your genome depends not only on your sequence and the parameters you set to generate the protein database, but
also on the server load.
The tax_id is a universal code used by several public biological databases, such as GenBank and UniProt, to identify
different taxa. Before the annotation, Genix retrieves from UniProt all the protein sequences linked to the tax_id
provided by the user and generates a non-redundant protein database. For more information about the tax_id,
and to identify the best one for your annotation, please visit the NCBI Taxonomy Database.
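As a rough illustration of the retrieval step described above, the sketch below builds a URL that requests all UniProtKB protein sequences for a given tax_id in FASTA format. It assumes UniProt's current REST interface (the `rest.uniprot.org/uniprotkb/stream` endpoint and the `taxonomy_id` query field); the helper name `uniprot_fasta_url` is hypothetical, and the text does not say how Genix itself performs the download.

```python
from urllib.parse import urlencode

# Assumed endpoint: UniProt's REST "stream" service for UniProtKB entries.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_fasta_url(tax_id):
    """Return a URL for the FASTA sequences of all proteins linked to `tax_id`."""
    tax_id = int(tax_id)  # accepts "1773" as well as 1773
    if tax_id <= 0:
        raise ValueError("tax_id must be a positive integer")
    params = urlencode({"query": f"taxonomy_id:{tax_id}", "format": "fasta"})
    return f"{UNIPROT_STREAM}?{params}"


if __name__ == "__main__":
    # 1773 is the NCBI tax_id of Mycobacterium tuberculosis.
    print(uniprot_fasta_url(1773))
```

The resulting URL could then be fetched with `urllib.request.urlopen` and saved as the raw dataset that is later clustered with CD-HIT. A broader tax_id (for example a genus instead of a species) returns more proteins and therefore a larger database.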
Download Source and Benchmarking
You can download the Genix annotation pipeline source code from our GitHub repository.
Results from the comparison of Genix for the genomes of Leptospira interrogans
serovar Copenhageni strain Fiocruz L1-130 (GenBank: AE016823.1), Escherichia coli
strain K12 (GenBank: NC_000913.3), Listeria monocytogenes
strain EGD-e (GenBank: AL591824.1) and Mycobacterium tuberculosis
strain H37Rv (GenBank: AL123456.3) are available at this page.
Have you used Genix in your research? Please cite the paper:
Kremer, FS; Eslabão, MR; Dellagostin, OA; Pinto, LS. Genix: A New Online Automated Pipeline for Bacterial Genome Annotation. FEMS Microbiology Letters, V. ? N. ?, 2016.
DOI:10.1093/femsle/fnw263. PubMed: 7856568.
Altschul, S.F. et al., 1990. Basic local alignment search tool. Journal of molecular biology, 215(3), pp.403–10.
Camacho, C. et al., 2009. BLAST+: architecture and applications. BMC bioinformatics, 10(1), p.421.
Eberhardt, R.Y. et al., 2012. AntiFam: a tool to help identify spurious ORFs in protein annotation. Database: the journal of biological databases and curation, 2012, p.bas003.
Eddy, S.R., 2011. Accelerated Profile HMM Searches. W. R. Pearson, ed. PLoS computational biology, 7(10), p.e1002195.
Griffiths-Jones, S. et al., 2003. Rfam: an RNA family database. Nucleic acids research, 31(1), pp.439–41.
Hyatt, D. et al., 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics, 11, p.119.
Lagesen, K. et al., 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids research, 35(9), pp.3100–8.
Laslett, D., 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic acids research, 32(1), pp.11–16.
Li, W. & Godzik, A., 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England), 22(13), pp.1658–9.
Lowe, T.M. & Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic acids research, 25(5), pp.955–64.
Nawrocki, E.P., Kolbe, D.L. & Eddy, S.R., 2009. Infernal 1.0: inference of RNA alignments. Bioinformatics (Oxford, England), 25(10), pp.1335–7.