The subsystems approach to genome annotation and its use in the Project to Annotate 1000 Genomes

Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang H-Y, Cohoon M, Crécy-Lagard V de, Diaz NN, Disz T, Edwards R, Fonstein M, et al. (2005)
Nucleic Acids Research 33(17): 5691-5702.

Journal Article | Original Article | Published | English
Abstract
The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180,177 distinct proteins with 2,133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.
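As an illustration only, the idea of a populated subsystem and of a conflict between two populated subsystems might be sketched as follows. The abstract does not specify a data format, so the nested-dictionary layout (subsystem, then genome, then functional role, then gene IDs), the function name `find_conflicts`, and all identifiers in the example data are assumptions made for this sketch, not the SEED's actual representation or tooling. A "conflict" is taken here to mean a gene that different subsystems assign to different functional roles.

```python
from collections import defaultdict


def find_conflicts(subsystems):
    """Return genes that different subsystems assign to different functional roles.

    `subsystems` is a hypothetical layout assumed for this sketch:
    subsystem name -> genome ID -> functional role -> set of gene IDs.
    """
    assignments_by_gene = defaultdict(set)
    for subsystem_name, table in subsystems.items():
        for genome_id, row in table.items():
            for role, genes in row.items():
                for gene in genes:
                    assignments_by_gene[gene].add((subsystem_name, role))

    conflicts = {}
    for gene, assignments in assignments_by_gene.items():
        distinct_roles = {role for _, role in assignments}
        if len(distinct_roles) > 1:
            conflicts[gene] = sorted(assignments)
    return conflicts


# Made-up example data; none of these names come from the publication.
subsystems = {
    "subsystem_A": {
        "genome_1": {"role X": {"gene_1", "gene_2"}},
        "genome_2": {"role X": {"gene_3"}},
    },
    "subsystem_B": {
        "genome_1": {"role X": {"gene_1"}, "role Y": {"gene_2"}},
    },
}

conflicts = find_conflicts(subsystems)
print(conflicts)
```

In this toy data, `gene_1` carries the same role in both subsystems and is not reported, while `gene_2` is reported because the two subsystems disagree on its role.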

Cite this

Overbeek, R., Begley, T., Butler, R. M., Choudhuri, J. V., Chuang, H.-Y., Cohoon, M., Crécy-Lagard, V. de, et al. (2005). The subsystems approach to genome annotation and its use in the Project to Annotate 1000 Genomes. Nucleic Acids Research, 33(17), 5691-5702. doi:10.1093/nar/gki866
Overbeek, Ross, Begley, Tadhg, Butler, Ralph M., Choudhuri, Jomuna V., Chuang, Han-Yu, Cohoon, Matthew, Crécy-Lagard, Valérie de, Diaz, Naryttza N., Disz, Terry, Edwards, Robert, Fonstein, Michael, Frank, Ed D., Gerdes, Svetlana, Glass, Elizabeth M., Goesmann, Alexander, Hanson, Andrew, Iwata-Reuyl, Dirk, Jensen, Roy, Jamshidi, Neema, Krause, Lutz, Kubal, Michael, Larsen, Niels, Linke, Burkhard, McHardy, Alice C., Meyer, Folker, Neuweger, Heiko, Olsen, Gary, Olson, Robert, Osterman, Andrei, Portnoy, Vasiliy, Pusch, Gordon D., Rodionov, Dmitry A., Rückert, Christian, Steiner, Jason, Stevens, Rick, Thiele, Ines, Vassieva, Olga, Ye, Yuzhen, Zagnitko, Olga, and Vonstein, Veronika. “The subsystems approach to genome annotation and its use in the Project to Annotate 1000 Genomes”. Nucleic Acids Research 33.17 (2005): 5691-5702.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Access Level: Open Access

844 Citations in Europe PMC

Data provided by Europe PubMed Central.

Draft Genome Sequence of Bordetella bronchiseptica KU1201, the First Isolation Source of Arylmalonate Decarboxylase.
Yoshida S, Enoki J, Hemmi R, Kourist R, Kawakami N, Miyamoto K., Genome Announc 3(3), 2015
PMID: 25953178
Complete Genome Sequence of Martelella endophytica YC6887, Which Has Antifungal Activity Associated with a Halophyte.
Khan A, Khan H, Chung EJ, Hossain MT, Chung YR., Genome Announc 3(3), 2015
PMID: 25953177
First draft genome sequence of Aureimonas altamirensis, isolated from patient blood culture.
Eshaghi A, Shahinas D, Patel SN, Kus JV., FEMS Microbiol. Lett. 362(6), 2015
PMID: 25714548
Comparative genomic analysis of seven Mycoplasma hyosynoviae strains.
Bumgardner EA, Kittichotirat W, Bumgarner RE, Lawrence PK., Microbiologyopen (), 2015
PMID: 25693846
Pathogenicity phenomena in three model systems: from network mining to emerging system-level properties.
Castelhano Santos N, Pereira MO, Lourenco A., Brief. Bioinformatics 16(1), 2015
PMID: 24106130
Complete Genome Assembly of Corynebacterium sp. Strain ATCC 6931.
Daligault HE, Davenport KW, Minogue TD, Bishop-Lilly KA, Bruce DC, Chain PS, Coyne SR, Frey KG, Jaissle J, Koroleva GI, Ladner JT, Li PE, Meincke L, Munk AC, Palacios GF, Redden CL, Johnson SL., Genome Announc 2(5), 2014
PMID: 25342684
Draft Genome Sequence of the Shellfish Bacterial Pathogen Vibrio sp. Strain B183.
Schreier HJ, Schott EJ., Genome Announc 2(5), 2014
PMID: 25237023
Crystal structures of three representatives of a new Pfam family PF14869 (DUF4488) suggest they function in sugar binding/uptake.
Kumar A, Punta M, Axelrod HL, Das D, Farr CL, Grant JC, Chiu HJ, Miller MD, Coggill PC, Klock HE, Elsliger MA, Deacon AM, Godzik A, Lesley SA, Wilson IA., Protein Sci. 23(10), 2014
PMID: 25044324
Draft Genome Sequence of Kozakia baliensis SR-745, the First Sequenced Kozakia Strain from the Family Acetobacteraceae.
Schmid J, Koenig S, Pick A, Steffler F, Yoshida S, Miyamoto K, Sieber V., Genome Announc 2(3), 2014
PMID: 24970826
Draft Genome Sequence of Cesiribacter andamanensis Strain AMV16T, Isolated from a Soil Sample from a Mud Volcano in the Andaman Islands, India.
Shivaji S, Ara S, Begum Z, Srinivas TN, Singh A, Kumar Pinnaka A., Genome Announc 1(3), 2013
PMID: 23682146
Phenomenological model for predicting the catabolic potential of an arbitrary nutrient.
Seaver SM, Sales-Pardo M, Guimera R, Amaral LA., PLoS Comput. Biol. 8(11), 2012
PMID: 23133365
Grounding annotations in published literature with an emphasis on the functional roles used in metabolic models.
Binter E, Binter S, Disz T, Kalmanek E, Powers A, Pusch GD, Turgeon J., 3 Biotech 2(2), 2012
PMCID: PMC3376863
Functional bias of positively selected genes in Streptococcus genomes.
Suzuki H, Stanhope MJ., Infect. Genet. Evol. 12(2), 2012
PMID: 22155358
Draft genome sequence of Bacteroides faecis MAJ27T, a strain isolated from human feces.
Kim MS, Whon TW, Roh SW, Shin NR, Bae JW., J. Bacteriol. 193(23), 2011
PMID: 22072652
Controlled vocabularies for microbial virulence factors.
Korves T, Colosimo ME., Trends Microbiol. 17(7), 2009
PMID: 19577471

Sources

PMID: 16214803