TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW (2009)
BMC Bioinformatics 10(1).

Journal Article | Published | English
Abstract
Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning.

Results: Our novel strategy was extensively evaluated using leave-one-out cross-validation on fragments of variable length (800 bp to 50 Kbp) from 373 completely sequenced genomes. TACOA classifies genomic fragments of length 800 bp and 1 Kbp with high accuracy down to the rank of class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with that of the latest intrinsic classifier, PhyloPythia, using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to rank class, and low false negative rates were also obtained.

Conclusion: An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved competitive when compared with the most current classifier, PhyloPythia, and has the advantage that it can be installed locally and the reference set can be kept up to date.
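The idea of combining k-nearest neighbor with kernel-based learning can be illustrated with a minimal sketch. This is not the TACOA implementation: the abstract does not specify the features, kernel, or decision rule, so the dinucleotide profile, the Gaussian kernel, the parameter names (n_neighbors, width, min_share) and the vote-share rule for returning "unknown" are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from itertools import product

K = 2  # hypothetical oligonucleotide length, kept small for the demo
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq, k=K):
    """Relative frequency of each k-mer in seq, using overlapping windows."""
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in KMERS]


def gaussian_kernel(x, y, width=0.1):
    """Similarity in (0, 1]; the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * width ** 2))


def classify(fragment, reference, n_neighbors=3, width=0.1, min_share=0.6):
    """Kernel-weighted vote among the most similar reference sequences.

    reference is a list of (taxon_label, sequence) pairs. Returns a label,
    or "unknown" when no taxon collects at least min_share of the vote.
    """
    query = kmer_profile(fragment)
    scored = sorted(
        ((gaussian_kernel(query, kmer_profile(seq), width), label)
         for label, seq in reference),
        reverse=True,
    )[:n_neighbors]
    votes = defaultdict(float)
    for weight, label in scored:
        votes[label] += weight
    total = sum(votes.values())
    if total == 0.0:
        return "unknown"
    best = max(votes, key=votes.get)
    return best if votes[best] / total >= min_share else "unknown"
```

Called with a small reference list such as [("A", "ATATATATATATATAT"), ("B", "GCGCGCGCGCGCGCGC")], classify returns the label whose composition is closest to the query fragment. Weighting each neighbor by a kernel value, rather than counting neighbors equally, lets distant neighbors contribute almost nothing to the vote.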

Diaz, Naryttza N., Krause, Lutz, Goesmann, Alexander, Niehaus, Karsten, and Nattkemper, Tim Wilhelm. “TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach”. BMC Bioinformatics 10.1 (2009).
Access Level: OA Open Access
Last Uploaded: 2015-12-22T10:33:01Z





Sources

PMID: 19210774