GISMO - gene identification using a support vector machine for ORF classification

Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F (2007)
Nucleic Acids Research 35(2): 540-549.

Journal article | Published | English
Krause, Lutz; McHardy, Alice C.; Nattkemper, Tim Wilhelm; Pühler, Alfred; Stoye, Jens; Meyer, Folker
Abstract
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. It performs well on complete prokaryotic chromosomes irrespective of their GC content, as well as on plasmids as short as 10 kb, on short genes, and on genes with atypical sequence composition. Applied to published genomes, GISMO produced several thousand new gene predictions that are supported by extrinsic evidence, strongly suggesting that they are biologically active genes. The source code for GISMO is freely available under the GPL license.
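The abstract's central idea is to score each open reading frame (ORF) by its sequence composition with a support vector machine. The Python sketch below is only an illustration under stated assumptions and is not GISMO's own code (which the abstract says is available under the GPL). It assumes codon-usage frequencies as the composition features and trains a linear SVM with Pegasos-style stochastic sub-gradient descent, a swapped-in training method chosen so the example needs only the standard library. The function names (`codon_usage`, `train_linear_svm`, `classify`), the parameters `lam` and `steps`, and the toy training sequences are all hypothetical, and the protein family domain search that GISMO combines with the classifier is not modelled.

```python
from itertools import product

# Hypothetical illustration only; not GISMO's implementation.
# Fixed ordering of the 64 codons so every feature vector lines up.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_usage(orf: str) -> list[float]:
    """Relative frequency of each of the 64 codons in an in-frame ORF.

    Codons containing characters other than A/C/G/T (e.g. N) are skipped,
    and a trailing partial codon is ignored.
    """
    counts = [0] * len(CODONS)
    seq = orf.upper()
    for pos in range(0, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[pos:pos + 3])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(CODONS)


def features(orf: str) -> list[float]:
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    return codon_usage(orf) + [1.0]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(orfs, labels, lam=0.01, steps=2000):
    """Pegasos sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss.

    Examples are visited in a fixed cyclic order instead of being sampled
    at random, so the result is reproducible.
    """
    xs = [features(o) for o in orfs]
    w = [0.0] * len(xs[0])
    for t in range(1, steps + 1):
        i = (t - 1) % len(xs)
        x, y = xs[i], labels[i]
        eta = 1.0 / (lam * t)
        violated = y * dot(w, x) < 1.0  # hinge loss is active for this example
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer step
        if violated:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def classify(w: list[float], orf: str) -> int:
    # +1 = predicted gene, -1 = predicted non-coding ORF.
    return 1 if dot(w, features(orf)) >= 0.0 else -1


# Toy, hand-made training ORFs (not data from the paper).
train_orfs = ["GCG" * 30, "GCGGCC" * 15, "TAT" * 30, "TATTTA" * 15]
train_labels = [1, 1, -1, -1]
w = train_linear_svm(train_orfs, train_labels)
print(classify(w, "GCG" * 20), classify(w, "TAT" * 20))
```

Folding the bias into the weight vector keeps the update rule short, at the cost of regularizing the bias along with the other weights; dedicated SVM libraries usually handle the bias separately.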

Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F. GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research. 2007;35(2):540-549.
Krause, L., McHardy, A. C., Nattkemper, T. W., Pühler, A., Stoye, J., & Meyer, F. (2007). GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research, 35(2), 540-549.
Krause, Lutz, McHardy, Alice C., Nattkemper, Tim Wilhelm, Pühler, Alfred, Stoye, Jens, and Meyer, Folker. 2007. “GISMO - gene identification using a support vector machine for ORF classification”. Nucleic Acids Research 35 (2): 540-549.
Krause, L., McHardy, A. C., Nattkemper, T. W., Pühler, A., Stoye, J., and Meyer, F. (2007). GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research 35, 540-549.
Krause, L., et al., 2007. GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research, 35(2), p 540-549.
L. Krause, et al., “GISMO - gene identification using a support vector machine for ORF classification”, Nucleic Acids Research, vol. 35, 2007, pp. 540-549.
Krause, L., McHardy, A.C., Nattkemper, T.W., Pühler, A., Stoye, J., Meyer, F.: GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research. 35, 540-549 (2007).
Krause, Lutz, McHardy, Alice C., Nattkemper, Tim Wilhelm, Pühler, Alfred, Stoye, Jens, and Meyer, Folker. “GISMO - gene identification using a support vector machine for ORF classification”. Nucleic Acids Research 35.2 (2007): 540-549.
All files are available under the following license(s):
Copyright statement:
This object is protected by copyright and/or related rights. [...]
Access level: Open Access

23 citations in Europe PMC

Data provided by Europe PubMed Central.

Systems and synthetic biology perspective of the versatile plant-pathogenic and polysaccharide-producing bacterium Xanthomonas campestris.
Schatschneider S, Schneider J, Blom J, Létisse F, Niehaus K, Goesmann A, Vorhölter FJ., Microbiology 163(8), 2017
PMID: 28795660
Endophytic Streptomyces in the traditional medicinal plant Arnica montana L.: secondary metabolites and biological activity.
Wardecki T, Brötz E, De Ford C, von Loewenich FD, Rebets Y, Tokovenko B, Luzhetskyy A, Merfort I., Antonie Van Leeuwenhoek 108(2), 2015
PMID: 26036671
Complete genome sequence of producer of the glycopeptide antibiotic Aculeximycin Kutzneria albida DSM 43870T, a representative of minor genus of Pseudonocardiaceae.
Rebets Y, Tokovenko B, Lushchyk I, Rückert C, Zaburannyi N, Bechthold A, Kalinowski J, Luzhetskyy A., BMC Genomics 15, 2014
PMID: 25301375
Genomes of Stigonematalean cyanobacteria (subsection V) and the evolution of oxygenic photosynthesis from prokaryotes to plastids.
Dagan T, Roettger M, Stucken K, Landan G, Koch R, Major P, Gould SB, Goremykin VV, Rippka R, Tandeau de Marsac N, Gugger M, Lockhart PJ, Allen JF, Brune I, Maus I, Pühler A, Martin WF., Genome Biol Evol 5(1), 2013
PMID: 23221676
Gene prediction in metagenomic fragments based on the SVM algorithm.
Liu Y, Guo J, Hu G, Zhu H., BMC Bioinformatics 14 Suppl 5, 2013
PMID: 23735199
The noncanonical type III secretion system of Xanthomonas translucens pv. graminis is essential for forage grass infection.
Wichmann F, Vorhölter FJ, Hersemann L, Widmer F, Blom J, Niehaus K, Reinhard S, Conradin C, Kölliker R., Mol Plant Pathol 14(6), 2013
PMID: 23578314
The complete genome sequence of the acarbose producer Actinoplanes sp. SE50/110.
Schwientek P, Szczepanowski R, Rückert C, Kalinowski J, Klein A, Selber K, Wehmeier UF, Stoye J, Pühler A., BMC Genomics 13, 2012
PMID: 22443545
Complete genome sequence of Saccharothrix espanaensis DSM 44229(T) and comparison to the other completely sequenced Pseudonocardiaceae.
Strobel T, Al-Dilaimi A, Blom J, Gessner A, Kalinowski J, Luzhetska M, Pühler A, Szczepanowski R, Bechthold A, Rückert C., BMC Genomics 13, 2012
PMID: 22958348
Genome sequence of the bacterium Streptomyces davawensis JCM 4913 and heterologous production of the unique antibiotic roseoflavin.
Jankowitsch F, Schwarz J, Rückert C, Gust B, Szczepanowski R, Blom J, Pelzer S, Kalinowski J, Mack M., J Bacteriol 194(24), 2012
PMID: 23043000
Gene discovery by genome-wide CDS re-prediction and microarray-based transcriptional analysis in phytopathogen Xanthomonas campestris.
Zhou L, Vorhölter FJ, He YQ, Jiang BL, Tang JL, Xu Y, Pühler A, He YW., BMC Genomics 12, 2011
PMID: 21745409
SpliceIT: a hybrid method for splice signal identification based on probabilistic and biological inference.
Malousi A, Chouvarda I, Koutkias V, Kouidou S, Maglaveras N., J Biomed Inform 43(2), 2010
PMID: 19800027
Analysis of high-throughput sequencing and annotation strategies for phage genomes.
Henn MR, Sullivan MB, Stange-Thomann N, Osburne MS, Berlin AM, Kelly L, Yandava C, Kodira C, Zeng Q, Weiand M, Sparrow T, Saif S, Giannoukos G, Young SK, Nusbaum C, Birren BW, Chisholm SW., PLoS One 5(2), 2010
PMID: 20140207
A genomic survey of positive selection in Burkholderia pseudomallei provides insights into the evolution of accidental virulence.
Nandi T, Ong C, Singh AP, Boddey J, Atkins T, Sarkar-Tyson M, Essex-Lopresti AE, Chua HH, Pearson T, Kreisberg JF, Nilsson C, Ariyaratne P, Ronning C, Losada L, Ruan Y, Sung WK, Woods D, Titball RW, Beacham I, Peak I, Keim P, Nierman WC, Tan P., PLoS Pathog 6(4), 2010
PMID: 20368977
Genome sequence of Desulfobacterium autotrophicum HRM2, a marine sulfate reducer oxidizing organic carbon completely to carbon dioxide.
Strittmatter AW, Liesegang H, Rabus R, Decker I, Amann J, Andres S, Henne A, Fricke WF, Martinez-Arias R, Bartels D, Goesmann A, Krause L, Pühler A, Klenk HP, Richter M, Schüler M, Glöckner FO, Meyerdierks A, Gottschalk G, Amann R., Environ Microbiol 11(5), 2009
PMID: 19187283
Integrated application of uniform design and least-squares support vector machines to transfection optimization.
Pan JS, Hong MZ, Zhou QF, Cai JY, Wang HZ, Luo LK, Yang DQ, Dong J, Shi HX, Ren JL., BMC Biotechnol 9, 2009
PMID: 19480716
The genome of Xanthomonas campestris pv. campestris B100 and its use for the reconstruction of metabolic pathways involved in xanthan biosynthesis.
Vorhölter FJ, Schneiker S, Goesmann A, Krause L, Bekel T, Kaiser O, Linke B, Patschkowski T, Rückert C, Schmid J, Sidhu VK, Sieber V, Tauch A, Watt SA, Weisshaar B, Becker A, Niehaus K, Pühler A., J Biotechnol 134(1-2), 2008
PMID: 18304669
The lifestyle of Corynebacterium urealyticum derived from its complete genome sequence established by pyrosequencing.
Tauch A, Trost E, Tilker A, Ludewig U, Schneiker S, Goesmann A, Arnold W, Bekel T, Brinkrolf K, Brune I, Götker S, Kalinowski J, Kamp PB, Lobo FP, Viehoever P, Weisshaar B, Soriano F, Dröge M, Pühler A., J Biotechnol 136(1-2), 2008
PMID: 18367281
Efficient computation of absent words in genomic sequences.
Herold J, Kurtz S, Giegerich R., BMC Bioinformatics 9, 2008
PMID: 18366790
The metagenome of a biogas-producing microbial community of a production-scale biogas plant fermenter analysed by the 454-pyrosequencing technology.
Schlüter A, Bekel T, Diaz NN, Dondrup M, Eichenlaub R, Gartemann KH, Krahn I, Krause L, Krömeke H, Kruse O, Mussgnug JH, Neuweger H, Niehaus K, Pühler A, Runte KJ, Szczepanowski R, Tauch A, Tilker A, Viehöver P, Goesmann A., J Biotechnol 136(1-2), 2008
PMID: 18597880
CoryneCenter - an online resource for the integrated analysis of corynebacterial genome and transcriptome data.
Neuweger H, Baumbach J, Albaum S, Bekel T, Dondrup M, Hüser AT, Kalinowski J, Oehm S, Pühler A, Rahmann S, Weile J, Goesmann A., BMC Syst Biol 1, 2007
PMID: 18034885

46 References

Microbial gene identification using interpolated Markov models.
Salzberg SL, Delcher AL, Kasif S, White O., Nucleic Acids Res. 26(2), 1998
PMID: 9421513
CRITICA: coding region identification tool invoking comparative analysis.
Badger JH, Olsen GJ, Woese CR., Mol. Biol. Evol. 16(4), 1999
PMID: 10331277
Improved microbial gene identification with GLIMMER.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL., Nucleic Acids Res. 27(23), 1999
PMID: 10556321
Combining diverse evidence for gene recognition in completely sequenced bacterial genomes.
Frishman D, Mironov A, Mewes HW, Gelfand M., Nucleic Acids Res. 26(12), 1998
PMID: 9611239
EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance.
Larsen TS, Krogh A., BMC Bioinformatics 4, 2003
PMID: 12783628
GeneMark.hmm: new solutions for gene finding.
Lukashin AV, Borodovsky M., Nucleic Acids Res. 26(4), 1998
PMID: 9461475
Dictionary-driven prokaryotic gene finding.
Shibuya T, Rigoutsos I., Nucleic Acids Res. 30(12), 2002
PMID: 12060689
Gene identification in novel eukaryotic genomes by self-training algorithm.
Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M., Nucleic Acids Res. 33(20), 2005
PMID: 16314312
Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models.
Mahony S, McInerney JO, Smith TJ, Golden A., BMC Bioinformatics 5, 2004
PMID: 15070404
Development of joint application strategies for two microbial gene finders.
McHardy AC, Goesmann A, Puhler A, Meyer F., Bioinformatics 20(10), 2004
PMID: 14988122
YACOP: Enhanced gene prediction obtained by a combination of existing methods.
Tech M, Merkl R., In Silico Biol. (Gedrukt) 3(4), 2003
PMID: 14965344
Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.
Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C., J. Mol. Biol. 284(4), 1998
PMID: 9837738

Boser B., Guyon I., Vapnik V.N., 1992

Vapnik V.N., 1995
Efficient remote homology detection using local structure.
Hou Y, Hsu W, Lee ML, Bystroff C., Bioinformatics 19(17), 2003
PMID: 14630658
A discriminative framework for detecting remote protein homologies.
Jaakkola T, Diekhans M, Haussler D., J. Comput. Biol. 7(1-2), 2000
PMID: 10890390
The spectrum kernel: a string kernel for SVM protein classification.
Leslie C, Eskin E, Noble WS., Pac Symp Biocomput, 2002
PMID: 11928508
Knowledge-based analysis of microarray gene expression data by using support vector machines.
Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M Jr, Haussler D., Proc. Natl. Acad. Sci. U.S.A. 97(1), 2000
PMID: 10618406
Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases.
Lafay B, Lloyd AT, McLean MJ, Devine KM, Sharp PM, Wolfe KH., Nucleic Acids Res. 27(7), 1999
PMID: 10075995
Evidence for horizontal gene transfer in Escherichia coli speciation.
Medigue C, Rouxel T, Vigier P, Henaut A, Danchin A., J. Mol. Biol. 222(4), 1991
PMID: 1762151
Codon usage and lateral gene transfer in Bacillus subtilis.
Moszer I, Rocha EP, Danchin A., Curr. Opin. Microbiol. 2(5), 1999
PMID: 10508724
REGANOR: a gene prediction server for prokaryotic genomes and a database of high quality gene predictions for prokaryotes.
Linke B, McHardy AC, Neuweger H, Krause L, Meyer F., Appl. Bioinformatics 5(3), 2006
PMID: 16922601
EMBL Nucleotide Sequence Database: developments in 2005.
Cochrane G, Aldebert P, Althorpe N, Andersson M, Baker W, Baldwin A, Bates K, Bhattacharyya S, Browne P, van den Broek A, Castro M, Duggan K, Eberhardt R, Faruque N, Gamble J, Kanz C, Kulikova T, Lee C, Leinonen R, Lin Q, Lombard V, Lopez R, McHale M, McWilliam H, Mukherjee G, Nardone F, Pastor MP, Sobhany S, Stoehr P, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R., Nucleic Acids Res. 34(Database issue), 2006
PMID: 16381823
HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes.
Garcia-Vallve S, Guzman E, Montero MA, Romeu A., Nucleic Acids Res. 31(1), 2003
PMID: 12519978
The Pfam protein families database.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR., Nucleic Acids Res. 32(Database issue), 2004
PMID: 14681378

Schoelkopf B., Smola A.J., 2002

Hastie T., Tibshirani R., Friedman J.H., 2003
GS-Finder: a program to find bacterial gene start sites with a self-training method.
Ou HY, Guo FB, Zhang CT., Int. J. Biochem. Cell Biol. 36(3), 2004
PMID: 14687930
Measuring the accuracy of diagnostic systems.
Swets JA., Science 240(4857), 1988
PMID: 3287615
Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12.
Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T, Tanaka M, Tobe T, Iida T, Takami H, Honda T, Sasakawa C, Ogasawara N, Yasunaga T, Kuhara S, Shiba T, Hattori M, Shinagawa H., DNA Res. 8(1), 2001
PMID: 11258796
Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39.
Read TD, Brunham RC, Shen C, Gill SR, Heidelberg JF, White O, Hickey EK, Peterson J, Utterback T, Berry K, Bass S, Linher K, Weidman J, Khouri H, Craven B, Bowman C, Dodson R, Gwinn M, Nelson W, DeBoy R, Kolonay J, McClarty G, Salzberg SL, Eisen J, Fraser CM., Nucleic Acids Res. 28(6), 2000
PMID: 10684935
Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp.
Shigenobu S., Watanabe H., Hattori M., Sakaki Y., Ishikawa H., 2000
Missing genes in metabolic pathways: a comparative genomics approach.
Osterman A, Overbeek R., Curr Opin Chem Biol 7(2), 2003
PMID: 12714058
The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.
Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V., Nucleic Acids Res. 33(17), 2005
PMID: 16214803
Profile hidden Markov models.
Eddy SR., Bioinformatics 14(9), 1998
PMID: 9918945
The Bioperl toolkit: Perl modules for the life sciences.
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E., Genome Res. 12(10), 2002
PMID: 12368254
On the total number of genes and their length distribution in complete microbial genomes.
Skovgaard M, Jensen LJ, Brunak S, Ussery D, Krogh A., Trends Genet. 17(8), 2001
PMID: 11485798
Large-scale prokaryotic gene prediction and comparison to genome annotation.
Nielsen P, Krogh A., Bioinformatics 21(24), 2005
PMID: 16249266
Lateral gene transfer and the nature of bacterial innovation.
Ochman H, Lawrence JG, Groisman EA., Nature 405(6784), 2000
PMID: 10830951
Thiamin biosynthesis in prokaryotes.
Begley TP, Downs DM, Ealick SE, McLafferty FW, Van Loon AP, Taylor S, Campobasso N, Chiu HJ, Kinsland C, Reddick JJ, Xi J., Arch. Microbiol. 171(5), 1999
PMID: 10382260
Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms.
Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS., J. Biol. Chem. 277(50), 2002
PMID: 12376536

PMID (this article): 17175534