GISMO - gene identification using a support vector machine for ORF classification

Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F (2007)
Nucleic Acids Research 35(2): 540-549.

Journal Article | Original Article | Published | English
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. It performs well for complete prokaryotic chromosomes irrespective of their GC content, for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for published genomes that are supported by extrinsic evidence, which strongly suggests that these are biologically active genes. The source code for GISMO is freely available under the GPL.
Publishing Year
2007

Cite this

Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F. GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research. 2007;35(2):540-549.
Krause, L., McHardy, A. C., Nattkemper, T. W., Pühler, A., Stoye, J., & Meyer, F. (2007). GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research, 35(2), 540-549. doi:10.1093/nar/gkl1083
Krause, L., McHardy, A. C., Nattkemper, T. W., Pühler, A., Stoye, J., and Meyer, F. (2007). GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research 35, 540-549.
Krause, L., et al., 2007. GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research, 35(2), pp. 540-549.
L. Krause, et al., “GISMO - gene identification using a support vector machine for ORF classification”, Nucleic Acids Research, vol. 35, 2007, pp. 540-549.
Krause, L., McHardy, A.C., Nattkemper, T.W., Pühler, A., Stoye, J., Meyer, F.: GISMO - gene identification using a support vector machine for ORF classification. Nucleic Acids Research. 35, 540-549 (2007).
Krause, Lutz, McHardy, Alice C., Nattkemper, Tim Wilhelm, Pühler, Alfred, Stoye, Jens, and Meyer, Folker. “GISMO - gene identification using a support vector machine for ORF classification”. Nucleic Acids Research 35.2 (2007): 540-549.


22 Citations in Europe PMC

Data provided by Europe PubMed Central.




PMID: 17175534