Exploiting single-molecule transcript sequencing for eukaryotic gene prediction

Minoche AE, Dohm JC, Schneider J, Holtgräwe D, Viehöver P, Montfort M, Rosleff Sörensen T, Weisshaar B, Himmelbauer H (2015)
Genome Biology 16(1): 184.

Journal article | Published | English
OA 2.11 MB
Minoche, Andre E; Dohm, Juliane C; Schneider, Jessica (UniBi); Holtgräwe, Daniela (UniBi); Viehöver, Prisca (UniBi); Montfort, Magda; Rosleff Sörensen, Thomas (UniBi); Weisshaar, Bernd (UniBi); Himmelbauer, Heinz
Abstract / Note
We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as an automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.
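The sensitivity and precision mentioned in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in gene prediction benchmarks, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), and counts a predicted feature as correct only if its coordinates match a reference feature exactly. The function name and the example coordinates are hypothetical and not taken from the paper.

```python
def sensitivity_precision(predicted, reference):
    """Return (sensitivity, precision) for two collections of hashable features.

    A predicted feature is a true positive only if an identical
    feature exists in the reference set (exact-match criterion).
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    # Sensitivity: fraction of reference features that were recovered.
    sens = true_pos / len(reference) if reference else 0.0
    # Precision: fraction of predicted features that are correct.
    prec = true_pos / len(predicted) if predicted else 0.0
    return sens, prec


# Hypothetical exons as (sequence, start, end, strand) tuples.
reference = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 500, 650, "+"),
    ("chr2", 50, 120, "-"),
}
predicted = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr2", 60, 120, "-"),  # start differs, so this is not an exact match
}

sens, prec = sensitivity_precision(predicted, reference)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

The same function works at the exon, transcript, or gene level, depending on what the tuples describe.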


Minoche, A. E., Dohm, J. C., Schneider, J., Holtgräwe, D., Viehöver, P., Montfort, M., Rosleff Sörensen, T., et al. (2015). Exploiting single-molecule transcript sequencing for eukaryotic gene prediction. Genome Biology, 16(1), 184. doi:10.1186/s13059-015-0729-7
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Access Level
OA Open Access

33 citations in Europe PMC

Data provided by Europe PubMed Central.

SpinachBase: a central portal for spinach genomics.
Collins K, Zhao K, Jiao C, Xu C, Cai X, Wang X, Ge C, Dai S, Wang Q, Wang Q, Fei Z, Zheng Y., Database (Oxford) 2019, 2019
PMID: 31211398
Gene Expression Program Underlying Tail Resorption During Thyroid Hormone-Dependent Metamorphosis of the Ornamented Pygmy Frog Microhyla fissipes.
Wang S, Liu L, Liu J, Zhu W, Tanizaki Y, Fu L, Bao L, Shi YB, Jiang J., Front Endocrinol (Lausanne) 10, 2019
PMID: 30740088
Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing.
Zhao L, Zhang H, Kohnen MV, Prasad KVSK, Gu L, Reddy ASN., Front Genet 10, 2019
PMID: 30949200
Transcriptome analysis based on a combination of sequencing platforms provides insights into leaf pigmentation in Acer rubrum.
Chen Z, Lu X, Xuan Y, Tang F, Wang J, Shi D, Fu S, Ren J., BMC Plant Biol 19(1), 2019
PMID: 31170934
Genome and transcriptome characterization of the glycoengineered Nicotiana benthamiana line ΔXT/FT.
Schiavinato M, Strasser R, Mach L, Dohm JC, Himmelbauer H., BMC Genomics 20(1), 2019
PMID: 31324144
A global survey of alternative splicing in allopolyploid cotton: landscape, complexity and regulation.
Wang M, Wang P, Liang F, Ye Z, Li J, Shen C, Pei L, Wang F, Hu J, Tu L, Lindsey K, He D, Zhang X., New Phytol 217(1), 2018
PMID: 28892169
Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes.
An D, Cao HX, Li C, Humbeck K, Wang W., Genes (Basel) 9(1), 2018
PMID: 29346292
Molecular Mechanisms of Acclimatization to Phosphorus Starvation and Recovery Underlying Full-Length Transcriptome Profiling in Barley (Hordeum vulgare L.).
Ren P, Meng Y, Li B, Ma X, Si E, Lai Y, Wang J, Yao L, Yang K, Shang X, Wang H., Front Plant Sci 9, 2018
PMID: 29720989
High Quality de Novo Transcriptome Assembly of Croton tiglium.
Haak M, Vinke S, Keller W, Droste J, Rückert C, Kalinowski J, Pucker B., Front Mol Biosci 5, 2018
PMID: 30027092
A survey of transcriptome complexity in Sus scrofa using single-molecule long-read sequencing.
Li Y, Fang C, Fu Y, Hu A, Li C, Zou C, Li X, Zhao S, Zhang C, Li C., DNA Res 25(4), 2018
PMID: 29850846
Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon.
Sahlin K, Tomaszkiewicz M, Makova KD, Medvedev P., Nat Commun 9(1), 2018
PMID: 30389934
Tissue-Based Mapping of the Fathead Minnow (Pimephales promelas) Transcriptome and Proteome.
Lavelle C, Smith LC, Bisesi JH, Yu F, Silva-Sanchez C, Moraga-Amador D, Buerger AN, Garcia-Reyero N, Sabo-Attwood T, Denslow ND., Front Endocrinol (Lausanne) 9, 2018
PMID: 30459712
The dynamic landscape of fission yeast meiosis alternative-splice isoforms.
Kuang Z, Boeke JD, Canzar S., Genome Res 27(1), 2017
PMID: 27856494
Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis.
Weirather JL, de Cesare M, Wang Y, Piazza P, Sebastiano V, Wang XJ, Buck D, Au KF., F1000Res 6, 2017
PMID: 28868132
Genome-wide analysis of complex wheat gliadins, the dominant carriers of celiac disease epitopes.
Wang DW, Li D, Wang J, Zhao Y, Wang Z, Yue G, Liu X, Qin H, Zhang K, Dong L, Wang D., Sci Rep 7, 2017
PMID: 28300172
Newly developed SSR markers reveal genetic diversity and geographical clustering in spinach (Spinacia oleracea).
Göl Ş, Göktay M, Allmer J, Doğanlar S, Frary A., Mol Genet Genomics 292(4), 2017
PMID: 28386640
Draft genome of spinach and transcriptome diversity of 120 Spinacia accessions.
Xu C, Jiao C, Sun H, Cai X, Wang X, Ge C, Zheng Y, Liu W, Sun X, Xu Y, Deng J, Zhang Z, Huang S, Dai S, Mou B, Wang Q, Fei Z, Wang Q., Nat Commun 8, 2017
PMID: 28537264
Crop wild relative populations of Beta vulgaris allow direct mapping of agronomically important genes.
Capistrano-Gossmann GG, Ries D, Holtgräwe D, Minoche A, Kraft T, Frerichmann SLM, Rosleff Soerensen T, Dohm JC, González I, Schilhabel M, Varrelmann M, Tschoep H, Uphoff H, Schütze K, Borchardt D, Toerjek O, Mechelke W, Lein JC, Schechert AW, Frese L, Himmelbauer H, Weisshaar B, Kopisch-Obuch FJ., Nat Commun 8, 2017
PMID: 28585529
A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.
Chen SY, Deng F, Jia X, Li C, Lai SJ., Sci Rep 7(1), 2017
PMID: 28794490
Genome-wide identification and characterization of aquaporin gene family in Beta vulgaris.
Kong W, Yang S, Wang Y, Bendahmane M, Fu X., PeerJ 5, 2017
PMID: 28948097
Genetic diversity and population structure analysis of spinach by single-nucleotide polymorphisms identified through genotyping-by-sequencing.
Shi A, Qin J, Mou B, Correll J, Weng Y, Brenner D, Feng C, Motes D, Yang W, Dong L, Bhattarai G, Ravelombola W., PLoS One 12(11), 2017
PMID: 29190770
Single-Molecule Long-Read Transcriptome Dataset of Halophyte Halogeton glomeratus.
Wang J, Yao L, Li B, Meng Y, Ma X, Wang H., Front Genet 8, 2017
PMID: 29250103
Genetic diversity and association mapping of mineral element concentrations in spinach leaves.
Qin J, Shi A, Mou B, Grusak MA, Weng Y, Ravelombola W, Bhattarai G, Dong L, Yang W., BMC Genomics 18(1), 2017
PMID: 29202697
cDNA Library Enrichment of Full Length Transcripts for SMRT Long Read Sequencing.
Cartolano M, Huettel B, Hartwig B, Reinhardt R, Schneeberger K., PLoS One 11(6), 2016
PMID: 27327613
A survey of the sorghum transcriptome using single-molecule long reads.
Abdel-Ghany SE, Hamilton M, Jacobi JL, Ngam P, Devitt N, Schilkey F, Ben-Hur A, Reddy AS., Nat Commun 7, 2016
PMID: 27339290
De novo and comparative transcriptome analysis of cultivated and wild spinach.
Xu C, Jiao C, Zheng Y, Sun H, Liu W, Cai X, Wang X, Liu S, Xu Y, Mou B, Dai S, Fei Z, Wang Q., Sci Rep 5, 2015
PMID: 26635144
Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research.
Dong L, Liu H, Zhang J, Yang S, Kong G, Chu JS, Chen N, Wang D., BMC Genomics 16, 2015
PMID: 26645802


PMID: 26328666