Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence

Pucker B, Holtgräwe D, Weisshaar B (2017)
BMC Research Notes 10(1): 667.

Journal article | Published | English
 
Abstract / Notes
Objective: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from the recently released Columbia-0 reference genome sequence annotation Araport11.

Results: Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mapping and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11’s information, which was generated by using high-coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via comparison to the initial gene prediction (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against the GeneSet_Nd-1_v1.1 indicate a high gene prediction quality.
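To make the notion of a non-canonical splice site concrete, the following is a minimal sketch of how the terminal dinucleotides of an annotated intron could be extracted and classified. It is not the authors' pipeline. The function names, the dictionary-based genome representation, and the three-way labelling (GT-AG as canonical, GC-AG and AT-AC as the common non-canonical combinations, everything else as minor) are assumptions made for illustration only.

```python
from typing import Dict, Tuple

# Translation table for complementing DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def intron_dinucleotides(
    genome: Dict[str, str], chrom: str, start: int, end: int, strand: str
) -> Tuple[str, str]:
    """Return (donor, acceptor) dinucleotides of one intron.

    Assumes 1-based, inclusive intron coordinates on the forward strand,
    as used in GFF3 files, and a genome held as {sequence name: sequence}.
    """
    intron = genome[chrom][start - 1:end].upper()
    if strand == "-":
        # Introns of reverse-strand genes are read on the opposite strand.
        intron = reverse_complement(intron)
    return intron[:2], intron[-2:]


def classify_splice_site(donor: str, acceptor: str) -> str:
    """Label a donor/acceptor combination (labels are an assumption)."""
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    if (donor, acceptor) in {("GC", "AG"), ("AT", "AC")}:
        return "major non-canonical"
    return "minor non-canonical"


if __name__ == "__main__":
    # Toy sequence: the intron GTCCCCAG occupies positions 4..11.
    toy_genome = {"chr1": "AAAGTCCCCAGTTT"}
    donor, acceptor = intron_dinucleotides(toy_genome, "chr1", 4, 11, "+")
    print(donor, acceptor, classify_splice_site(donor, acceptor))
```

In practice the intron coordinates would be derived from the exon features of a structural annotation, and a gene would count as affected if at least one of its introns falls outside the canonical class.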
Keywords
Genome annotation; Splicing; Araport11; Gene prediction hints; Reciprocal best hit
Publication year
2017
Journal title
BMC Research Notes
Volume
10
Issue
1
Article number
667
ISSN
1756-0500
eISSN
1756-0500
Funding information
Open access publication costs were funded by the Deutsche Forschungsgemeinschaft and Bielefeld University.
Page URI
https://pub.uni-bielefeld.de/record/2915524

Cite

Pucker, B., Holtgräwe, D., & Weisshaar, B. (2017). Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence. BMC Research Notes, 10(1), 667. doi:10.1186/s13104-017-2985-y
All files are available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-25T06:48:36Z
MD5 checksum
0c2f23b754cc803704fc7b0b81f24fd0


2 citations in Europe PMC

Data provided by Europe PubMed Central.

A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set.
Pucker B, Holtgräwe D, Stadermann KB, Frey K, Huettel B, Reinhardt R, Weisshaar B., PLoS One 14(5), 2019
PMID: 31112551


Sources

PMID: 29202864