Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes

Pucker B, Brockington SF (2018)
BMC Genomics 19(1): 980.

Journal article | Published | English
Pucker, Boas (UniBi); Brockington, Samuel F.
Abstract / Note
Background: Most eukaryotic genes comprise exons and introns, thus requiring the precise removal of introns from pre-mRNAs to enable protein biosynthesis. U2 and U12 spliceosomes catalyze this step by recognizing motifs on the transcript in order to remove the introns. This process depends on the precise definition of exon-intron borders by splice sites, which are consequently highly conserved across species. Only very few combinations of terminal dinucleotides are frequently observed at intron ends, dominated by the canonical GT-AG splice sites on the DNA level. Results: Here we investigate the occurrence of diverse combinations of dinucleotides at predicted splice sites. Analyzing 121 plant genome sequences based on their annotation revealed strong splice site conservation across species, annotation errors, and true biological divergence from canonical splice sites. The frequency of non-canonical splice sites clearly correlates with their divergence from canonical ones, indicating either an accumulation of probably neutral mutations or evolution towards canonical splice sites. Strong conservation across multiple species and non-random accumulation of substitutions in splice sites indicate a functional relevance of non-canonical splice sites. The average composition of splice sites across all investigated species is 98.7% for GT-AG, 1.2% for GC-AG, 0.06% for AT-AC, and 0.09% for minor non-canonical splice sites. RNA-Seq data sets of 35 species were incorporated to validate non-canonical splice site predictions through gaps in sequencing read alignments and to demonstrate the expression of affected genes. Conclusion: We conclude that bona fide non-canonical splice sites are present and appear to be functionally relevant in most plant genomes, although at low abundance.
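The classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy intron sequences are hypothetical, and it assumes that each intron is already available as a DNA string in the orientation of its gene. It sorts introns by their terminal dinucleotides into the four classes named in the abstract, reports the percentage of each class, and counts how many of the four terminal positions differ from the canonical GT-AG.

```python
from collections import Counter

# Classes named in the abstract; every other combination is "minor non-canonical".
MAJOR_CLASSES = {
    ("GT", "AG"): "GT-AG",
    ("GC", "AG"): "GC-AG",
    ("AT", "AC"): "AT-AC",
}


def classify_intron(intron_seq):
    """Return the splice site class of one intron (hypothetical helper)."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry two terminal dinucleotides")
    donor, acceptor = seq[:2], seq[-2:]  # first and last two bases of the intron
    return MAJOR_CLASSES.get((donor, acceptor), "minor non-canonical")


def splice_site_composition(introns):
    """Return the percentage of each splice site class over all introns."""
    counts = Counter(classify_intron(seq) for seq in introns)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def divergence_from_canonical(intron_seq):
    """Count mismatches (0 to 4) between the terminal dinucleotides and GT-AG."""
    seq = intron_seq.upper()
    observed = seq[:2] + seq[-2:]
    return sum(a != b for a, b in zip(observed, "GTAG"))


# Toy sequences invented for illustration only, not data from the study.
toy_introns = ["GTAAGTTTTCAG", "GTACGAAAATAG", "GCAAGTTTGCAG", "ATATCCTTTTAC"]
print(splice_site_composition(toy_introns))
print([divergence_from_canonical(seq) for seq in toy_introns])
```

In a real analysis the intron sequences would be extracted from a genome sequence using the exon coordinates of its annotation, and candidate non-canonical sites would still need support from spliced RNA-Seq read alignments, as the abstract describes.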
BMC Genomics
Open access publication costs were funded by the Deutsche Forschungsgemeinschaft and Bielefeld University.


Pucker B, Brockington SF. Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. BMC Genomics. 2018;19(1): 980.
Pucker, B., & Brockington, S. F. (2018). Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. BMC Genomics, 19(1), 980. doi:10.1186/s12864-018-5360-z
Pucker, Boas, and Brockington, Samuel F. 2018. “Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes”. BMC Genomics 19 (1): 980.
Pucker, B., and Brockington, S. F. (2018). Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. BMC Genomics 19:980.
Pucker, B., & Brockington, S.F., 2018. Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. BMC Genomics, 19(1): 980.
B. Pucker and S.F. Brockington, “Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes”, BMC Genomics, vol. 19, 2018, 980.
Pucker, B., Brockington, S.F.: Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. BMC Genomics. 19, 980 (2018).
Pucker, Boas, and Brockington, Samuel F. “Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes”. BMC Genomics 19.1 (2018): 980.
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Access Level
OA Open Access

1 Citation in Europe PMC

Data provided by Europe PubMed Central.

A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set.
Pucker B, Holtgräwe D, Stadermann KB, Frey K, Huettel B, Reinhardt R, Weisshaar B., PLoS One 14(5), 2019
PMID: 31112551





PMID: 30594132

Preprint: 10.1101/428318
