A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set

Pucker B, Holtgräwe D, Stadermann KB, Frey K, Huettel B, Reinhardt R, Weisshaar B (2019)
PLOS ONE 14(5): e0216233.

Journal article | Published | English
 
Download
OA 1.21 MB
Abstract / Note
In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short-read assemblies of the plant model organism Arabidopsis thaliana have been published in recent years. A SMRT-based assembly of Landsberg erecta has also been generated, which identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short-read-based NGS assembly (AthNd-1_v1), a 75-fold increase in contiguity was observed for AthNd-1_v2c. To assign contig locations independently of the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Detailed analyses of AthNd-1_v2c allowed reliable identification of large genomic rearrangements between A. thaliana accessions, which contribute to differences in the gene sets that distinguish the genotypes. One of the detected differences revealed a gene that is missing from the Col-0 gold standard sequence. This de novo assembly extends the known proportion of the A. thaliana pan-genome.
Keywords
sequence assembly tools; Arabidopsis thaliana; gene prediction; plant genomics; centromeres; sequence analysis; chromosomal inversion; comparative genomics
Year of publication
2019
Journal title
PLOS ONE
Volume
14
Issue
5
Article no.
e0216233
ISSN
1932-6203
eISSN
1932-6203
Page URI
https://pub.uni-bielefeld.de/record/2935698

Cite

Pucker B, Holtgräwe D, Stadermann KB, et al. A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLOS ONE. 2019;14(5): e0216233.
Pucker, B., Holtgräwe, D., Stadermann, K. B., Frey, K., Huettel, B., Reinhardt, R., & Weisshaar, B. (2019). A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLOS ONE, 14(5), e0216233. doi:10.1371/journal.pone.0216233
Pucker, B., Holtgräwe, D., Stadermann, K. B., Frey, K., Huettel, B., Reinhardt, R., and Weisshaar, B. (2019). A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLOS ONE 14:e0216233.
Pucker, B., et al., 2019. A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLOS ONE, 14(5): e0216233.
B. Pucker, et al., “A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set”, PLOS ONE, vol. 14, 2019, e0216233.
Pucker, B., Holtgräwe, D., Stadermann, K.B., Frey, K., Huettel, B., Reinhardt, R., Weisshaar, B.: A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLOS ONE. 14, e0216233 (2019).
Pucker, Boas, Holtgräwe, Daniela, Stadermann, Kai Bernd, Frey, Katharina, Huettel, Bruno, Reinhardt, Richard, and Weisshaar, Bernd. “A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set”. PLOS ONE 14.5 (2019): e0216233.
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-05-22T03:44:47Z
MD5 checksum
9c65b6bb10a9981cf6e11e2223f96381
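A downloaded copy of the full text can be checked against the MD5 checksum listed above. The following is a minimal sketch, not part of the record itself: the helper names `md5_of_file` and `matches_record` and the file name in the usage comment are hypothetical, and only the Python standard library is assumed.

```python
import hashlib

# Checksum as listed in this record for the uploaded full-text file.
EXPECTED_MD5 = "9c65b6bb10a9981cf6e11e2223f96381"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (the file name is a placeholder for wherever the download was saved):
#     matches_record("downloaded_fulltext.pdf")
```

MD5 is adequate here for detecting a corrupted or truncated download; it should not be relied on as protection against deliberate tampering.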


Sources

PMID: 31112551
PubMed | Europe PMC

bioRxiv: 10.1101/407627
