GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers

Jünemann, Sebastian; Prior, Karola; Albersmeier, Andreas; Albaum, Stefan; Kalinowski, Jörn; Goesmann, Alexander; Stoye, Jens; Harmsen, Dag

Jünemann S, Prior K, Albersmeier A, Albaum S, Kalinowski J, Goesmann A, Stoye J, Harmsen D (2014)
PLOS ONE 9(9): e107014.

Journal article | Published | English

Download

No files have been uploaded. Publication record only.

DOI

https://doi.org/10.1371/journal.pone.0107014

Authors

Jünemann, Sebastian (UniBi); Prior, Karola; Albersmeier, Andreas (UniBi); Albaum, Stefan (UniBi); Kalinowski, Jörn (UniBi); Goesmann, Alexander; Stoye, Jens (UniBi); Harmsen, Dag

Department

Centrum für Biotechnologie > Technologieplattformen > Bioinformatics Resource Facility
Technische Fakultät > AG Genominformatik
Centrum für Biotechnologie > Arbeitsgruppe J. Kalinowski
Centrum für Biotechnologie > Arbeitsgruppe J. Stoye
Centrum für Biotechnologie > Institut für Bioinformatik

Abstract / Note

De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast-evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions. With a strong pragmatic focus, we add to the line of assembly evaluation surveys by benchmarking popular de novo genome assemblers on bacterial data generated by benchtop sequencers. To this end, single-library assemblies were generated and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage, where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also in their requirements on computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler suits best.
The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.
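
The abstract refers to assembly contiguity metrics and to depth of coverage without defining them in this record. As a minimal illustration, and not the authors' evaluation code, the sketch below computes N50 (a widely used contiguity metric, assumed here as an example; the record does not state which metrics the paper uses) and the expected depth of coverage. The function names and the numbers in the usage lines are hypothetical.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def depth_of_coverage(read_count: int, mean_read_length: float, genome_size: int) -> float:
    """Expected depth of coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


# Hypothetical numbers, chosen only to illustrate the arithmetic.
example_n50 = n50([100, 200, 300, 400, 500])
example_depth = depth_of_coverage(1_000_000, 250, 5_000_000)
```

With these made-up inputs, half of the 1,500 assembled bases is reached after the two longest contigs, and 250 million sequenced bases over a 5 Mb genome correspond to a 50-fold depth of coverage.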

Publication year

2014

Journal title

PLOS ONE

Volume

9

Issue

9

Article no.

e107014

ISSN

1932-6203

eISSN

1932-6203

Page URI

https://pub.uni-bielefeld.de/record/2689773

Cite

Jünemann S, Prior K, Albersmeier A, et al. GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE. 2014;9(9): e107014.

Jünemann, S., Prior, K., Albersmeier, A., Albaum, S., Kalinowski, J., Goesmann, A., Stoye, J., et al. (2014). GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE, 9(9), e107014. doi:10.1371/journal.pone.0107014

Jünemann, Sebastian, Prior, Karola, Albersmeier, Andreas, Albaum, Stefan, Kalinowski, Jörn, Goesmann, Alexander, Stoye, Jens, and Harmsen, Dag. 2014. “GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers”. PLOS ONE 9 (9): e107014.

Jünemann, S., Prior, K., Albersmeier, A., Albaum, S., Kalinowski, J., Goesmann, A., Stoye, J., and Harmsen, D. (2014). GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE 9:e107014.

Jünemann, S., et al., 2014. GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE, 9(9): e107014.

S. Jünemann, et al., “GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers”, PLOS ONE, vol. 9, 2014, e107014.

Jünemann, S., Prior, K., Albersmeier, A., Albaum, S., Kalinowski, J., Goesmann, A., Stoye, J., Harmsen, D.: GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE. 9, e107014 (2014).

Jünemann, Sebastian, Prior, Karola, Albersmeier, Andreas, Albaum, Stefan, Kalinowski, Jörn, Goesmann, Alexander, Stoye, Jens, and Harmsen, Dag. “GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers”. PLOS ONE 9.9 (2014): e107014.

Data provided by the European Bioinformatics Institute (EBI)

11 citations in Europe PMC

Data provided by Europe PubMed Central.

ConFindr: rapid detection of intraspecies and cross-species contamination in bacterial whole-genome sequence data.
Low AJ, Koziol AG, Manninger PA, Blais B, Carrillo CD., PeerJ 7, 2019
PMID: 31183253

High Interlaboratory Reproducibility and Accuracy of Next-Generation-Sequencing-Based Bacterial Genotyping in a Ring Trial.
Mellmann A, Andersen PS, Bletz S, Friedrich AW, Kohl TA, Lilje B, Niemann S, Prior K, Rossen JW, Harmsen D., J Clin Microbiol 55(3), 2017
PMID: 28053217

A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses.
Dimitrov KM, Sharma P, Volkening JD, Goraichuk IV, Wajid A, Rehmani SF, Basharat A, Shittu I, Joannis TM, Miller PJ, Afonso CL., Virol J 14(1), 2017
PMID: 28388925

MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.
Lugli GA, Milani C, Mancabelli L, van Sinderen D, Ventura M., FEMS Microbiol Lett 363(7), 2016
PMID: 26936607

A Primer on Infectious Disease Bacterial Genomics.
Lynch T, Petkau A, Knox N, Graham M, Van Domselaar G., Clin Microbiol Rev 29(4), 2016
PMID: 28590251

Ensuring backwards compatibility: traditional genotyping efforts in the era of whole genome sequencing.
Bletz S, Mellmann A, Rothgänger J, Harmsen D., Clin Microbiol Infect 21(4), 2015
PMID: 25658529

An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology.
Li J, Batcha AM, Grüning B, Mansmann UR., Cancer Inform 14(suppl 5), 2015
PMID: 27081306

Completing bacterial genome assemblies: strategy and performance comparisons.
Liao YC, Lin SH, Lin HH., Sci Rep 5, 2015
PMID: 25735824

Correction: GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers.
PLOS ONE Staff., PLoS One 10(3), 2015
PMID: 25789774

Investigating the mobilome in clinically important lineages of Enterococcus faecium and Enterococcus faecalis.
Mikalsen T, Pedersen T, Willems R, Coque TM, Werner G, Sadowy E, van Schaik W, Jensen LB, Sundsfjord A, Hegstad K., BMC Genomics 16, 2015
PMID: 25885771

Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches.
Lin HH, Liao YC., PLoS One 10(12), 2015
PMID: 26641475


Sources

PMID: 25198770