GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers

Jünemann S, Prior K, Albersmeier A, Albaum S, Kalinowski J, Goesmann A, Stoye J, Harmsen D (2014)
PLOS ONE 9(9): e107014.

Journal Article | Published | English
No files have been uploaded. Publication record only.
Abstract / Note
De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and with the different kinds of data produced by sequencers in the fast-evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, such as Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), has made easy, fast, and cheap sequencing of bacterial organisms accessible to a broad range of academic and clinical institutions. With a strong pragmatic focus, we add a new perspective to the series of assembly evaluation surveys by benchmarking popular de novo genome assemblers on bacterial data generated by benchtop sequencers. To this end, single-library data sets were generated, assembled, and compared with each other using metrics describing assembly contiguity and accuracy, as well as practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage (within reasonable ranges) on the genome assemblies and the k-mer optimization problem of de Bruijn graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types but also affect assemblies differently with respect to the depth of coverage, where oversampling can become problematic. Assemblies vary greatly in contiguity and accuracy, and also in their computing-power requirements. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler is best suited. The data sets, scripts and all additional information needed to replicate our results are freely available at
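The abstract refers to the depth of coverage and to contiguity metrics without defining them. The following is a minimal Python sketch of two widely used definitions, assuming average depth is computed as total sequenced bases divided by genome size and using N50 as one example of a contiguity metric (the abstract does not name the specific metrics the authors used). All function names and example numbers are hypothetical illustrations, not taken from the paper's scripts.

```python
import math
from typing import Iterable, List


def depth_of_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size (assumed definition)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def reads_for_target_depth(mean_read_length: float, genome_size: int,
                           target_depth: float) -> int:
    """Approximate number of reads needed to reach a target average depth (rounded up)."""
    return math.ceil(target_depth * genome_size / mean_read_length)


def n50(contig_lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Hypothetical example: a ~5 Mb bacterial genome sequenced with 250 bp reads.
    print(reads_for_target_depth(250, 5_000_000, 50))  # reads needed for ~50x
    print(n50([100, 80, 50, 30, 20]))  # toy contig lengths
```

Subsampling reads to a chosen target depth, as the helper above estimates, is one simple way to explore how assemblies change across a range of coverage depths; a higher N50 indicates a more contiguous assembly but says nothing about its accuracy, which has to be assessed separately (for example against a reference).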


Jünemann S, Prior K, Albersmeier A, et al. GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE. 2014;9(9): e107014.
Jünemann, S., Prior, K., Albersmeier, A., Albaum, S., Kalinowski, J., Goesmann, A., Stoye, J., et al. (2014). GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE, 9(9), e107014. doi:10.1371/journal.pone.0107014
Jünemann, Sebastian, Prior, Karola, Albersmeier, Andreas, Albaum, Stefan, Kalinowski, Jörn, Goesmann, Alexander, Stoye, Jens, and Harmsen, Dag. 2014. “GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers”. PLOS ONE 9 (9): e107014.
Jünemann, S., Prior, K., Albersmeier, A., Albaum, S., Kalinowski, J., Goesmann, A., Stoye, J., and Harmsen, D. (2014). GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE 9:e107014.
Jünemann, S., et al., 2014. GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE, 9(9): e107014.
S. Jünemann, et al., “GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers”, PLOS ONE, vol. 9, 2014, e107014.
Jünemann, S., Prior, K., Albersmeier, A., Albaum, S., Kalinowski, J., Goesmann, A., Stoye, J., Harmsen, D.: GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLOS ONE. 9, e107014 (2014).
Jünemann, Sebastian, Prior, Karola, Albersmeier, Andreas, Albaum, Stefan, Kalinowski, Jörn, Goesmann, Alexander, Stoye, Jens, and Harmsen, Dag. “GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers”. PLOS ONE 9.9 (2014): e107014.

11 Citations in Europe PMC

Data provided by Europe PubMed Central.

High Interlaboratory Reproducibility and Accuracy of Next-Generation-Sequencing-Based Bacterial Genotyping in a Ring Trial.
Mellmann A, Andersen PS, Bletz S, Friedrich AW, Kohl TA, Lilje B, Niemann S, Prior K, Rossen JW, Harmsen D., J Clin Microbiol 55(3), 2017
PMID: 28053217
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses.
Dimitrov KM, Sharma P, Volkening JD, Goraichuk IV, Wajid A, Rehmani SF, Basharat A, Shittu I, Joannis TM, Miller PJ, Afonso CL., Virol J 14(1), 2017
PMID: 28388925
MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.
Lugli GA, Milani C, Mancabelli L, van Sinderen D, Ventura M., FEMS Microbiol Lett 363(7), 2016
PMID: 26936607
A Primer on Infectious Disease Bacterial Genomics.
Lynch T, Petkau A, Knox N, Graham M, Van Domselaar G., Clin Microbiol Rev 29(4), 2016
PMID: 28590251
Ensuring backwards compatibility: traditional genotyping efforts in the era of whole genome sequencing.
Bletz S, Mellmann A, Rothgänger J, Harmsen D., Clin Microbiol Infect 21(4), 2015
PMID: 25658529
An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology.
Li J, Batcha AM, Grüning B, Mansmann UR., Cancer Inform 14(suppl 5), 2015
PMID: 27081306
Investigating the mobilome in clinically important lineages of Enterococcus faecium and Enterococcus faecalis.
Mikalsen T, Pedersen T, Willems R, Coque TM, Werner G, Sadowy E, van Schaik W, Jensen LB, Sundsfjord A, Hegstad K., BMC Genomics 16, 2015
PMID: 25885771

44 References


Performance comparison of benchtop high-throughput sequencing platforms.
Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ., Nat. Biotechnol. 30(5), 2012
PMID: 22522955
Updating benchtop sequencing performance comparison.
Junemann S, Sedlazeck FJ, Prior K, Albersmeier A, John U, Kalinowski J, Mellmann A, Goesmann A, von Haeseler A, Stoye J, Harmsen D., Nat. Biotechnol. 31(4), 2013
PMID: 23563421
How to apply de Bruijn graphs to genome assembly.
Compeau PE, Pevzner PA, Tesler G., Nat. Biotechnol. 29(11), 2011
PMID: 22068540
Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph.
Li Z, Chen Y, Mu D, Yuan J, Shi Y, Zhang H, Gan J, Li N, Hu X, Liu B, Yang B, Fan W., Brief Funct Genomics 11(1), 2011
PMID: 22184334
Sequence assembly demystified.
Nagarajan N, Pop M., Nat. Rev. Genet. 14(3), 2013
PMID: 23358380
Sequence assembly. Computational Biology and Chemistry.
De novo assembly of short sequence reads.
Paszkiewicz K, Studholme DJ., Brief. Bioinformatics 11(5), 2010
PMID: 20724458
A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.
Zhang W, Chen J, Yang Y, Tang Y, Shang J, Shen B., PLoS ONE 6(3), 2011
PMID: 21423806
Comparative studies of de novo assembly tools for next-generation sequencing technologies.
Lin Y, Li J, Shen H, Zhang L, Papasian CJ, Deng HW., Bioinformatics 27(15), 2011
PMID: 21636596
Comparing de novo genome assembly: the long and short of it.
Narzisi G, Mishra B., PLoS ONE 6(4), 2011
PMID: 21559467
Assemblathon 1: a competitive assessment of de novo short read assembly methods.
Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HO, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol I, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B., Genome Res. 21(12), 2011
PMID: 21926179
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species.
Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, Chitsaz H, Chou WC, Corbeil J, Del Fabbro C, Docking TR, Durbin R, Earl D, Emrich S, Fedotov P, Fonseca NA, Ganapathy G, Gibbs RA, Gnerre S, Godzaridis E, Goldstein S, Haimel M, Hall G, Haussler D, Hiatt JB, Ho IY, Howard J, Hunt M, Jackman SD, Jaffe DB, Jarvis ED, Jiang H, Kazakov S, Kersey PJ, Kitzman JO, Knight JR, Koren S, Lam TW, Lavenier D, Laviolette F, Li Y, Li Z, Liu B, Liu Y, Luo R, Maccallum I, Macmanes MD, Maillet N, Melnikov S, Naquin D, Ning Z, Otto TD, Paten B, Paulo OS, Phillippy AM, Pina-Martins F, Place M, Przybylski D, Qin X, Qu C, Ribeiro FJ, Richards S, Rokhsar DS, Ruby JG, Scalabrin S, Schatz MC, Schwartz DC, Sergushichev A, Sharpe T, Shaw TI, Shendure J, Shi Y, Simpson JT, Song H, Tsarev F, Vezzi F, Vicedomini R, Vieira BM, Wang J, Worley KC, Yin S, Yiu SM, Yuan J, Zhang G, Zhang H, Zhou S, Korf IF., Gigascience 2(1), 2013
PMID: 23870653
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marcais G, Pop M, Yorke JA., Genome Res. 22(3), 2012
PMID: 22147368
GAGE-B: an evaluation of genome assemblers for bacterial organisms.
Magoc T, Pabinger S, Canzar S, Liu X, Su Q, Puiu D, Tallon LJ, Salzberg SL., Bioinformatics 29(14), 2013
PMID: 23665771
ABySS: a parallel assembler for short read sequence data.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I., Genome Res. 19(6), 2009
PMID: 19251739
Aggressive assembly of pyrosequencing reads with mates.
Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G., Bioinformatics 24(24), 2008
PMID: 18952627


Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs.
Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WE, Wetter T, Suhai S., Genome Res. 14(6), 2004
PMID: 15140833

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J., Gigascience 1(1), 2012
PMID: 23587118
SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA., J. Comput. Biol. 19(5), 2012
PMID: 22506599
Velvet: algorithms for de novo short read assembly using de Bruijn graphs.
Zerbino DR, Birney E., Genome Res. 18(5), 2008
PMID: 18349386
High-quality draft assemblies of mammalian genomes from massively parallel sequence data.
Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, Berlin AM, Aird D, Costello M, Daza R, Williams L, Nicol R, Gnirke A, Nusbaum C, Lander ES, Jaffe DB., Proc. Natl. Acad. Sci. U.S.A. 108(4), 2010
PMID: 21187386
ARACHNE: a whole-genome shotgun assembler.
Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES., Genome Res. 12(1), 2002
PMID: 11779843
The MaSuRCA genome assembler.
Zimin AV, Marcais G, Puiu D, Roberts M, Salzberg SL, Yorke JA., Bioinformatics 29(21), 2013
PMID: 23990416
The phusion assembler.
Mullikin JC, Ning Z., Genome Res. 13(1), 2003
PMID: 12529309
De novo likelihood-based measures for comparing genome assemblies.
Ghodsi M, Hill CM, Astrovskaya I, Lin H, Sommer DD, Koren S, Pop M., BMC Res Notes 6, 2013
PMID: 23965294
QUAST: quality assessment tool for genome assemblies.
Gurevich A, Saveliev V, Vyahhi N, Tesler G., Bioinformatics 29(8), 2013
PMID: 23422339
Error and error mitigation in low-coverage genome assemblies.
Hubisz MJ, Lin MF, Kellis M, Siepel A., PLoS ONE 6(2), 2011
PMID: 21340033

Amplification efficiency of thermostable DNA polymerases.
Arezi B, Xing W, Sorge JA, Hogrefe HH., Anal. Biochem. 321(2), 2003
PMID: 14511688
Effects of GC bias in next-generation-sequencing data on de novo genome assembly.
Chen YC, Liu T, Yu CH, Chiang TY, Hwang CC., PLoS ONE 8(4), 2013
PMID: 23638157
Correcting errors in short reads by multiple alignments.
Salmela L, Schroder J., Bioinformatics 27(11), 2011
PMID: 21471014
Informed and automated k-mer size selection for genome assembly.
Chikhi R, Medvedev P., Bioinformatics 30(1), 2013
PMID: 23732276
Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology.
Mellmann A, Harmsen D, Cummings CA, Zentz EB, Leopold SR, Rico A, Prior K, Szczepanowski R, Ji Y, Zhang W, McLaughlin SF, Henkhaus JK, Leopold B, Bielaszewska M, Prager R, Brzoska PM, Moore RL, Guenther S, Rothberg JM, Karch H., PLoS ONE 6(7), 2011
PMID: 21799941
“Celbenin” - resistant Staphylococci.
Jevons MP., Br Med J 1(5219), 1961
PMCID: PMC1952888
Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology.
van Embden JD, Cave MD, Crawford JT, Dale JW, Eisenach KD, Gicquel B, Hermans P, Martin C, McAdam R, Shinnick TM., J. Clin. Microbiol. 31(2), 1993
PMID: 8381814
Fast and accurate long-read alignment with Burrows-Wheeler transform.
Li H, Durbin R., Bioinformatics 26(5), 2010
PMID: 20080505
Versatile and open software for comparing large genomes.
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL., Genome Biol. 5(2), 2004
PMID: 14759262



PMID: 25198770