GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers

Jünemann S, Prior K, Albersmeier A, Albaum S, Kalinowski J, Goesmann A, Stoye J, Harmsen D (2014)
PLOS ONE 9(9).

Journal Article | Published | English

Abstract
De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and with the different kinds of data provided by sequencers within the fast-evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), has made easy, fast, and cheap sequencing of bacterial organisms available to a broad range of academic and clinical institutions. With a strong pragmatic focus, we add to the line of assembly evaluation surveys by benchmarking popular de novo genome assemblers on bacterial data generated by benchtop sequencers. To this end, single-library assemblies were generated and compared with each other using metrics describing assembly contiguity and accuracy, as well as practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges, and the k-mer optimization problem of de Bruijn graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently with regard to the depth of coverage, where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy, but also in the computing power they require. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler suits best.
The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.
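
The abstract refers to contiguity metrics and depth of coverage without defining them. As a minimal sketch, and not the authors' evaluation scripts, the following Python shows two commonly used quantities: N50 as one possible contiguity metric (the abstract does not name the metrics used, so this choice is an assumption) and the mean depth of coverage as total sequenced bases divided by genome size. The function names `n50` and `depth_of_coverage` are hypothetical.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the total
    assembly length. Returns 0 for an empty assembly.
    """
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0


def depth_of_coverage(read_lengths: Sequence[int], genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(n50([80, 70, 50, 40, 30, 20]))
    print(depth_of_coverage([250] * 40_000, 5_000_000))
```

Here the total assembly length is 290, so the two longest contigs (80 + 70 = 150) already cover half of it and N50 is 70; the 40,000 reads of 250 bases over a 5 Mb genome give a mean depth of 2.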


Sources

PMID: 25198770