Sparkhit evaluation data set

Huang L, Krüger J, Sczyrba A (2017): Bielefeld University. doi:10.4119/unibi/2914921.

Data publication
Data available for this record
Abstract / Note
Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge for large-scale genomic analytics. Existing tools use different distributed computational platforms to scale out bioinformatics workloads. However, these tools do not scale efficiently. Moreover, they incur heavy run-time overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform.
Results: Sparkhit integrates a variety of analytical methods. It is implemented in the Spark-extended MapReduce model. It runs 92 to 157 times faster than MetaSpark on metagenomic fragment recruitment and 18 to 32 times faster than Crossbow on data pre-processing. We evaluated sensitivity and accuracy with the simulated data provided here.
Publication year
2017
Data Re-Use License
This Sparkhit evaluation data set is made available under the Public Domain Dedication and License v1.0, whose full text can be found at: http://opendatacommons.org/licenses/pddl/1.0
PUB-ID

Cite

Huang L, Krüger J, Sczyrba A. (2017): Sparkhit evaluation data set. Bielefeld University. doi:10.4119/unibi/2914921.
Huang, L., Krüger, J., & Sczyrba, A. (2017). Sparkhit evaluation data set. Bielefeld University. doi:10.4119/unibi/2914921
Huang, L., Krüger, J., and Sczyrba, A. (2017). Sparkhit evaluation data set. Bielefeld University. doi:10.4119/unibi/2914921.
Huang, L., Krüger, J., & Sczyrba, A., 2017. Sparkhit evaluation data set. Bielefeld University. doi:10.4119/unibi/2914921
L. Huang, J. Krüger, and A. Sczyrba, Sparkhit evaluation data set. Bielefeld University, 2017. doi:10.4119/unibi/2914921.
Huang, L., Krüger, J., Sczyrba, A.: Sparkhit evaluation data set. Bielefeld University (2017). doi:10.4119/unibi/2914921.
Huang, Liren, Krüger, Jan, and Sczyrba, Alexander. Sparkhit evaluation data set. Bielefeld University, 2017. doi:10.4119/unibi/2914921
All files are available under the following license(s):
Full text(s)
Title
Simulated 100nt reads using 36MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile:
art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single36m100nt
Access Level
Open Access
Last uploaded
2017-11-07T13:59:24Z
Title
Simulated 150nt reads using 36MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile:
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single36m150nt
Access Level
Open Access
Last uploaded
2017-11-07T13:59:24Z
Title
Simulated 100nt reads using 72MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile:
art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single72m100nt
Access Level
Open Access
Last uploaded
2017-11-07T13:59:24Z
Title
Simulated 150nt reads using 72MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile:
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single72m150nt
Access Level
Open Access
Last uploaded
2017-11-07T13:59:24Z
Title
Simulated 100nt reads using 142MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile:
art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single142m100nt
Access Level
Open Access
Last uploaded
2017-11-07T13:59:24Z
Title
Simulated 150nt reads using 142MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile:
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single142m150nt
Access Level
Open Access
Last uploaded
2017-11-07T13:59:24Z
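
The six simulation commands listed above differ only in read length, sequencing-system profile, and output prefix, so they can be regenerated from a small parameter table. The following is a minimal sketch, not part of the published data set: the helper names `art_command` and `approx_read_count` are hypothetical, and `reference.fa` is the placeholder file name used in the descriptions. The read-count helper assumes that `-f 50` means 50-fold coverage of single-end reads and that a label such as "36MB" corresponds to roughly 36 million bases, which the record does not state explicitly.

```python
from itertools import product

# Read length -> ART sequencing-system profile, as quoted in the descriptions.
PROFILE_BY_READ_LENGTH = {100: "HS20", 150: "HS25"}
# Reference sizes (in MB) that appear in the file titles.
REFERENCE_SIZES_MB = (36, 72, 142)
# Fold coverage passed via -f in every command.
FOLD_COVERAGE = 50


def art_command(size_mb, read_length, reference="reference.fa"):
    """Return the art_illumina argument list for one simulated data set."""
    return [
        "art_illumina", "-sam",
        "-i", reference,
        "-l", str(read_length),
        "-ss", PROFILE_BY_READ_LENGTH[read_length],
        "-f", str(FOLD_COVERAGE),
        "-o", f"single{size_mb}m{read_length}nt",
    ]


def approx_read_count(reference_bases, read_length, fold=FOLD_COVERAGE):
    """Rough estimate: fold coverage * reference length / read length.

    Assumes single-end reads and that reference_bases is the total number
    of bases in the reference (an assumption, not stated in the record).
    """
    return fold * reference_bases // read_length


if __name__ == "__main__":
    # Print one command line per (reference size, read length) combination.
    for size_mb, read_length in product(REFERENCE_SIZES_MB, PROFILE_BY_READ_LENGTH):
        print(" ".join(art_command(size_mb, read_length)))
```

Returning an argument list rather than a single string lets the same helper feed `subprocess.run` directly if ART is installed locally; the script itself only prints the commands and does not run the simulator.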
