Sparkhit evaluation data set

Huang L, Krüger J, Sczyrba A (2017): Bielefeld University. doi:10.4119/unibi/2914921.

Research Data
Abstract / Notes
Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge for large-scale genomic analytics. Existing tools use different distributed computational platforms to scale out bioinformatics workloads. However, these tools do not scale efficiently. Moreover, they incur heavy run-time overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform.
Results: Sparkhit integrates a variety of analytical methods. It is implemented in the Spark extended MapReduce model. It runs 92 to 157 times faster than MetaSpark on metagenomic fragment recruitment and 18 to 32 times faster than Crossbow on data pre-processing. We evaluated sensitivity and accuracy with the simulated data provided here.
Publishing Year
2017
Data Re-Use License
This Sparkhit evaluation data set is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: http://opendatacommons.org/licenses/pddl/1.0
PUB-ID

Cite this

Huang L, Krüger J, Sczyrba A. (2017): Sparkhit evaluation data set. Bielefeld University. doi:10.4119/unibi/2914921.
Main File(s)
File Title
Simulated 100nt reads using 36MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile:
art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single36m100nt
Access Level
Open Access
Last Uploaded
2017-11-07T13:59:24Z
File Title
Simulated 150nt reads using 36MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile:
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single36m150nt
Access Level
Open Access
Last Uploaded
2017-11-07T13:59:24Z
File Title
Simulated 100nt reads using 72MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile:
art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single72m100nt
Access Level
Open Access
Last Uploaded
2017-11-07T13:59:24Z
File Title
Simulated 150nt reads using 72MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile:
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single72m150nt
Access Level
Open Access
Last Uploaded
2017-11-07T13:59:24Z
File Title
Simulated 100nt reads using 142MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile:
art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single142m100nt
Access Level
Open Access
Last Uploaded
2017-11-07T13:59:24Z
File Title
Simulated 150nt reads using 142MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile:
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single142m150nt
Access Level
Open Access
Last Uploaded
2017-11-07T13:59:24Z
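
The six file descriptions above follow one pattern: three reference sizes, two read lengths, and a fixed 50-fold coverage. As a minimal sketch, not the authors' own script, the Python below rebuilds the six art_illumina argument lists from that pattern and estimates how many reads each run would yield. The helper names (art_command, approx_read_count) are hypothetical, and the read-count estimate assumes that 1 MB of reference corresponds to roughly 10**6 bases, which the record does not state.

```python
# Read length -> ART sequencing-system ID, as listed in the file descriptions.
PROFILES = {100: "HS20", 150: "HS25"}
REFERENCE_SIZES_MB = (36, 72, 142)
FOLD_COVERAGE = 50  # value passed to -f in every listed command


def art_command(ref_size_mb, read_len, reference="reference.fa"):
    """Return the art_illumina argument list for one data set (hypothetical helper)."""
    prefix = f"single{ref_size_mb}m{read_len}nt"
    return [
        "art_illumina", "-sam",
        "-i", reference,
        "-l", str(read_len),
        "-ss", PROFILES[read_len],
        "-f", str(FOLD_COVERAGE),
        "-o", prefix,
    ]


def approx_read_count(ref_size_mb, read_len):
    """Rough read count: coverage * reference bases / read length.

    ASSUMPTION: 1 MB of reference is treated as 10**6 bases.
    """
    return FOLD_COVERAGE * ref_size_mb * 10**6 // read_len


if __name__ == "__main__":
    # Print the commands only; nothing is executed.
    for size in REFERENCE_SIZES_MB:
        for length in sorted(PROFILES):
            cmd = " ".join(art_command(size, length))
            print(f"{cmd}  # ~{approx_read_count(size, length)} reads")
```

The sketch only prints the commands, so it can be run without ART installed. The listing uses reference.fa for every run, so the reference file would need to be swapped for each genome set before executing any of them.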
