Sparkhit evaluation data set

Huang L, Krüger J, Sczyrba A (2017)
Bielefeld University.

Data publication
 
Abstract / Note
Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge to large-scale genomic analytics. Existing tools use different distributed computational platforms to scale out bioinformatics workloads. However, these tools do not scale efficiently. Moreover, they incur heavy runtime overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform.
Results: Sparkhit integrates a variety of analytical methods. It is implemented in the Spark extended MapReduce model. It runs 92 to 157 times faster than MetaSpark on metagenomic fragment recruitment and 18 to 32 times faster than Crossbow on data pre-processing. We evaluated sensitivity and accuracy with the simulated data provided here.
Year of publication
2017
Page URI
https://pub.uni-bielefeld.de/record/2914921

Cite

Huang L, Krüger J, Sczyrba A. Sparkhit evaluation data set. Bielefeld University; 2017.
Huang, L., Krüger, J., & Sczyrba, A. (2017). Sparkhit evaluation data set. Bielefeld University. doi:10.4119/unibi/2914921
Huang, Liren, Krüger, Jan, and Sczyrba, Alexander. 2017. Sparkhit evaluation data set. Bielefeld University.
Huang, L., Krüger, J., and Sczyrba, A. (2017). Sparkhit evaluation data set. Bielefeld University.
Huang, L., Krüger, J., & Sczyrba, A., 2017. Sparkhit evaluation data set, Bielefeld University.
L. Huang, J. Krüger, and A. Sczyrba, Sparkhit evaluation data set, Bielefeld University, 2017.
Huang, L., Krüger, J., Sczyrba, A.: Sparkhit evaluation data set. Bielefeld University (2017).
Huang, Liren, Krüger, Jan, and Sczyrba, Alexander. Sparkhit evaluation data set. Bielefeld University, 2017.
All files are available under the following license(s):
Full text(s)
Title
Simulated 100nt reads using 36MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile (HS20): art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single36m100nt
Access Level
Open Access
Last uploaded
2019-09-06T09:18:53Z
MD5 checksum
77910be1aa6a617dd1c78be89b148dc3
Title
Simulated 150nt reads using 36MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile (HS25): art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single36m150nt
Access Level
Open Access
Last uploaded
2019-09-06T09:18:53Z
MD5 checksum
8595adb61016f3f2674f798da3e0afe5
Title
Simulated 100nt reads using 72MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile (HS20): art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single72m100nt
Access Level
Open Access
Last uploaded
2019-09-06T09:18:53Z
MD5 checksum
f361a79c1eb5521440940c0817aedf3f
Title
Simulated 150nt reads using 72MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile (HS25): art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single72m150nt
Access Level
Open Access
Last uploaded
2019-09-06T09:18:53Z
MD5 checksum
5dc857a548f98c6482f706d3c696e88b
Title
Simulated 100nt reads using 142MB reference genomes
Description
100nt sequencing data was simulated with art_illumina's HiSeq 2000 profile (HS20): art_illumina -sam -i reference.fa -l 100 -ss HS20 -f 50 -o single142m100nt
Access Level
Open Access
Last uploaded
2019-09-06T09:18:53Z
MD5 checksum
5acf626654e6f971a0b02d6084131c1d
Title
Simulated 150nt reads using 142MB reference genomes
Description
150nt sequencing data was simulated with art_illumina's HiSeq 2500 profile (HS25): art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 50 -o single142m150nt
Access Level
Open Access
Last uploaded
2019-09-06T09:18:53Z
MD5 checksum
0cffdff78c7d8f70ddc655036f94743c
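
The six file records above follow one pattern, so they can be captured in a small script. The following Python sketch is illustrative only and is not part of the published data set. It rebuilds the art_illumina command line from a reference size and read length, and checks a downloaded file against the MD5 checksum listed in its record. Keying the checksums by the art_illumina output prefix is an assumption made here for convenience: the record does not state the names of the downloadable files, so the local path passed to verify_download is hypothetical. The function names are likewise invented for this example.

```python
import hashlib

# MD5 checksums copied from the file records above.
# Assumption: keys are the art_illumina output prefixes from the
# descriptions; the actual download file names are not given here.
CHECKSUMS = {
    "single36m100nt": "77910be1aa6a617dd1c78be89b148dc3",
    "single36m150nt": "8595adb61016f3f2674f798da3e0afe5",
    "single72m100nt": "f361a79c1eb5521440940c0817aedf3f",
    "single72m150nt": "5dc857a548f98c6482f706d3c696e88b",
    "single142m100nt": "5acf626654e6f971a0b02d6084131c1d",
    "single142m150nt": "0cffdff78c7d8f70ddc655036f94743c",
}

# Sequencing-system profile used for each read length in the records.
PROFILES = {100: "HS20", 150: "HS25"}


def art_illumina_command(reference_mb, read_length):
    """Rebuild the simulation command of one record as an argument list."""
    return [
        "art_illumina", "-sam",
        "-i", "reference.fa",            # reference file name as in the records
        "-l", str(read_length),          # read length in nt
        "-ss", PROFILES[read_length],    # sequencing-system profile
        "-f", "50",                      # fold coverage
        "-o", f"single{reference_mb}m{read_length}nt",  # output prefix
    ]


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, dataset):
    """True if the file at `path` matches the checksum listed for `dataset`."""
    return md5_of_file(path) == CHECKSUMS[dataset]
```

For example, verify_download("downloads/single36m100nt", "single36m100nt") would return True only if the local file (a hypothetical path) is byte-identical to the uploaded one. The argument list returned by art_illumina_command can be passed to subprocess.run if art_illumina is installed and a reference.fa is present.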

