Analyzing large scale genomic data on the cloud with Sparkhit

Huang, Liren; Krüger, Jan; Sczyrba, Alexander

Huang L, Krüger J, Sczyrba A (2018)
Bioinformatics 34(9): 1457-1465.

Journal article | Published | English

Download

No files have been uploaded. Publication record only!

URL

https://academic.oup.com/bioinformatics/article/34/9/1457/4747885

DOI

https://doi.org/10.1093/bioinformatics/btx808

Author(s)

Huang, Liren (UniBi); Krüger, Jan (UniBi); Sczyrba, Alexander (UniBi)

Department

Centrum für Biotechnologie > Technologieplattformen > Bielefeld University Bioinformatics Services
Technische Fakultät > Int. Graduiertenkolleg DiDy (GRK 1906)
Centrum für Biotechnologie > Arbeitsgruppe A. Sczyrba
Technische Fakultät > Computational Metagenomics

Year of publication

2018

Journal title

Bioinformatics

Volume

34

Issue

9

Page(s)

1457-1465

Copyright / Licenses

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

ISSN

1367-4803

Page URI

https://pub.uni-bielefeld.de/record/2915890

Cite this

Huang L, Krüger J, Sczyrba A. Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics. 2018;34(9):1457-1465.

Huang, L., Krüger, J., & Sczyrba, A. (2018). Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics, 34(9), 1457-1465. doi:10.1093/bioinformatics/btx808

Huang, Liren, Krüger, Jan, and Sczyrba, Alexander. 2018. “Analyzing large scale genomic data on the cloud with Sparkhit”. Bioinformatics 34 (9): 1457-1465.

Huang, L., Krüger, J., and Sczyrba, A. (2018). Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics 34, 1457-1465.

Huang, L., Krüger, J., & Sczyrba, A., 2018. Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics, 34(9), p 1457-1465.

L. Huang, J. Krüger, and A. Sczyrba, “Analyzing large scale genomic data on the cloud with Sparkhit”, Bioinformatics, vol. 34, 2018, pp. 1457-1465.

Huang, L., Krüger, J., Sczyrba, A.: Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics. 34, 1457-1465 (2018).

Huang, Liren, Krüger, Jan, and Sczyrba, Alexander. “Analyzing large scale genomic data on the cloud with Sparkhit”. Bioinformatics 34.9 (2018): 1457-1465.

All files are available under the following license(s):

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0):

https://creativecommons.org/licenses/by-nc/4.0/deed.de
https://creativecommons.org/licenses/by-nc/4.0/legalcode.de

Link(s) to full text(s)

URL

https://academic.oup.com/bioinformatics/article/34/9/1457/4747885

Access Level

Open Access

Data provided by the European Bioinformatics Institute (EBI)

1 citation in Europe PMC

Data provided by Europe PubMed Central.

gcMeta: a Global Catalogue of Metagenomics platform to support the archiving, standardization and analysis of microbiome data.
Shi W, Qi H, Sun Q, Fan G, Liu S, Wang J, Zhu B, Liu H, Zhao F, Wang X, Hu X, Li W, Liu J, Tian Y, Wu L, Ma J., Nucleic Acids Res 47(D1), 2019
PMID: 30365027

36 References

Data provided by Europe PubMed Central.

SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data.
Abuin JM, Pichel JC, Pena TF, Amigo J., PLoS ONE 11(5), 2016
PMID: 27182962

A global reference for human genetic variation.
1000 Genomes Project Consortium., Nature 526(7571), 2015
PMID: 26432245

Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing
Bao R., 2014

Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.
Boisvert S, Laviolette F, Corbeil J., J. Comput. Biol. 17(11), 2010
PMID: 20958248

Near-optimal probabilistic RNA-seq quantification
Bray N.L., 2016

Chen Y.-T., 2015

MapReduce: simplified data processing on large clusters
Dean J., Ghemawat S., 2008

Halvade: scalable sequence analysis with MapReduce.
Decap D, Reumers J, Herzeel C, Costanza P, Fostier J., Bioinformatics 31(15), 2015
PMID: 25819078

qsubsec: a lightweight template system for defining sun grid engine workflows.
Droop AP., Bioinformatics 32(8), 2015
PMID: 26635140

Global metagenomic survey reveals a new bacterial candidate phylum in geothermal springs.
Eloe-Fadrosh EA, Paez-Espino D, Jarett J, Dunfield PF, Hedlund BP, Dekas AE, Grasby SE, Brady AL, Dong H, Briggs BR, Li WJ, Goudeau D, Malmstrom R, Pati A, Pett-Ridge J, Rubin EM, Woyke T, Kyrpides NC, Ivanova NN., Nat Commun 7, 2016
PMID: 26814032

A high-performance, portable implementation of the MPI message passing interface standard
Gropp W., 1996

Aligning short sequencing reads with Bowtie
Langmead B., 2010

Fast gapped-read alignment with Bowtie 2.
Langmead B, Salzberg SL., Nat. Methods 9(4), 2012
PMID: 22388286

Searching for SNPs with cloud computing.
Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL., Genome Biol. 10(11), 2009
PMID: 19930550

Cloud-scale RNA-sequencing differential expression analysis with Myrna.
Langmead B, Hansen KD, Leek JT., Genome Biol. 11(8), 2010
PMID: 20701754

Fast and accurate short read alignment with Burrows-Wheeler transform.
Li H, Durbin R., Bioinformatics 25(14), 2009
PMID: 19451168

The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup., Bioinformatics 25(16), 2009
PMID: 19505943

SOAP: short oligonucleotide alignment program.
Li R, Li Y, Kristiansen K, Wang J., Bioinformatics 24(5), 2008
PMID: 18227114

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA., Genome Res. 20(9), 2010
PMID: 20644199

Docker: lightweight Linux containers for consistent development and deployment
Merkel D., 2014

FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes.
Niu B, Zhu Z, Fu L, Wu S, Li W., Bioinformatics 27(12), 2011
PMID: 21505035

The NIH Human Microbiome Project.
NIH HMP Working Group, Peterson J, Garges S, Giovanni M, McInnes P, Wang L, Schloss JA, Bonazzi V, McEwen JE, Wetterstrand KA, Deal C, Baker CC, Di Francesco V, Howcroft TK, Karp RW, Lunsford RD, Wellington CR, Belachew T, Wright M, Giblin C, David H, Mills M, Salomon R, Mullins C, Akolkar B, Begg L, Davis C, Grandison L, Humble M, Khalsa J, Little AR, Peavy H, Pontzer C, Portnoy M, Sayre MH, Starke-Reed P, Zakhari S, Read J, Watson B, Guyer M., Genome Res. 19(12), 2009
PMID: 19819907

The 3,000 rice genomes project.
3,000 rice genomes project, Gigascience 3, 2014
PMID: 24872877

Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany.
Rasko DA, Webster DR, Sahl JW, Bashir A, Boisen N, Scheutz F, Paxinos EE, Sebra R, Chin CS, Iliopoulos D, Klammer A, Peluso P, Lee L, Kislyuk AO, Bullard J, Kasarskis A, Wang S, Eid J, Rank D, Redman JC, Steyert SR, Frimodt-Moller J, Struve C, Petersen AM, Krogfelt KA, Nataro JP, Schadt EE, Waldor MK., N. Engl. J. Med. 365(8), 2011
PMID: 21793740

Insights into the phylogeny and coding potential of microbial dark matter.
Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA, Dodsworth JA, Hedlund BP, Tsiamis G, Sievert SM, Liu WT, Eisen JA, Hallam SJ, Kyrpides NC, Stepanauskas R, Rubin EM, Hugenholtz P, Woyke T., Nature 499(7459), 2013
PMID: 23851394

The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcon LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC., PLoS Biol. 5(3), 2007
PMID: 17355176

Computational solutions to large-scale data management and analysis.
Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP., Nat. Rev. Genet. 11(9), 2010
PMID: 20717155

CloudBurst: highly sensitive read mapping with MapReduce.
Schatz MC., Bioinformatics 25(11), 2009
PMID: 19357099

Shvachko K., 2010

ABySS: a parallel assembler for short read sequence data.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I., Genome Res. 19(6), 2009
PMID: 19251739

Kraken: ultrafast metagenomic sequence classification using exact alignments.
Wood DE, Salzberg SL., Genome Biol. 15(3), 2014
PMID: 24580807

Heterogeneity in the inter-tumor transcriptome of high risk prostate cancer.
Wyatt AW, Mo F, Wang K, McConeghy B, Brahmbhatt S, Jong L, Mitchell DM, Johnston RL, Haegert A, Li E, Liew J, Yeung J, Shrestha R, Lapuk AV, McPherson A, Shukin R, Bell RH, Anderson S, Bishop J, Hurtado-Coll A, Xiao H, Chinnaiyan AM, Mehra R, Lin D, Wang Y, Fazli L, Gleave ME, Volik SV, Collins CC., Genome Biol. 15(8), 2014
PMID: 25155515

Zaharia M., 2012

Zhao G., 2015

MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes.
Zhou W, Li R, Yuan S, Liu C, Yao S, Luo J, Niu B., Bioinformatics 33(7), 2017
PMID: 28065898

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls.
Zook JM, Chapman B, Wang J, Mittelman D, Hofmann O, Hide W, Salit M., Nat. Biotechnol. 32(3), 2014
PMID: 24531798
