Dynamic Alignment-Free and Reference-Free Read Compression

Holley G, Wittler R, Stoye J, Hach F (2018)
JOURNAL OF COMPUTATIONAL BIOLOGY 25(7): 825-836.

Journal article | Published | English
 
Abstract / Note
The advent of high throughput sequencing (HTS) technologies raises a major concern about storage and transmission of data produced by these technologies. In particular, large-scale sequencing projects generate an unprecedented volume of genomic sequences ranging from tens to several thousands of genomes per species. These collections contain highly similar and redundant sequences, also known as pangenomes. The ideal way to represent and transfer pangenomes is through compression. A number of HTS-specific compression tools have been developed to reduce the storage and communication costs of HTS data, yet none of them is designed to process a pangenome. In this article, we present dynamic alignment-free and reference-free read compression (DARRC), a new alignment-free and reference-free compression method. It addresses the problem of pangenome compression by encoding the sequences of a pangenome as a guided de Bruijn graph. The novelty of this method is its ability to incrementally update DARRC archives with new genome sequences without full decompression of the archive. DARRC can compress both single-end and paired-end read sequences of any length using all symbols of the IUPAC nucleotide code. On a large Pseudomonas aeruginosa data set, our method outperforms all other tested tools. It provides a 30% compression ratio improvement in single-end mode compared with the best performing state-of-the-art HTS-specific compression method in our experiments.
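To illustrate why a de Bruijn graph suits highly redundant read collections, the following is a minimal sketch of a plain (unguided) de Bruijn graph built from reads. It is not DARRC's guided de Bruijn graph or its encoding, which this record does not describe; the function name `build_de_bruijn`, the toy reads, and the choice k = 3 are assumptions made purely for illustration.

```python
def build_de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in some read.

    Illustrative only: a plain de Bruijn graph, not the guided variant
    used by DARRC.
    """
    graph = {}
    for read in reads:
        # All overlapping substrings of length k, in read order.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            # Register every k-mer as a node, even if it has no successor.
            graph.setdefault(kmer, set())
        for current, following in zip(kmers, kmers[1:]):
            # Consecutive k-mers overlap by k - 1 symbols.
            graph[current].add(following)
    return graph


# Two hypothetical reads that differ only in their last base.
reads = ["ACGTAC", "ACGTAG"]
k = 3
graph = build_de_bruijn(reads, k)

total_kmers = sum(len(read) - k + 1 for read in reads)
print(f"{total_kmers} k-mer occurrences stored as {len(graph)} distinct nodes")
```

Shared k-mers are stored once no matter how many reads contain them, which is the kind of redundancy that similar genomes in a pangenome exhibit. Adding a further read only inserts the k-mers and edges not already present, which loosely mirrors the idea of updating an archive incrementally.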
Keywords
guided de Bruijn graph; high throughput sequencing; sequence compression
Publication year
2018
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
Volume
25
Issue
7
Page(s)
825-836
Conference
21st Annual International Conference on Research in Computational Molecular Biology (RECOMB)
Conference location
Hong Kong, HONG KONG
ISSN
1066-5277
eISSN
1557-8666
Page URI
https://pub.uni-bielefeld.de/record/2930268

Cite

Holley G, Wittler R, Stoye J, Hach F. Dynamic Alignment-Free and Reference-Free Read Compression. JOURNAL OF COMPUTATIONAL BIOLOGY. 2018;25(7):825-836.
Holley, G., Wittler, R., Stoye, J., & Hach, F. (2018). Dynamic Alignment-Free and Reference-Free Read Compression. JOURNAL OF COMPUTATIONAL BIOLOGY, 25(7), 825-836. doi:10.1089/cmb.2018.0068
Holley, G., Wittler, R., Stoye, J., and Hach, F. (2018). Dynamic Alignment-Free and Reference-Free Read Compression. JOURNAL OF COMPUTATIONAL BIOLOGY 25, 825-836.
Holley, G., et al., 2018. Dynamic Alignment-Free and Reference-Free Read Compression. JOURNAL OF COMPUTATIONAL BIOLOGY, 25(7), p 825-836.
G. Holley, et al., “Dynamic Alignment-Free and Reference-Free Read Compression”, JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 25, 2018, pp. 825-836.
Holley, G., Wittler, R., Stoye, J., Hach, F.: Dynamic Alignment-Free and Reference-Free Read Compression. JOURNAL OF COMPUTATIONAL BIOLOGY. 25, 825-836 (2018).
Holley, Guillaume, Wittler, Roland, Stoye, Jens, and Hach, Faraz. “Dynamic Alignment-Free and Reference-Free Read Compression”. JOURNAL OF COMPUTATIONAL BIOLOGY 25.7 (2018): 825-836.

Sources

PMID: 30011247