Strain-aware assembly of genomes from mixed samples using flow variation graphs

Baaijens JA, Stougie L, Schönhuth A (2019)
bioRxiv.

Preprint | Published | English
 
Authors
Baaijens, Jasmijn A.; Stougie, Leen; Schönhuth, Alexander (UniBi)
Abstract / Note
The goal of strain-aware genome assembly is to reconstruct all individual haplotypes from a mixed sample at the strain level and to provide abundance estimates for the strains. Given that the use of a reference genome can introduce significant biases, de novo approaches are most suitable for this task. So far, reference-genome-independent assemblers have been shown to reconstruct haplotypes for mixed samples of limited complexity and genomes not exceeding 10,000 bp in length.

Here, we present VG-Flow, a de novo approach that enables full-length haplotype reconstruction from pre-assembled contigs of complex mixed samples. Our method increases the contiguity of the input assembly and, at the same time, performs haplotype abundance estimation. VG-Flow is the first approach to require polynomial rather than exponential runtime in terms of the underlying graphs. Since runtime increases only linearly with genome length in practice, it also enables the reconstruction of genomes that are longer by orders of magnitude, thereby establishing the first de novo solution to strain-aware full-length genome assembly applicable to bacterial-sized genomes.

VG-Flow is based on the flow variation graph, a novel concept that both captures all diversity present in the sample and makes it possible to cast the central contig abundance estimation problem as a flow-like optimization problem solvable in polynomial time. As a consequence, we are in a position to compute maximal-length haplotypes by efficiently decomposing the resulting flow with a greedy algorithm, and to obtain accurate frequency estimates for the reconstructed haplotypes through linear programming techniques.

Benchmarking experiments show that our method outperforms state-of-the-art approaches on mixed samples from short genomes in terms of assembly accuracy as well as abundance estimation. Experiments on longer, bacterial-sized genomes demonstrate that VG-Flow is the only current approach that can reconstruct full-length haplotypes from mixed samples at the strain level in human-affordable runtime.
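
The abstract mentions decomposing a flow into maximal-length haplotypes with a greedy algorithm. The sketch below illustrates one common greedy strategy for this kind of task: repeatedly extract the source-to-sink path with the largest bottleneck flow and subtract it. It is an illustration under stated assumptions, not VG-Flow's implementation. The function names, the toy graph, and the choice of the widest-path criterion are all hypothetical, and the construction of the flow variation graph and the linear programming step for frequency estimation are not reproduced here.

```python
import heapq
from collections import defaultdict


def widest_path(flow, source, sink):
    """Return the source-to-sink path with the largest bottleneck flow, or None."""
    adj = defaultdict(list)
    for (u, v), f in flow.items():
        adj[u].append((v, f))
    best = {source: float("inf")}   # best bottleneck found so far per node
    parent = {}
    heap = [(-float("inf"), source)]  # max-heap via negated bottleneck
    done = set()
    while heap:
        neg_w, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == sink:
            break
        for v, f in adj[u]:
            w = min(-neg_w, f)
            if w > best.get(v, 0):
                best[v] = w
                parent[v] = u
                heapq.heappush(heap, (-w, v))
    if sink not in done:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]


def greedy_flow_decomposition(edge_flow, source, sink):
    """Greedily split a flow into weighted paths (candidate haplotypes).

    edge_flow maps (u, v) -> positive flow and is assumed to conserve flow
    at every node other than source and sink.
    """
    flow = {e: f for e, f in edge_flow.items() if f > 0}
    paths = []
    while True:
        path = widest_path(flow, source, sink)
        if path is None:
            break
        edges = list(zip(path, path[1:]))
        weight = min(flow[e] for e in edges)
        for e in edges:
            flow[e] -= weight
            if flow[e] <= 1e-9:   # edge is used up
                del flow[e]
        paths.append((path, weight))
    return paths


# Toy graph (assumed for illustration): two consecutive variant sites.
example_flow = {
    ("s", "a"): 6, ("s", "b"): 4, ("a", "m"): 6, ("b", "m"): 4,
    ("m", "c"): 7, ("m", "d"): 3, ("c", "t"): 7, ("d", "t"): 3,
}

if __name__ == "__main__":
    for path, weight in greedy_flow_decomposition(example_flow, "s", "t"):
        print("-".join(path), weight)
```

On the toy graph, the extracted path weights sum to the total flow leaving the source, which is the property that lets the weights be read as raw abundance estimates. A greedy decomposition is not guaranteed to use the fewest possible paths, which is one reason a separate frequency-estimation step can be useful.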
Year of publication
2019
Journal title
bioRxiv
Page URI
https://pub.uni-bielefeld.de/record/2941763

Cite

Baaijens JA, Stougie L, Schönhuth A. Strain-aware assembly of genomes from mixed samples using flow variation graphs. bioRxiv. 2019.
Baaijens, J. A., Stougie, L., & Schönhuth, A. (2019). Strain-aware assembly of genomes from mixed samples using flow variation graphs. bioRxiv.
Baaijens, J. A., Stougie, L., and Schönhuth, A. (2019). Strain-aware assembly of genomes from mixed samples using flow variation graphs. bioRxiv.
Baaijens, J.A., Stougie, L., & Schönhuth, A., 2019. Strain-aware assembly of genomes from mixed samples using flow variation graphs. bioRxiv.
J.A. Baaijens, L. Stougie, and A. Schönhuth, “Strain-aware assembly of genomes from mixed samples using flow variation graphs”, bioRxiv, 2019.
Baaijens, J.A., Stougie, L., Schönhuth, A.: Strain-aware assembly of genomes from mixed samples using flow variation graphs. bioRxiv. (2019).
Baaijens, Jasmijn A., Stougie, Leen, and Schönhuth, Alexander. “Strain-aware assembly of genomes from mixed samples using flow variation graphs”. bioRxiv (2019).

Sources

bioRxiv: 10.1101/645721
