Computational Methods for Haplotype-aware De Novo Genome Assembly from Long Reads

Luo X (2023)
Bielefeld: Universität Bielefeld.

Bielefeld E-Dissertation | English
 
Abstract
Genomes of the same species usually show high diversity due to genetic variation. Most eukaryotes carry multiple copies of their genome, each inherited from one of the ancestors. Viruses and bacteria, in contrast, tend to form collections of closely related strains that differ by a small number of variants within the infected host or environment. Determining the DNA sequence of each genome copy or strain (collectively called haplotypes here), which is referred to as haplotype-aware genome assembly, plays a crucial role in genomics, medicine, and many other disciplines, because biological function and phenotype can differ substantially between haplotypes. Over the last few years, long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore Technologies have greatly improved haplotype phasing and genome assembly thanks to their much greater read length. Nevertheless, the vast majority of existing long-read haplotype phasing tools rely on a known high-quality reference as a backbone, and most genome assembly tools collapse homologous sequences into one consensus sequence; the former introduces reference bias, and the latter fails to capture the haplotype diversity of heterogeneous genomes.

This dissertation proposes several novel computational approaches to haplotype-aware de novo genome assembly that use only long-read sequencing data, applied to diploid genomes, viral quasispecies, polyploid genomes and metagenomes, respectively. Briefly, these methods first perform haplotype-aware sequencing error correction using different strategies, and then construct haplotype-aware sequence graphs to generate haplotype-specific assemblies. Together, these steps avoid reference bias on the one hand and preserve haplotype diversity on the other. Benchmarking experiments on simulated and real datasets of varying complexity and diversity demonstrate that these methods drastically outperform other state-of-the-art tools on various evaluation criteria related to genome assembly, especially haplotype completeness and accuracy. In addition, the approaches presented in the thesis have been implemented as easy-to-use open-source tools.
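
To make the contrast between a collapsed consensus and haplotype-specific sequences concrete, the following toy sketch may help. It is not the dissertation's algorithm: the reads are made-up, already aligned and error-free, and the function names (`consensus`, `variant_columns`, `split_by_haplotype`) and the `min_count` threshold are hypothetical choices for illustration only.

```python
from collections import Counter

# Hypothetical toy data: five aligned reads covering the same 8 positions.
# Three reads come from one haplotype, two from another that differs
# at positions 2 and 6.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACAT",
    "ACCTACAT",
]

def consensus(seqs):
    """Majority base per column, i.e. the 'collapsed' single sequence."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def variant_columns(seqs, min_count=2):
    """Columns where at least two alleles are each seen min_count times.

    Requiring repeated support is a crude stand-in for telling real
    variants apart from isolated sequencing errors.
    """
    cols = []
    for i, col in enumerate(zip(*seqs)):
        supported = [base for base, n in Counter(col).items() if n >= min_count]
        if len(supported) >= 2:
            cols.append(i)
    return cols

def split_by_haplotype(seqs):
    """Group reads by their alleles at variant columns, one consensus per group."""
    cols = variant_columns(seqs)
    groups = {}
    for s in seqs:
        key = "".join(s[i] for i in cols)
        groups.setdefault(key, []).append(s)
    return sorted(consensus(g) for g in groups.values())

print(consensus(reads))           # one sequence; the minority haplotype is lost
print(split_by_haplotype(reads))  # one sequence per haplotype
```

On this input the collapsed consensus reproduces only the majority haplotype, while grouping reads by their alleles at the two variant positions recovers both sequences. Real long reads are noisy, of unequal length and only partially overlapping, which is why practical methods need error correction and graph-based assembly rather than simple column counting.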
Year
2023
Page(s)
202
Page URI
https://pub.uni-bielefeld.de/record/2969798

Cite

Luo X. Computational Methods for Haplotype-aware De Novo Genome Assembly from Long Reads. Bielefeld: Universität Bielefeld; 2023.
Luo, X. (2023). Computational Methods for Haplotype-aware De Novo Genome Assembly from Long Reads. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/2969798
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Full text(s)
Access Level
OA Open Access
Last Uploaded
2023-04-25T11:53:32Z
MD5 Checksum
9a416f8fdf8fa6759b14090932bf82b6

