Eskemap: exact sketch-based read mapping
Schulz T, Medvedev P (2024)
Algorithms for Molecular Biology 2024(19): 19.
Zeitschriftenaufsatz
| Veröffentlicht | Englisch
Download
s13015-024-00261-7.pdf
2.19 MB
Autor*in
Schulz, TizianUniBi ;
Medvedev, Paul
Einrichtung
Abstract / Bemerkung
Background: Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold.
Results: In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in O(|t| + |p| + ℓ2) time and O(ℓ log ℓ) space, where |t| is the number of k-mers inside the sketch of the reference, |p| is the number of k-mers inside the read’s sketch and ℓ is the number of times that k-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm’s performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.
Stichworte
Sequence sketching;
Long-read mapping;
Exact algorithm;
Dynamic programming
Erscheinungsjahr
2024
Zeitschriftentitel
Algorithms for Molecular Biology
Band
2024
Ausgabe
19
Art.-Nr.
19
Urheberrecht / Lizenzen
eISSN
1748-7188
Finanzierungs-Informationen
Open-Access-Publikationskosten wurden durch die Universität Bielefeld im Rahmen des DEAL-Vertrags gefördert.
Page URI
https://pub.uni-bielefeld.de/record/2989197
Zitieren
Schulz T, Medvedev P. Eskemap: exact sketch-based read mapping. Algorithms for Molecular Biology. 2024;2024(19): 19.
Schulz, T., & Medvedev, P. (2024). Eskemap: exact sketch-based read mapping. Algorithms for Molecular Biology, 2024(19), 19. https://doi.org/10.1186/s13015-024-00261-7
Schulz, Tizian, and Medvedev, Paul. 2024. “Eskemap: exact sketch-based read mapping”. Algorithms for Molecular Biology 2024 (19): 19.
Schulz, T., and Medvedev, P. (2024). Eskemap: exact sketch-based read mapping. Algorithms for Molecular Biology 2024:19.
Schulz, T., & Medvedev, P., 2024. Eskemap: exact sketch-based read mapping. Algorithms for Molecular Biology, 2024(19): 19.
T. Schulz and P. Medvedev, “Eskemap: exact sketch-based read mapping”, Algorithms for Molecular Biology, vol. 2024, 2024, : 19.
Schulz, T., Medvedev, P.: Eskemap: exact sketch-based read mapping. Algorithms for Molecular Biology. 2024, : 19 (2024).
Schulz, Tizian, and Medvedev, Paul. “Eskemap: exact sketch-based read mapping”. Algorithms for Molecular Biology 2024.19 (2024): 19.
Alle Dateien verfügbar unter der/den folgenden Lizenz(en):
Creative Commons Namensnennung 4.0 International Public License (CC-BY 4.0):
Volltext(e)
Name
s13015-024-00261-7.pdf
2.19 MB
Access Level
Open Access
Zuletzt Hochgeladen
2024-05-08T15:07:17Z
MD5 Prüfsumme
285cdad6ac73be9526816309232552ab
Daten bereitgestellt von European Bioinformatics Institute (EBI)
Zitationen in Europe PMC
Daten bereitgestellt von Europe PubMed Central.
References
Daten bereitgestellt von Europe PubMed Central.
Export
Markieren/ Markierung löschen
Markierte Publikationen
Web of Science
Dieser Datensatz im Web of Science®Quellen
PMID: 38704605
PubMed | Europe PMC
Suchen in