Efficient q-gram filters for finding all epsilon-matches over a given length

Rasmussen KR, Stoye J, Myers EW (2006)
JOURNAL OF COMPUTATIONAL BIOLOGY 13(2): 296-308.

Download
No fulltext has been uploaded. References only!
Journal Article | Original Article | Published | English

No fulltext has been uploaded

Author
; ;
Abstract
Fast and exact comparison of large genomic sequences remains a challenging task in biosequence analysis. We consider the problem of finding all epsilon-matches between two sequences, i.e., all local alignments over a given length with an error rate of at most epsilon. We study this problem theoretically, giving an efficient q-gram filter for solving it. Two applications of the filter are also discussed, in particular genomic sequence assembly and BLAST-like sequence comparison. Our results show that the method is 25 times faster than BLAST, while not being heuristic.
Publishing Year
ISSN
eISSN
PUB-ID

Cite this

Rasmussen KR, Stoye J, Myers EW. Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY. 2006;13(2):296-308.
Rasmussen, K. R., Stoye, J., & Myers, E. W. (2006). Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY, 13(2), 296-308. doi:10.1089/cmb.2006.13.296
Rasmussen, K. R., Stoye, J., and Myers, E. W. (2006). Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY 13, 296-308.
Rasmussen, K.R., Stoye, J., & Myers, E.W., 2006. Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY, 13(2), p 296-308.
K.R. Rasmussen, J. Stoye, and E.W. Myers, “Efficient q-gram filters for finding all epsilon-matches over a given length”, JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 13, 2006, pp. 296-308.
Rasmussen, K.R., Stoye, J., Myers, E.W.: Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY. 13, 296-308 (2006).
Rasmussen, Kim R., Stoye, Jens, and Myers, Eugene W. “Efficient q-gram filters for finding all epsilon-matches over a given length”. JOURNAL OF COMPUTATIONAL BIOLOGY 13.2 (2006): 296-308.
This data publication is cited in the following publications:
This publication cites the following data publications:

36 Citations in Europe PMC

Data provided by Europe PubMed Central.

Linking temporal medical records using non-protected health information data.
Bonomi L, Jiang X., Stat Methods Med Res (), 2017
PMID: 29298592
Systematic comparative study of computational methods for T-cell receptor sequencing data analysis.
Afzal S, Gil-Farina I, Gabriel R, Ahmad S, von Kalle C, Schmidt M, Fronza R., Brief. Bioinformatics (), 2017
PMID: 29028876
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Alser M, Hassan H, Xin H, Ergin O, Mutlu O, Alkan C., Bioinformatics 33(21), 2017
PMID: 28575161
Short Read Mapping: An Algorithmic Tour.
Canzar S, Salzberg SL., Proc IEEE Inst Electr Electron Eng 105(3), 2017
PMID: 28502990
Circular sequence comparison: algorithms and applications.
Grossi R, Iliopoulos CS, Mercas R, Pisanti N, Pissis SP, Retha A, Vayani F., Algorithms Mol Biol 11(), 2016
PMID: 27168761
rHAT: fast alignment of noisy long reads with regional hashing.
Liu B, Guan D, Teng M, Wang Y., Bioinformatics 32(11), 2016
PMID: 26568628
Optimal seed solver: optimizing seed selection in read mapping.
Xin H, Nahar S, Zhu R, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O., Bioinformatics 32(11), 2016
PMID: 26568624
BitMapper: an efficient all-mapper based on bit-vector computing.
Cheng H, Jiang H, Yang J, Xu Y, Shang Y., BMC Bioinformatics 16(), 2015
PMID: 26063651
Normalized N50 assembly metric using gap-restricted co-linear chaining.
Makinen V, Salmela L, Ylinen J., BMC Bioinformatics 13(), 2012
PMID: 23031320
Lossless filter for multiple repeats with bounded edit distance.
Peterlongo P, Sacomoto GA, do Lago AP, Pisanti N, Sagot MF., Algorithms Mol Biol 4(), 2009
PMID: 19183438

14 References

Data provided by Europe PubMed Central.

Basic local alignment search tool.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ., J. Mol. Biol. 215(3), 1990
PMID: 2231712
Approximate string-matching with q-grams and maximal matches
Ukkonen, Theoretical Computer Science 92(1), 1992
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ., Nucleic Acids Res. 25(17), 1997
PMID: 9254694
A sublinear algorithm for approximate keyword searching
Myers, Algorithmica 12(4-5), 1994
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770
PatternHunter: faster and more sensitive homology search.
Ma B, Tromp J, Li M., Bioinformatics 18(3), 2002
PMID: 11934743
BLAT--the BLAST-like alignment tool.
Kent WJ., Genome Res. 12(4), 2002
PMID: 11932250
SSAHA: a fast search method for large DNA databases.
Ning Z, Cox AJ, Mullikin JC., Genome Res. 11(10), 2001
PMID: 11591649
Sparse dynamic programming I: linear cost functions
Eppstein, Journal of the ACM 39(3), 1992
A fast bit-vector algorithm for approximate string matching based on dynamic programming
Myers, Journal of the ACM 46(3), 1999
A table-driven, full-sensitivity similarity search algorithm.
Myers G, Durbin R., J. Comput. Biol. 10(2), 2003
PMID: 12804086
Identification of common molecular subsequences.
Smith TF, Waterman MS., J. Mol. Biol. 147(1), 1981
PMID: 7265238
Sublinear approximate string matching and biological applications
Chang, Algorithmica 12(4-5), 1994

Export

0 Marked Publications

Open Data PUB

Web of Science

View record in Web of Science®

Sources

PMID: 16597241
PubMed | Europe PMC

Search this title in

Google Scholar