Efficient q-gram filters for finding all epsilon-matches over a given length
Rasmussen KR, Stoye J, Myers EW (2006)
JOURNAL OF COMPUTATIONAL BIOLOGY 13(2): 296-308.
Journal article | Published | English
Authors
Rasmussen, Kim R.;
Stoye, Jens (UniBi);
Myers, Eugene W.
Abstract / Note
Fast and exact comparison of large genomic sequences remains a challenging task in biosequence analysis. We consider the problem of finding all epsilon-matches between two sequences, i.e., all local alignments over a given length with an error rate of at most epsilon. We study this problem theoretically, giving an efficient q-gram filter for solving it. Two applications of the filter are also discussed, in particular genomic sequence assembly and BLAST-like sequence comparison. Our results show that the method is 25 times faster than BLAST, while not being heuristic.
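To make the filtering idea concrete, the following is a minimal sketch of the classic q-gram lemma for two whole strings. It is not the paper's filter, which targets local matches over a minimum length at error rate epsilon. The function names and the fixed error count k are assumptions made for illustration only.

```python
from collections import Counter


def qgrams(s: str, q: int) -> list[str]:
    """Return all overlapping substrings of length q (the q-grams) of s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def qgram_threshold(n: int, k: int, q: int) -> int:
    """Classic q-gram lemma bound: a string of length n has n - q + 1 q-grams,
    and each of at most k edit operations can destroy at most q of them."""
    return n + 1 - q * (k + 1)


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Count the q-grams common to a and b, respecting multiplicity."""
    return sum((Counter(qgrams(a, q)) & Counter(qgrams(b, q))).values())


def may_match(a: str, b: str, q: int, k: int) -> bool:
    """Filter step (hypothetical helper): False means a and b cannot be within
    k edits of each other; True means the pair survives the filter and still
    has to be verified by an actual alignment."""
    return shared_qgrams(a, b, q) >= qgram_threshold(max(len(a), len(b)), k, q)


a = "ACGTACGTAC"
b = "ACGTTCGTAC"  # one substitution relative to a
print(qgram_threshold(len(a), k=1, q=3))
print(shared_qgrams(a, b, q=3))
print(may_match(a, b, q=3, k=1))
```

A filter of this kind never discards a true match, which is the sense in which such a method is "not heuristic"; its speed depends on how few false candidates reach the verification step.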
Keywords
local alignment searching;
q-grams;
filter;
clustering;
EST;
sequence assembly
Publication year
2006
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
Volume
13
Issue
2
Page(s)
296-308
ISSN
1066-5277
eISSN
1557-8666
Page URI
https://pub.uni-bielefeld.de/record/1599570
Cite
Rasmussen KR, Stoye J, Myers EW. Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY. 2006;13(2):296-308.
Rasmussen, K. R., Stoye, J., & Myers, E. W. (2006). Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY, 13(2), 296-308. https://doi.org/10.1089/cmb.2006.13.296
Rasmussen, Kim R., Stoye, Jens, and Myers, Eugene W. 2006. “Efficient q-gram filters for finding all epsilon-matches over a given length”. JOURNAL OF COMPUTATIONAL BIOLOGY 13 (2): 296-308.
Rasmussen, K. R., Stoye, J., and Myers, E. W. (2006). Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY 13, 296-308.
Rasmussen, K.R., Stoye, J., & Myers, E.W., 2006. Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY, 13(2), p 296-308.
K.R. Rasmussen, J. Stoye, and E.W. Myers, “Efficient q-gram filters for finding all epsilon-matches over a given length”, JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 13, 2006, pp. 296-308.
Rasmussen, K.R., Stoye, J., Myers, E.W.: Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY. 13, 296-308 (2006).
Rasmussen, Kim R., Stoye, Jens, and Myers, Eugene W. “Efficient q-gram filters for finding all epsilon-matches over a given length”. JOURNAL OF COMPUTATIONAL BIOLOGY 13.2 (2006): 296-308.
Data provided by the European Bioinformatics Institute (EBI)
38 citations in Europe PMC
Data provided by Europe PubMed Central.
Systematic comparative study of computational methods for T-cell receptor sequencing data analysis.
Afzal S, Gil-Farina I, Gabriel R, Ahmad S, von Kalle C, Schmidt M, Fronza R., Brief Bioinform 20(1), 2019
PMID: 29028876
ImtRDB: a database and software for mitochondrial imperfect interspersed repeats annotation.
Shamanskiy VA, Timonina VN, Popadin KY, Gunbin KV., BMC Genomics 20(suppl 3), 2019
PMID: 31284879
Linking temporal medical records using non-protected health information data.
Bonomi L, Jiang X., Stat Methods Med Res 27(11), 2018
PMID: 29298592
Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.
Solomon B, Kingsford C., J Comput Biol 25(7), 2018
PMID: 29641248
Short Read Mapping: An Algorithmic Tour.
Canzar S, Salzberg SL., Proc IEEE Inst Electr Electron Eng 105(3), 2017
PMID: 28502990
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Alser M, Hassan H, Xin H, Ergin O, Mutlu O, Alkan C., Bioinformatics 33(21), 2017
PMID: 28575161
PopIns: population-scale detection of novel sequence insertions.
Kehr B, Melsted P, Halldórsson BV., Bioinformatics 32(7), 2016
PMID: 25926346
Optimal seed solver: optimizing seed selection in read mapping.
Xin H, Nahar S, Zhu R, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O., Bioinformatics 32(11), 2016
PMID: 26568624
rHAT: fast alignment of noisy long reads with regional hashing.
Liu B, Guan D, Teng M, Wang Y., Bioinformatics 32(11), 2016
PMID: 26568628
Fast search of thousands of short-read sequencing experiments.
Solomon B, Kingsford C., Nat Biotechnol 34(3), 2016
PMID: 26854477
Circular sequence comparison: algorithms and applications.
Grossi R, Iliopoulos CS, Mercas R, Pisanti N, Pissis SP, Retha A, Vayani F., Algorithms Mol Biol 11, 2016
PMID: 27168761
Alignment of Next-Generation Sequencing Reads.
Reinert K, Langmead B, Weese D, Evers DJ., Annu Rev Genomics Hum Genet 16, 2015
PMID: 25939052
IMSEQ--a fast and error aware approach to immunogenetic sequence analysis.
Kuchenbecker L, Nienen M, Hecht J, Neumann AU, Babel N, Reinert K, Robinson PN., Bioinformatics 31(18), 2015
PMID: 25987567
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
Berlin K, Koren S, Chin CS, Drake JP, Landolin JM, Phillippy AM., Nat Biotechnol 33(6), 2015
PMID: 26006009
BitMapper: an efficient all-mapper based on bit-vector computing.
Cheng H, Jiang H, Yang J, Xu Y, Shang Y., BMC Bioinformatics 16, 2015
PMID: 26063651
Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone.
Trappe K, Emde AK, Ehrlich HC, Reinert K., Bioinformatics 30(24), 2014
PMID: 25028727
Massively parallel read mapping on GPUs with the q-group index and PEANUT.
Köster J, Rahmann S., PeerJ 2, 2014
PMID: 25289191
A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances.
Noé L, Martin DE., J Comput Biol 21(12), 2014
PMID: 25393923
Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine.
Hong H, Zhang W, Shen J, Su Z, Ning B, Han T, Perkins R, Shi L, Tong W., Sci China Life Sci 56(2), 2013
PMID: 23393026
The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote.
Liao Y, Smyth GK, Shi W., Nucleic Acids Res 41(10), 2013
PMID: 23558742
Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.
Cordero F, Beccuti M, Arigoni M, Donatelli S, Calogero RA., PLoS One 7(2), 2012
PMID: 22363693
RazerS 3: faster, fully sensitive read mapping.
Weese D, Holtgrewe M, Reinert K., Bioinformatics 28(20), 2012
PMID: 22923295
Long read alignment based on maximal exact match seeds.
Liu Y, Schmidt B., Bioinformatics 28(18), 2012
PMID: 22962447
Normalized N50 assembly metric using gap-restricted co-linear chaining.
Mäkinen V, Salmela L, Ylinen J., BMC Bioinformatics 13, 2012
PMID: 23031320
Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.
Misra S, Agrawal A, Liao WK, Choudhary A., Bioinformatics 27(2), 2011
PMID: 21088030
SHRiMP2: sensitive yet practical SHort Read Mapping.
David M, Dzamba M, Lister D, Ilie L, Brudno M., Bioinformatics 27(7), 2011
PMID: 21278192
Considering transposable element diversification in de novo annotation approaches.
Flutre T, Duprat E, Feuillet C, Quesneville H., PLoS One 6(1), 2011
PMID: 21304975
STELLAR: fast and exact local alignments.
Kehr B, Weese D, Reinert K., BMC Bioinformatics 12 Suppl 9, 2011
PMID: 22151882
MicroRazerS: rapid alignment of small RNA reads.
Emde AK, Grunert M, Weese D, Reinert K, Sperling SR., Bioinformatics 26(1), 2010
PMID: 19880369
Phylogenetic comparative assembly.
Husemann P, Stoye J., Algorithms Mol Biol 5, 2010
PMID: 20047659
Fast and SNP-tolerant detection of complex variants and splicing in short reads.
Wu TD, Nacu S., Bioinformatics 26(7), 2010
PMID: 20147302
Lossless filter for multiple repeats with bounded edit distance.
Peterlongo P, Sacomoto GA, do Lago AP, Pisanti N, Sagot MF., Algorithms Mol Biol 4, 2009
PMID: 19183438
Massive parallel bisulfite sequencing of CG-rich DNA fragments reveals that methylation of many X-chromosomal CpG islands in female blood DNA is incomplete.
Zeschnigk M, Martin M, Betzl G, Kalbe A, Sirsch C, Buiting K, Gross S, Fritzilas E, Frey B, Rahmann S, Horsthemke B., Hum Mol Genet 18(8), 2009
PMID: 19223391
A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads.
Rausch T, Koren S, Denisov G, Weese D, Emde AK, Döring A, Reinert K., Bioinformatics 25(9), 2009
PMID: 19269990
RazerS--fast read mapping with sensitivity control.
Weese D, Emde AK, Rausch T, Döring A, Reinert K., Genome Res 19(9), 2009
PMID: 19592482
Sense from sequence reads: methods for alignment and assembly.
Flicek P, Birney E., Nat Methods 6(11 suppl), 2009
PMID: 19844229
Locked nucleic acid-based in situ detection of microRNAs in mouse tissue sections.
Obernosterer G, Martinez J, Alenius M., Nat Protoc 2(6), 2007
PMID: 17571058
PILER-CR: fast and accurate identification of CRISPR repeats.
Edgar RC., BMC Bioinformatics 8, 2007
PMID: 17239253
14 References
Data provided by Europe PubMed Central.
Basic local alignment search tool.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ., J. Mol. Biol. 215(3), 1990
PMID: 2231712
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ., Nucleic Acids Res. 25(17), 1997
PMID: 9254694
AUTHOR UNKNOWN, 0
PatternHunter: faster and more sensitive homology search.
Ma B, Tromp J, Li M., Bioinformatics 18(3), 2002
PMID: 11934743
AUTHOR UNKNOWN, 0
A table-driven, full-sensitivity similarity search algorithm.
Myers G, Durbin R., J. Comput. Biol. 10(2), 2003
PMID: 12804086
SSAHA: a fast search method for large DNA databases.
Ning Z, Cox AJ, Mullikin JC., Genome Res. 11(10), 2001
PMID: 11591649
Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.
Pearson WR., Genomics 11(3), 1991
PMID: 1774068
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770
Identification of common molecular subsequences.
Smith TF, Waterman MS., J. Mol. Biol. 147(1), 1981
PMID: 7265238
AUTHOR UNKNOWN, 0
Sources
PMID: 16597241