Efficient q-gram filters for finding all epsilon-matches over a given length

Rasmussen KR, Stoye J, Myers EW (2006)
JOURNAL OF COMPUTATIONAL BIOLOGY 13(2): 296-308.

Zeitschriftenaufsatz | Veröffentlicht | Englisch
 
Download
Es wurden keine Dateien hochgeladen. Nur Publikationsnachweis!
Autor*in
Rasmussen, Kim R.; Stoye, JensUniBi ; Myers, Eugene W.
Abstract / Bemerkung
Fast and exact comparison of large genomic sequences remains a challenging task in biosequence analysis. We consider the problem of finding all epsilon-matches between two sequences, i.e., all local alignments over a given length with an error rate of at most epsilon. We study this problem theoretically, giving an efficient q-gram filter for solving it. Two applications of the filter are also discussed, in particular genomic sequence assembly and BLAST-like sequence comparison. Our results show that the method is 25 times faster than BLAST, while not being heuristic.
Stichworte
local alignment searching; q-grams; filter; clustering; EST; sequence assembly
Erscheinungsjahr
2006
Zeitschriftentitel
JOURNAL OF COMPUTATIONAL BIOLOGY
Band
13
Ausgabe
2
Seite(n)
296-308
ISSN
1066-5277
eISSN
1557-8666
Page URI
https://pub.uni-bielefeld.de/record/1599570

Zitieren

Rasmussen KR, Stoye J, Myers EW. Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY. 2006;13(2):296-308.
Rasmussen, K. R., Stoye, J., & Myers, E. W. (2006). Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY, 13(2), 296-308. https://doi.org/10.1089/cmb.2006.13.296
Rasmussen, Kim R., Stoye, Jens, and Myers, Eugene W. 2006. “Efficient q-gram filters for finding all epsilon-matches over a given length”. JOURNAL OF COMPUTATIONAL BIOLOGY 13 (2): 296-308.
Rasmussen, K. R., Stoye, J., and Myers, E. W. (2006). Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY 13, 296-308.
Rasmussen, K.R., Stoye, J., & Myers, E.W., 2006. Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY, 13(2), p 296-308.
K.R. Rasmussen, J. Stoye, and E.W. Myers, “Efficient q-gram filters for finding all epsilon-matches over a given length”, JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 13, 2006, pp. 296-308.
Rasmussen, K.R., Stoye, J., Myers, E.W.: Efficient q-gram filters for finding all epsilon-matches over a given length. JOURNAL OF COMPUTATIONAL BIOLOGY. 13, 296-308 (2006).
Rasmussen, Kim R., Stoye, Jens, and Myers, Eugene W. “Efficient q-gram filters for finding all epsilon-matches over a given length”. JOURNAL OF COMPUTATIONAL BIOLOGY 13.2 (2006): 296-308.

38 Zitationen in Europe PMC

Daten bereitgestellt von Europe PubMed Central.

Systematic comparative study of computational methods for T-cell receptor sequencing data analysis.
Afzal S, Gil-Farina I, Gabriel R, Ahmad S, von Kalle C, Schmidt M, Fronza R., Brief Bioinform 20(1), 2019
PMID: 29028876
ImtRDB: a database and software for mitochondrial imperfect interspersed repeats annotation.
Shamanskiy VA, Timonina VN, Popadin KY, Gunbin KV., BMC Genomics 20(suppl 3), 2019
PMID: 31284879
Linking temporal medical records using non-protected health information data.
Bonomi L, Jiang X., Stat Methods Med Res 27(11), 2018
PMID: 29298592
Short Read Mapping: An Algorithmic Tour.
Canzar S, Salzberg SL., Proc IEEE Inst Electr Electron Eng 105(3), 2017
PMID: 28502990
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Alser M, Hassan H, Xin H, Ergin O, Mutlu O, Alkan C., Bioinformatics 33(21), 2017
PMID: 28575161
PopIns: population-scale detection of novel sequence insertions.
Kehr B, Melsted P, Halldórsson BV., Bioinformatics 32(7), 2016
PMID: 25926346
Optimal seed solver: optimizing seed selection in read mapping.
Xin H, Nahar S, Zhu R, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O., Bioinformatics 32(11), 2016
PMID: 26568624
rHAT: fast alignment of noisy long reads with regional hashing.
Liu B, Guan D, Teng M, Wang Y., Bioinformatics 32(11), 2016
PMID: 26568628
Fast search of thousands of short-read sequencing experiments.
Solomon B, Kingsford C., Nat Biotechnol 34(3), 2016
PMID: 26854477
Circular sequence comparison: algorithms and applications.
Grossi R, Iliopoulos CS, Mercas R, Pisanti N, Pissis SP, Retha A, Vayani F., Algorithms Mol Biol 11(), 2016
PMID: 27168761
Alignment of Next-Generation Sequencing Reads.
Reinert K, Langmead B, Weese D, Evers DJ., Annu Rev Genomics Hum Genet 16(), 2015
PMID: 25939052
IMSEQ--a fast and error aware approach to immunogenetic sequence analysis.
Kuchenbecker L, Nienen M, Hecht J, Neumann AU, Babel N, Reinert K, Robinson PN., Bioinformatics 31(18), 2015
PMID: 25987567
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
Berlin K, Koren S, Chin CS, Drake JP, Landolin JM, Phillippy AM., Nat Biotechnol 33(6), 2015
PMID: 26006009
BitMapper: an efficient all-mapper based on bit-vector computing.
Cheng H, Jiang H, Yang J, Xu Y, Shang Y., BMC Bioinformatics 16(), 2015
PMID: 26063651
Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone.
Trappe K, Emde AK, Ehrlich HC, Reinert K., Bioinformatics 30(24), 2014
PMID: 25028727
Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine.
Hong H, Zhang W, Shen J, Su Z, Ning B, Han T, Perkins R, Shi L, Tong W., Sci China Life Sci 56(2), 2013
PMID: 23393026
The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote.
Liao Y, Smyth GK, Shi W., Nucleic Acids Res 41(10), 2013
PMID: 23558742
Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.
Cordero F, Beccuti M, Arigoni M, Donatelli S, Calogero RA., PLoS One 7(2), 2012
PMID: 22363693
RazerS 3: faster, fully sensitive read mapping.
Weese D, Holtgrewe M, Reinert K., Bioinformatics 28(20), 2012
PMID: 22923295
Long read alignment based on maximal exact match seeds.
Liu Y, Schmidt B., Bioinformatics 28(18), 2012
PMID: 22962447
Normalized N50 assembly metric using gap-restricted co-linear chaining.
Mäkinen V, Salmela L, Ylinen J., BMC Bioinformatics 13(), 2012
PMID: 23031320
Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.
Misra S, Agrawal A, Liao WK, Choudhary A., Bioinformatics 27(2), 2011
PMID: 21088030
SHRiMP2: sensitive yet practical SHort Read Mapping.
David M, Dzamba M, Lister D, Ilie L, Brudno M., Bioinformatics 27(7), 2011
PMID: 21278192
Considering transposable element diversification in de novo annotation approaches.
Flutre T, Duprat E, Feuillet C, Quesneville H., PLoS One 6(1), 2011
PMID: 21304975
STELLAR: fast and exact local alignments.
Kehr B, Weese D, Reinert K., BMC Bioinformatics 12 Suppl 9(), 2011
PMID: 22151882
MicroRazerS: rapid alignment of small RNA reads.
Emde AK, Grunert M, Weese D, Reinert K, Sperling SR., Bioinformatics 26(1), 2010
PMID: 19880369
Phylogenetic comparative assembly.
Husemann P, Stoye J., Algorithms Mol Biol 5(), 2010
PMID: 20047659
Lossless filter for multiple repeats with bounded edit distance.
Peterlongo P, Sacomoto GA, do Lago AP, Pisanti N, Sagot MF., Algorithms Mol Biol 4(), 2009
PMID: 19183438
Massive parallel bisulfite sequencing of CG-rich DNA fragments reveals that methylation of many X-chromosomal CpG islands in female blood DNA is incomplete.
Zeschnigk M, Martin M, Betzl G, Kalbe A, Sirsch C, Buiting K, Gross S, Fritzilas E, Frey B, Rahmann S, Horsthemke B., Hum Mol Genet 18(8), 2009
PMID: 19223391
A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads.
Rausch T, Koren S, Denisov G, Weese D, Emde AK, Döring A, Reinert K., Bioinformatics 25(9), 2009
PMID: 19269990
RazerS--fast read mapping with sensitivity control.
Weese D, Emde AK, Rausch T, Döring A, Reinert K., Genome Res 19(9), 2009
PMID: 19592482
Sense from sequence reads: methods for alignment and assembly.
Flicek P, Birney E., Nat Methods 6(11 suppl), 2009
PMID: 19844229
Locked nucleic acid-based in situ detection of microRNAs in mouse tissue sections.
Obernosterer G, Martinez J, Alenius M., Nat Protoc 2(6), 2007
PMID: 17571058
PILER-CR: fast and accurate identification of CRISPR repeats.
Edgar RC., BMC Bioinformatics 8(), 2007
PMID: 17239253

14 References

Daten bereitgestellt von Europe PubMed Central.

Basic local alignment search tool.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ., J. Mol. Biol. 215(3), 1990
PMID: 2231712
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ., Nucleic Acids Res. 25(17), 1997
PMID: 9254694

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
BLAT--the BLAST-like alignment tool.
Kent WJ., Genome Res. 12(4), 2002
PMID: 11932250
PatternHunter: faster and more sensitive homology search.
Ma B, Tromp J, Li M., Bioinformatics 18(3), 2002
PMID: 11934743

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
A table-driven, full-sensitivity similarity search algorithm.
Myers G, Durbin R., J. Comput. Biol. 10(2), 2003
PMID: 12804086
SSAHA: a fast search method for large DNA databases.
Ning Z, Cox AJ, Mullikin JC., Genome Res. 11(10), 2001
PMID: 11591649
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770
Identification of common molecular subsequences.
Smith TF, Waterman MS., J. Mol. Biol. 147(1), 1981
PMID: 7265238

AUTHOR UNKNOWN, 0
Export

Markieren/ Markierung löschen
Markierte Publikationen

Open Data PUB

Web of Science

Dieser Datensatz im Web of Science®
Quellen

PMID: 16597241
PubMed | Europe PMC

Suchen in

Google Scholar