On Approximate Jumbled Pattern Matching in Strings

Burcsi P, Cicalese F, Fici G, Lipták Z (2012)
Theory of Computing Systems 50(1): 35-51.

Zeitschriftenaufsatz | Veröffentlicht | Englisch
 
Download
Es wurden keine Dateien hochgeladen. Nur Publikationsnachweis!
Autor*in
Burcsi, Peter; Cicalese, Ferdinando; Fici, Gabriele; Lipták, ZsuzsannaUniBi
Abstract / Bemerkung
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u, v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u <= q <= v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then discuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t) <= v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.
Stichworte
Permuted strings; Approximate search; analysis; Average case; Parikh vectors; String algorithms; Pattern matching
Erscheinungsjahr
2012
Zeitschriftentitel
Theory of Computing Systems
Band
50
Ausgabe
1
Seite(n)
35-51
ISSN
1432-4350
eISSN
1433-0490
Page URI
https://pub.uni-bielefeld.de/record/2474531

Zitieren

Burcsi P, Cicalese F, Fici G, Lipták Z. On Approximate Jumbled Pattern Matching in Strings. Theory of Computing Systems. 2012;50(1):35-51.
Burcsi, P., Cicalese, F., Fici, G., & Lipták, Z. (2012). On Approximate Jumbled Pattern Matching in Strings. Theory of Computing Systems, 50(1), 35-51. doi:10.1007/s00224-011-9344-5
Burcsi, P., Cicalese, F., Fici, G., and Lipták, Z. (2012). On Approximate Jumbled Pattern Matching in Strings. Theory of Computing Systems 50, 35-51.
Burcsi, P., et al., 2012. On Approximate Jumbled Pattern Matching in Strings. Theory of Computing Systems, 50(1), p 35-51.
P. Burcsi, et al., “On Approximate Jumbled Pattern Matching in Strings”, Theory of Computing Systems, vol. 50, 2012, pp. 35-51.
Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: On Approximate Jumbled Pattern Matching in Strings. Theory of Computing Systems. 50, 35-51 (2012).
Burcsi, Peter, Cicalese, Ferdinando, Fici, Gabriele, and Lipták, Zsuzsanna. “On Approximate Jumbled Pattern Matching in Strings”. Theory of Computing Systems 50.1 (2012): 35-51.