Algorithmic complexity of protein identification: combinatorics of weighted strings

Cieliebak, Mark; Erlebach, Thomas; Lipták, Zsuzsanna; Stoye, Jens; Welzl, Emo

Algorithmic complexity of protein identification: combinatorics of weighted strings

Cieliebak M, Erlebach T, Lipták Z, Stoye J, Welzl E (2004)
Discrete Applied Mathematics 137(1): 27-46.

Zeitschriftenaufsatz | Veröffentlicht | Englisch

Download

CIE_ERL_LIP_STO_WEL_2004.pdf

DOI

https://doi.org/10.1016/S0166-218X(03)00187-2

URN

urn:nbn:de:0070-pub-17735609

Autor*in

Cieliebak, Mark; Erlebach, Thomas; Lipták, Zsuzsanna; Stoye, Jens^UniBi ; Welzl, Emo

Einrichtung

Technische Fakultät > AG Genominformatik
Centrum für Biotechnologie > Institut für Bioinformatik
Centrum für Biotechnologie > Arbeitsgruppe J. Stoye

Abstract / Bemerkung

We investigate a problem which arises in computational biology: Given a constant-size alphabet [Mathematical script A] with a weight function µ : [Mathematical script A] --> [Double-struck N], find an efficient data structure and query algorithm solving the following problem: For a string [sigma] over [Mathematical script A] and a weight [Mathematical italic M] [element of] [Double-struck N], decide whether [sigma] contains a substring with weight [Mathematical italic M], where the weight of a string is the sum of the weights of its letters (One-String Mass Finding Problem). If the answer is yes, then we may in addition require a witness, i.e., indices i [less-than or equal to] j such that the substring beginning at position i and ending at position j has weight [Mathematical italic M]. We allow preprocessing of the string and measure efficiency in two parameters: storage space required for the preprocessed data and running time of the query algorithm for given [Mathematical italic M]. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length n of the input string [sigma]. Among others, we present two non-trivial efficient algorithms: Lookup solves the problem with O(n) storage space and O(n/log n) time; Interval solves the problem for binary alphabets with O(n) storage space in O(log n) query time. We introduce other variants of the problem and sketch how our algorithms may be extended for these variants. Finally, we discuss combinatorial properties of weighted strings.

Stichworte

Computational biology; Weighted strings; Protein identification

Erscheinungsjahr

2004

Zeitschriftentitel

Discrete Applied Mathematics

Band

137

Ausgabe

Seite(n)

27-46

ISSN

0166-218X

Page URI

https://pub.uni-bielefeld.de/record/1773560

Zitieren

Cieliebak M, Erlebach T, Lipták Z, Stoye J, Welzl E. Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics. 2004;137(1):27-46.

Cieliebak, M., Erlebach, T., Lipták, Z., Stoye, J., & Welzl, E. (2004). Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics, 137(1), 27-46. https://doi.org/10.1016/S0166-218X(03)00187-2

Cieliebak, Mark, Erlebach, Thomas, Lipták, Zsuzsanna, Stoye, Jens, and Welzl, Emo. 2004. “Algorithmic complexity of protein identification: combinatorics of weighted strings”. Discrete Applied Mathematics 137 (1): 27-46.

Cieliebak, M., Erlebach, T., Lipták, Z., Stoye, J., and Welzl, E. (2004). Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics 137, 27-46.

Cieliebak, M., et al., 2004. Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics, 137(1), p 27-46.

M. Cieliebak, et al., “Algorithmic complexity of protein identification: combinatorics of weighted strings”, Discrete Applied Mathematics, vol. 137, 2004, pp. 27-46.

Cieliebak, M., Erlebach, T., Lipták, Z., Stoye, J., Welzl, E.: Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics. 137, 27-46 (2004).

Cieliebak, Mark, Erlebach, Thomas, Lipták, Zsuzsanna, Stoye, Jens, and Welzl, Emo. “Algorithmic complexity of protein identification: combinatorics of weighted strings”. Discrete Applied Mathematics 137.1 (2004): 27-46.

Alle Dateien verfügbar unter der/den folgenden Lizenz(en):

Copyright Statement: