Suffix arrays in theory and practice

Schürmann K-B (2007)
Bielefeld (Germany): Bielefeld University.

Bielefelder E-Dissertation | Englisch
 
Download
OA
Autor*in
Schürmann, Klaus-Bernd
Gutachter*in / Betreuer*in
Abstract / Bemerkung
The suffix array of a string is a permutation of all starting positions of the string's suffixes in lexicographical order. In this thesis, we investigate mathematical and algorithmical aspects of suffix arrays. The first part mainly deals with combinatorial properties of suffix arrays and their enumeration. For a fixed alphabet size and string length, we divide the set of all strings into equivalence classes of strings that share the same suffix array. For each such equivalence class, we count the number of strings contained in it and enumerate those strings. We also give exact formulas for computing the number of equivalence classes and efficient algorithms for enumerating them. Alternatively, we count the number of suffix arrays and enumerate them. Our methods yield lower bounds for the compressibility of suffix arrays and build the foundation for the efficient generation of appropriate test data sets for suffix-array-based algorithms. We also show that summing up the elements of all equivalence classes forms a particular instance for some summation identities of Eulerian numbers. The second part of the thesis deals with suffix array construction. We first present a new classification of suffix array construction algorithms and provide an in-depth review of the classified algorithms. We classify the algorithms regarding two different categories: the progress in the suffix sorting process and the usage of dependencies among suffixes. After the survey of the previous algorithms, we present our new practical algorithm for suffix array construction that consists of two easy-to-implement components. It first sorts the suffixes with respect to a fixed length prefix; then it refines each bucket of suffixes sharing the same prefix using the order of already sorted suffixes. Other suffix array construction algorithms follow more complex strategies. We achieve a very fast construction for common strings as well as for worst-case strings by enhancing our algorithm with further techniques; this is shown by an in-depth experimental study that compares our algorithm to other fast suffix array construction algorithms.
Stichworte
Suffix Array; Permutation; Kombinatorik; Zeichenkette; Abzählen; Theoretische Informatik; Volltextindex; Full-text index; String
Jahr
2007
Page URI
https://pub.uni-bielefeld.de/record/2303359

Zitieren

Schürmann K-B. Suffix arrays in theory and practice. Bielefeld (Germany): Bielefeld University; 2007.
Schürmann, K. - B. (2007). Suffix arrays in theory and practice. Bielefeld (Germany): Bielefeld University.
Schürmann, Klaus-Bernd. 2007. Suffix arrays in theory and practice. Bielefeld (Germany): Bielefeld University.
Schürmann, K. - B. (2007). Suffix arrays in theory and practice. Bielefeld (Germany): Bielefeld University.
Schürmann, K.-B., 2007. Suffix arrays in theory and practice, Bielefeld (Germany): Bielefeld University.
K.-B. Schürmann, Suffix arrays in theory and practice, Bielefeld (Germany): Bielefeld University, 2007.
Schürmann, K.-B.: Suffix arrays in theory and practice. Bielefeld University, Bielefeld (Germany) (2007).
Schürmann, Klaus-Bernd. Suffix arrays in theory and practice. Bielefeld (Germany): Bielefeld University, 2007.
Alle Dateien verfügbar unter der/den folgenden Lizenz(en):
Copyright Statement:
Dieses Objekt ist durch das Urheberrecht und/oder verwandte Schutzrechte geschützt. [...]
Volltext(e)
Access Level
OA Open Access
Zuletzt Hochgeladen
2019-09-25T06:18:13Z
MD5 Prüfsumme
b43531e70109842f949f2d351aef3e1c


Export

Markieren/ Markierung löschen
Markierte Publikationen

Open Data PUB

Suchen in

Google Scholar