TY - CONF
AB - We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈ Σ^n and a nonnegative integer k ≤ n, how many distinct subsequences of length k does s contain? A previous result by Chase states that this number is maximized by choosing s as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of ρ-restricted ρ-generated sequences: Given s ∈ Σ^n and integers k ≥ 1 and ρ ≥ 1, how many distinct sequences in Σ^k contain no single nucleotide repeat longer than ρ and can be written as s_1^{r_1} ... s_n^{r_n} with 0 ≤ r_i ≤ ρ for all i? For ρ = ∞, the question becomes how many length-k sequences match the regular expression s_1* s_2* ... s_n*. These considerations allow a detailed analysis of a new DNA sequencing technology ("454 sequencing"). (3) Exact length distribution of the longest increasing subsequence: Given Σ = {1, ..., K} and an integer n ≥ 1, determine the number of sequences in Σ^n whose longest strictly increasing subsequence has length k, where 0 ≤ k ≤ K. This has applications to significance computations for chaining algorithms.
AU - Rahmann, Sven
ED - Lewenstein, Moshe
ED - Valiente, Gabriel
ID - 1598249
SN - 978-3-540-35455-0
T2 - Combinatorial Pattern Matching. 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006. Proceedings
TI - Subsequence combinatorics and applications to microarray production, DNA sequencing and chaining algorithms
VL - 4009
ER -