10.1007/11780441_15 Rahmann, Sven Sven Rahmann Lewenstein, Moshe Moshe Lewenstein Valiente, Gabriel Gabriel Valiente Subsequence combinatorics and applications to microarray production, DNA sequencing and chaining algorithms Springer 2006 2010-04-28T12:19:18Z 2019-03-13T14:57:18Z conference https://pub.uni-bielefeld.de/record/1598249 https://pub.uni-bielefeld.de/record/1598249.json We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Sigma, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s is an element of Sigma(n) and a nonnegative integer k <= n, how many distinct subsequences of length k does s contain? A previous result by Chase states that this number is maximized by choosing s as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of rho-restricted rho-generated sequences: Given s is an element of Sigma(n) and integers k >= 1 and rho >= 1, how many distinct sequences in Sigma(k) contain no single nucleotide repeat longer than rho and can be written as s(1)(r1)... s(n)(rn) with 0 <= r(i) <= rho for all i? For rho = infinity, the question becomes how many length-k sequences match the regular expression s(1)*s(2)*... s(n)*. These considerations allow a detailed analysis of a new DNA sequencing technology ("454 sequencing"). (3) Exact length distribution of the longest increasing subsequence: Given Sigma = {1, ..., K} and an integer n >= 1, determine the number of sequences in Sigma(n) whose longest strictly increasing subsequence has length k, where 0 <= k <= K. This has applications to significance computations for chaining algorithms.