Probabilistic Arithmetic Automata and Their Applications

Marschall T, Herms I, Kaltenbach H-M, Rahmann S (2012)
Ieee/Acm Transactions On Computational Biology And Bioinformatics 9(6): 1737-1750.

Zeitschriftenaufsatz | Veröffentlicht | Englisch
 
Download
Es wurden keine Dateien hochgeladen. Nur Publikationsnachweis!
Autor*in
Marschall, T.; Herms, InkeUniBi; Kaltenbach, Hans-Michael; Rahmann, Sven
Abstract / Bemerkung
We present a comprehensive review on probabilistic arithmetic automata (PAAs), a general model to describe chains of operations whose operands depend on chance, along with two algorithms to numerically compute the distribution of the results of such probabilistic calculations. PAAs provide a unifying framework to approach many problems arising in computational biology and elsewhere. We present five different applications, namely 1) pattern matching statistics on random texts, including the computation of the distribution of occurrence counts, waiting times, and clump sizes under hidden Markov background models; 2) exact analysis of window-based pattern matching algorithms; 3) sensitivity of filtration seeds used to detect candidate sequence alignments; 4) length and mass statistics of peptide fragments resulting from enzymatic cleavage reactions; and 5) read length statistics of 454 and IonTorrent sequencing reads. The diversity of these applications indicates the flexibility and unifying character of the presented framework. While the construction of a PAA depends on the particular application, we single out a frequently applicable construction method: We introduce deterministic arithmetic automata (DAAs) to model deterministic calculations on sequences, and demonstrate how to construct a PAA from a given DAA and a finite-memory random text model. This procedure is used for all five discussed applications and greatly simplifies the construction of PAAs. Implementations are available as part of the MoSDi package. Its application programming interface facilitates the rapid development of new applications based on the PAA framework.
Stichworte
alignment seed; analysis of algorithms; string algorithm; clump; statistics; matching; pattern; hidden Markov model; Probabilistic automaton; text model; peptide mass fingerprinting; DNA sequencing; dynamic; programming
Erscheinungsjahr
2012
Zeitschriftentitel
Ieee/Acm Transactions On Computational Biology And Bioinformatics
Band
9
Ausgabe
6
Seite(n)
1737-1750
ISSN
1545-5963
Page URI
https://pub.uni-bielefeld.de/record/2553484

Zitieren

Marschall T, Herms I, Kaltenbach H-M, Rahmann S. Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics. 2012;9(6):1737-1750.
Marschall, T., Herms, I., Kaltenbach, H. - M., & Rahmann, S. (2012). Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics, 9(6), 1737-1750. doi:10.1109/TCBB.2012.109
Marschall, T., Herms, I., Kaltenbach, H. - M., and Rahmann, S. (2012). Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics 9, 1737-1750.
Marschall, T., et al., 2012. Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics, 9(6), p 1737-1750.
T. Marschall, et al., “Probabilistic Arithmetic Automata and Their Applications”, Ieee/Acm Transactions On Computational Biology And Bioinformatics, vol. 9, 2012, pp. 1737-1750.
Marschall, T., Herms, I., Kaltenbach, H.-M., Rahmann, S.: Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics. 9, 1737-1750 (2012).
Marschall, T., Herms, Inke, Kaltenbach, Hans-Michael, and Rahmann, Sven. “Probabilistic Arithmetic Automata and Their Applications”. Ieee/Acm Transactions On Computational Biology And Bioinformatics 9.6 (2012): 1737-1750.

2 Zitationen in Europe PMC

Daten bereitgestellt von Europe PubMed Central.

Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models.
Régnier M, Furletova E, Yakovlev V, Roytberg M., Algorithms Mol Biol 9(1), 2014
PMID: 25648087

60 References

Daten bereitgestellt von Europe PubMed Central.


AUTHOR UNKNOWN, 0
Efficient exact motif discovery.
Marschall T, Rahmann S., Bioinformatics 25(12), 2009
PMID: 19478010

AUTHOR UNKNOWN, 0
Markov Additive Chains and Applications to Fragment Statistics for Peptide Mass Fingerprinting
kaltenbach, Proc Joint Satellite Conf Systems Biology and Computational Proteomics (), 2006

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
Analysis of Boyer-Moore-Type String Searching Algorithms
baeza-yates, Proc First Ann ACM-SIAM Symp Discrete Algorithms (SODA) (), 1990

AUTHOR UNKNOWN, 0
Efficient Experimental String Matching by Weak Factor Recognition
allauzen, Proc 12th Ann Symp Combinatorial Pattern Matching (CPM) (), 2001

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770

AUTHOR UNKNOWN, 0
BLAT--the BLAST-like alignment tool.
Kent WJ., Genome Res. 12(4), 2002
PMID: 11932250
PatternHunter: faster and more sensitive homology search.
Ma B, Tromp J, Li M., Bioinformatics 18(3), 2002
PMID: 11934743
Designing Seeds for Similarity Search in Genomic DNA
buhler, Proc Seventh Ann Int'l Conf Research in Computational Molecular Biology (RECOMB) (), 2003

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
Optimal spaced seeds for homologous coding regions.
Brejova B, Brown DG, Vinar T., J Bioinform Comput Biol 1(4), 2004
PMID: 15290755

AUTHOR UNKNOWN, 0
Reorganizing the protein space at the Universal Protein Resource (UniProt).
UniProt Consortium, Apweiler R, Jesus Martin M, O'onovan C, Magrane M, Alam-Faruque Y, Antunes R, Barrera Casanova E, Bely B, Bingley M, Bower L, Bursteinas B, Mun Chan W, Chavali G, Da Silva A, Dimmer E, Eberhardt R, Fazzini F, Fedotov A, Garavelli J, Castro LG, Gardner M, Hieta R, Huntley R, Jacobsen J, Legge D, Liu W, Luo J, Orchard S, Patient S, Pichler K, Poggioli D, Pontikos N, Pundir S, Rosanoff S, Sawford T, Sehra H, Turner E, Wardell T, Watkins X, Corbett M, Donnelly M, van Rensburg P, Goujon M, McWilliam H, Lopez R, Xenarios I, Bougueleret L, Bridge A, Poux S, Redaschi N, Argoud-Puy G, Auchincloss A, Axelsen K, Baratin D, Blatter MC, Boeckmann B, Bolleman J, Bollondi L, Boutet E, Braconi Quintaje S, Breuza L, deCastro E, Cerutti L, Coudert E, Cuche B, Cusin I, Doche M, Dornevil D, Duvaud S, Estreicher A, Famiglietti L, Feuermann M, Gehant S, Ferro S, Gasteiger E, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hulo N, James J, Jimenez S, Jungo F, Kappler T, Keller G, Lara V, Lemercier P, Lieberherr D, Martin X, Masson P, Moinat M, Morgat A, Paesano S, Pedruzzi I, Pilbout S, Pozzato M, Pruess M, Rivoire C, Roechert B, Schneider M, Sigrist C, Sonesson K, Staehli S, Stanley E, Stutz A, Sundaram S, Tognolli M, Verbregue L, Veuthey AL, Wu CH, Arighi CN, Arminski L, Barker WC, Chen C, Chen Y, Dubey P, Huang H, Kukreja A, Laiho K, Mazumder R, McGarvey P, Natale DA, Natarajan TG, Roberts NV, Suzek BE, Vinayaka C, Wang Q, Wang Y, Yeh LS, Zhang J., Nucleic Acids Res. 40(Database issue), 2011
PMID: 22102590

AUTHOR UNKNOWN, 0
Multiseed lossless filtration.
Kucherov G, Noe L, Roytberg M., IEEE/ACM Trans Comput Biol Bioinform 2(1), 2005
PMID: 17044164
Patternhunter II: highly sensitive and fast homology search.
Li M, Ma B, Kisman D, Tromp J., J Bioinform Comput Biol 2(3), 2004
PMID: 15359419

kaltenbach, ?Statistics and Algorithms for Peptide Mass Fingerprinting ? (), 2007
Designing multiple simultaneous seeds for DNA similarity search.
Sun Y, Buhler J., J. Comput. Biol. 12(6), 2005
PMID: 16108721
Good spaced seeds for homology search.
Choi KP, Zeng F, Zhang L., Bioinformatics 20(7), 2004
PMID: 14764573
Indel seeds for homology search.
Mak D, Gelfand Y, Benson G., Bioinformatics 22(14), 2006
PMID: 16873491

AUTHOR UNKNOWN, 0
A unifying framework for seed sensitivity and its application to subset seeds.
Kucherov G, Noe L, Roytberg M., J Bioinform Comput Biol 4(2), 2006
PMID: 16819802

AUTHOR UNKNOWN, 0

mohri, Handbook of Weighted Automata (), 2009

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
The PROSITE database.
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ., Nucleic Acids Res. 34(Database issue), 2006
PMID: 16381852
Probabilistic and statistical properties of words: an overview.
Reinert G, Schbath S, Waterman MS., J. Comput. Biol. 7(1-2), 2000
PMID: 10890386

br�maud, Markov Chains Gibbs Fields Monte Carlo Simulation and Queues (), 1999

AUTHOR UNKNOWN, 0

feller, An Introduction to Probability Theory and Its Applications (), 1968

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0
Multiple Pattern Matching: A Markov Chain Approach
lladser, J Math Biology (), 2008

AUTHOR UNKNOWN, 0
Statistical distributions of pyrosequencing.
Kong Y., J. Comput. Biol. 16(1), 2009
PMID: 19072582

AUTHOR UNKNOWN, 0
Computing exact P-values for DNA motifs.
Zhang J, Jiang B, Li M, Tromp J, Zhang X, Zhang MQ., Bioinformatics 23(5), 2007
PMID: 17237046

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

AUTHOR UNKNOWN, 0

Export

Markieren/ Markierung löschen
Markierte Publikationen

Open Data PUB

Web of Science

Dieser Datensatz im Web of Science®

Quellen

PMID: 22868683
PubMed | Europe PMC

Suchen in

Google Scholar