Probabilistic Arithmetic Automata and Their Applications
Marschall T, Herms I, Kaltenbach H-M, Rahmann S (2012)
Ieee/Acm Transactions On Computational Biology And Bioinformatics 9(6): 1737-1750.
Zeitschriftenaufsatz
| Veröffentlicht | Englisch
Download
Es wurden keine Dateien hochgeladen. Nur Publikationsnachweis!
Autor*in
Marschall, T.;
Herms, InkeUniBi;
Kaltenbach, Hans-Michael;
Rahmann, Sven
Einrichtung
Abstract / Bemerkung
We present a comprehensive review on probabilistic arithmetic automata (PAAs), a general model to describe chains of operations whose operands depend on chance, along with two algorithms to numerically compute the distribution of the results of such probabilistic calculations. PAAs provide a unifying framework to approach many problems arising in computational biology and elsewhere. We present five different applications, namely 1) pattern matching statistics on random texts, including the computation of the distribution of occurrence counts, waiting times, and clump sizes under hidden Markov background models; 2) exact analysis of window-based pattern matching algorithms; 3) sensitivity of filtration seeds used to detect candidate sequence alignments; 4) length and mass statistics of peptide fragments resulting from enzymatic cleavage reactions; and 5) read length statistics of 454 and IonTorrent sequencing reads. The diversity of these applications indicates the flexibility and unifying character of the presented framework. While the construction of a PAA depends on the particular application, we single out a frequently applicable construction method: We introduce deterministic arithmetic automata (DAAs) to model deterministic calculations on sequences, and demonstrate how to construct a PAA from a given DAA and a finite-memory random text model. This procedure is used for all five discussed applications and greatly simplifies the construction of PAAs. Implementations are available as part of the MoSDi package. Its application programming interface facilitates the rapid development of new applications based on the PAA framework.
Stichworte
alignment seed;
analysis of algorithms;
string algorithm;
clump;
statistics;
matching;
pattern;
hidden Markov model;
Probabilistic automaton;
text model;
peptide mass fingerprinting;
DNA sequencing;
dynamic;
programming
Erscheinungsjahr
2012
Zeitschriftentitel
Ieee/Acm Transactions On Computational Biology And Bioinformatics
Band
9
Ausgabe
6
Seite(n)
1737-1750
ISSN
1545-5963
Page URI
https://pub.uni-bielefeld.de/record/2553484
Zitieren
Marschall T, Herms I, Kaltenbach H-M, Rahmann S. Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics. 2012;9(6):1737-1750.
Marschall, T., Herms, I., Kaltenbach, H. - M., & Rahmann, S. (2012). Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics, 9(6), 1737-1750. doi:10.1109/TCBB.2012.109
Marschall, T., Herms, Inke, Kaltenbach, Hans-Michael, and Rahmann, Sven. 2012. “Probabilistic Arithmetic Automata and Their Applications”. Ieee/Acm Transactions On Computational Biology And Bioinformatics 9 (6): 1737-1750.
Marschall, T., Herms, I., Kaltenbach, H. - M., and Rahmann, S. (2012). Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics 9, 1737-1750.
Marschall, T., et al., 2012. Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics, 9(6), p 1737-1750.
T. Marschall, et al., “Probabilistic Arithmetic Automata and Their Applications”, Ieee/Acm Transactions On Computational Biology And Bioinformatics, vol. 9, 2012, pp. 1737-1750.
Marschall, T., Herms, I., Kaltenbach, H.-M., Rahmann, S.: Probabilistic Arithmetic Automata and Their Applications. Ieee/Acm Transactions On Computational Biology And Bioinformatics. 9, 1737-1750 (2012).
Marschall, T., Herms, Inke, Kaltenbach, Hans-Michael, and Rahmann, Sven. “Probabilistic Arithmetic Automata and Their Applications”. Ieee/Acm Transactions On Computational Biology And Bioinformatics 9.6 (2012): 1737-1750.
Daten bereitgestellt von European Bioinformatics Institute (EBI)
2 Zitationen in Europe PMC
Daten bereitgestellt von Europe PubMed Central.
Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.
Noé L., Algorithms Mol Biol 12(), 2017
PMID: 28289437
Noé L., Algorithms Mol Biol 12(), 2017
PMID: 28289437
Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models.
Régnier M, Furletova E, Yakovlev V, Roytberg M., Algorithms Mol Biol 9(1), 2014
PMID: 25648087
Régnier M, Furletova E, Yakovlev V, Roytberg M., Algorithms Mol Biol 9(1), 2014
PMID: 25648087
60 References
Daten bereitgestellt von Europe PubMed Central.
AUTHOR UNKNOWN, 0
Efficient exact motif discovery.
Marschall T, Rahmann S., Bioinformatics 25(12), 2009
PMID: 19478010
Marschall T, Rahmann S., Bioinformatics 25(12), 2009
PMID: 19478010
AUTHOR UNKNOWN, 0
Markov Additive Chains and Applications to Fragment Statistics for Peptide Mass Fingerprinting
kaltenbach, Proc Joint Satellite Conf Systems Biology and Computational Proteomics (), 2006
kaltenbach, Proc Joint Satellite Conf Systems Biology and Computational Proteomics (), 2006
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
Analysis of Boyer-Moore-Type String Searching Algorithms
baeza-yates, Proc First Ann ACM-SIAM Symp Discrete Algorithms (SODA) (), 1990
baeza-yates, Proc First Ann ACM-SIAM Symp Discrete Algorithms (SODA) (), 1990
AUTHOR UNKNOWN, 0
Efficient Experimental String Matching by Weak Factor Recognition
allauzen, Proc 12th Ann Symp Combinatorial Pattern Matching (CPM) (), 2001
allauzen, Proc 12th Ann Symp Combinatorial Pattern Matching (CPM) (), 2001
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770
AUTHOR UNKNOWN, 0
PatternHunter: faster and more sensitive homology search.
Ma B, Tromp J, Li M., Bioinformatics 18(3), 2002
PMID: 11934743
Ma B, Tromp J, Li M., Bioinformatics 18(3), 2002
PMID: 11934743
Designing Seeds for Similarity Search in Genomic DNA
buhler, Proc Seventh Ann Int'l Conf Research in Computational Molecular Biology (RECOMB) (), 2003
buhler, Proc Seventh Ann Int'l Conf Research in Computational Molecular Biology (RECOMB) (), 2003
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
Optimal spaced seeds for homologous coding regions.
Brejova B, Brown DG, Vinar T., J Bioinform Comput Biol 1(4), 2004
PMID: 15290755
Brejova B, Brown DG, Vinar T., J Bioinform Comput Biol 1(4), 2004
PMID: 15290755
AUTHOR UNKNOWN, 0
Reorganizing the protein space at the Universal Protein Resource (UniProt).
UniProt Consortium, Apweiler R, Jesus Martin M, O'onovan C, Magrane M, Alam-Faruque Y, Antunes R, Barrera Casanova E, Bely B, Bingley M, Bower L, Bursteinas B, Mun Chan W, Chavali G, Da Silva A, Dimmer E, Eberhardt R, Fazzini F, Fedotov A, Garavelli J, Castro LG, Gardner M, Hieta R, Huntley R, Jacobsen J, Legge D, Liu W, Luo J, Orchard S, Patient S, Pichler K, Poggioli D, Pontikos N, Pundir S, Rosanoff S, Sawford T, Sehra H, Turner E, Wardell T, Watkins X, Corbett M, Donnelly M, van Rensburg P, Goujon M, McWilliam H, Lopez R, Xenarios I, Bougueleret L, Bridge A, Poux S, Redaschi N, Argoud-Puy G, Auchincloss A, Axelsen K, Baratin D, Blatter MC, Boeckmann B, Bolleman J, Bollondi L, Boutet E, Braconi Quintaje S, Breuza L, deCastro E, Cerutti L, Coudert E, Cuche B, Cusin I, Doche M, Dornevil D, Duvaud S, Estreicher A, Famiglietti L, Feuermann M, Gehant S, Ferro S, Gasteiger E, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hulo N, James J, Jimenez S, Jungo F, Kappler T, Keller G, Lara V, Lemercier P, Lieberherr D, Martin X, Masson P, Moinat M, Morgat A, Paesano S, Pedruzzi I, Pilbout S, Pozzato M, Pruess M, Rivoire C, Roechert B, Schneider M, Sigrist C, Sonesson K, Staehli S, Stanley E, Stutz A, Sundaram S, Tognolli M, Verbregue L, Veuthey AL, Wu CH, Arighi CN, Arminski L, Barker WC, Chen C, Chen Y, Dubey P, Huang H, Kukreja A, Laiho K, Mazumder R, McGarvey P, Natale DA, Natarajan TG, Roberts NV, Suzek BE, Vinayaka C, Wang Q, Wang Y, Yeh LS, Zhang J., Nucleic Acids Res. 40(Database issue), 2011
PMID: 22102590
UniProt Consortium, Apweiler R, Jesus Martin M, O'onovan C, Magrane M, Alam-Faruque Y, Antunes R, Barrera Casanova E, Bely B, Bingley M, Bower L, Bursteinas B, Mun Chan W, Chavali G, Da Silva A, Dimmer E, Eberhardt R, Fazzini F, Fedotov A, Garavelli J, Castro LG, Gardner M, Hieta R, Huntley R, Jacobsen J, Legge D, Liu W, Luo J, Orchard S, Patient S, Pichler K, Poggioli D, Pontikos N, Pundir S, Rosanoff S, Sawford T, Sehra H, Turner E, Wardell T, Watkins X, Corbett M, Donnelly M, van Rensburg P, Goujon M, McWilliam H, Lopez R, Xenarios I, Bougueleret L, Bridge A, Poux S, Redaschi N, Argoud-Puy G, Auchincloss A, Axelsen K, Baratin D, Blatter MC, Boeckmann B, Bolleman J, Bollondi L, Boutet E, Braconi Quintaje S, Breuza L, deCastro E, Cerutti L, Coudert E, Cuche B, Cusin I, Doche M, Dornevil D, Duvaud S, Estreicher A, Famiglietti L, Feuermann M, Gehant S, Ferro S, Gasteiger E, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hulo N, James J, Jimenez S, Jungo F, Kappler T, Keller G, Lara V, Lemercier P, Lieberherr D, Martin X, Masson P, Moinat M, Morgat A, Paesano S, Pedruzzi I, Pilbout S, Pozzato M, Pruess M, Rivoire C, Roechert B, Schneider M, Sigrist C, Sonesson K, Staehli S, Stanley E, Stutz A, Sundaram S, Tognolli M, Verbregue L, Veuthey AL, Wu CH, Arighi CN, Arminski L, Barker WC, Chen C, Chen Y, Dubey P, Huang H, Kukreja A, Laiho K, Mazumder R, McGarvey P, Natale DA, Natarajan TG, Roberts NV, Suzek BE, Vinayaka C, Wang Q, Wang Y, Yeh LS, Zhang J., Nucleic Acids Res. 40(Database issue), 2011
PMID: 22102590
AUTHOR UNKNOWN, 0
Multiseed lossless filtration.
Kucherov G, Noe L, Roytberg M., IEEE/ACM Trans Comput Biol Bioinform 2(1), 2005
PMID: 17044164
Kucherov G, Noe L, Roytberg M., IEEE/ACM Trans Comput Biol Bioinform 2(1), 2005
PMID: 17044164
Patternhunter II: highly sensitive and fast homology search.
Li M, Ma B, Kisman D, Tromp J., J Bioinform Comput Biol 2(3), 2004
PMID: 15359419
Li M, Ma B, Kisman D, Tromp J., J Bioinform Comput Biol 2(3), 2004
PMID: 15359419
kaltenbach, ?Statistics and Algorithms for Peptide Mass Fingerprinting ? (), 2007
Designing multiple simultaneous seeds for DNA similarity search.
Sun Y, Buhler J., J. Comput. Biol. 12(6), 2005
PMID: 16108721
Sun Y, Buhler J., J. Comput. Biol. 12(6), 2005
PMID: 16108721
Good spaced seeds for homology search.
Choi KP, Zeng F, Zhang L., Bioinformatics 20(7), 2004
PMID: 14764573
Choi KP, Zeng F, Zhang L., Bioinformatics 20(7), 2004
PMID: 14764573
Indel seeds for homology search.
Mak D, Gelfand Y, Benson G., Bioinformatics 22(14), 2006
PMID: 16873491
Mak D, Gelfand Y, Benson G., Bioinformatics 22(14), 2006
PMID: 16873491
AUTHOR UNKNOWN, 0
A unifying framework for seed sensitivity and its application to subset seeds.
Kucherov G, Noe L, Roytberg M., J Bioinform Comput Biol 4(2), 2006
PMID: 16819802
Kucherov G, Noe L, Roytberg M., J Bioinform Comput Biol 4(2), 2006
PMID: 16819802
AUTHOR UNKNOWN, 0
mohri, Handbook of Weighted Automata (), 2009
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
The PROSITE database.
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ., Nucleic Acids Res. 34(Database issue), 2006
PMID: 16381852
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ., Nucleic Acids Res. 34(Database issue), 2006
PMID: 16381852
Probabilistic and statistical properties of words: an overview.
Reinert G, Schbath S, Waterman MS., J. Comput. Biol. 7(1-2), 2000
PMID: 10890386
Reinert G, Schbath S, Waterman MS., J. Comput. Biol. 7(1-2), 2000
PMID: 10890386
br�maud, Markov Chains Gibbs Fields Monte Carlo Simulation and Queues (), 1999
AUTHOR UNKNOWN, 0
feller, An Introduction to Probability Theory and Its Applications (), 1968
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
Multiple Pattern Matching: A Markov Chain Approach
lladser, J Math Biology (), 2008
lladser, J Math Biology (), 2008
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules.
Boeva V, Clement J, Regnier M, Roytberg MA, Makeev VJ., Algorithms Mol Biol 2(), 2007
PMID: 17927813
Boeva V, Clement J, Regnier M, Roytberg MA, Makeev VJ., Algorithms Mol Biol 2(), 2007
PMID: 17927813
Computing exact P-values for DNA motifs.
Zhang J, Jiang B, Li M, Tromp J, Zhang X, Zhang MQ., Bioinformatics 23(5), 2007
PMID: 17237046
Zhang J, Jiang B, Li M, Tromp J, Zhang X, Zhang MQ., Bioinformatics 23(5), 2007
PMID: 17237046
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
AUTHOR UNKNOWN, 0
Export
Markieren/ Markierung löschen
Markierte Publikationen
Web of Science
Dieser Datensatz im Web of Science®Quellen
PMID: 22868683
PubMed | Europe PMC
Suchen in