Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions
Dörr D, Gronau I, Moran S, Yavneh I (2012)
Algorithms for Molecular Biology 7(1): 22.
Journal article
| Published | English
Authors
Dörr, Daniel (UniBi);
Gronau, Ilan;
Moran, Shlomo;
Yavneh, Irad
Abstract / Note
Background: Distance-based phylogenetic reconstruction methods use evolutionary distances between species to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method. Results: This paper continues the line of research that attempts to fit, to each given set of input sequences, a distance function which maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees. Conclusions: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
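The setting described in the abstract (the Jukes-Cantor distance applied to data generated under Kimura's two-parameter model) can be illustrated with a minimal Python sketch. It uses the standard textbook formulas for the JC and K2P distances and for the expected transition and transversion proportions under K2P. The function names, the parameterization by `kappa` (transition rate divided by transversion rate), and the `additivity_gap` helper are assumptions made for this illustration; they are not taken from the paper, and the gap computed here is only an informal stand-in for the paper's notion of deviation from additivity.

```python
import math


def expected_k2p_proportions(d, kappa):
    """Expected transition (P) and transversion (Q) proportions under K2P.

    d     : true distance, in expected substitutions per site
    kappa : assumed parameterization, transition rate alpha / transversion rate beta
    """
    beta_t = d / (kappa + 2.0)      # from d = (alpha + 2 * beta) * t
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    P = 0.25 + 0.25 * e1 - 0.5 * e2
    Q = 0.5 - 0.5 * e1
    return P, Q


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from transition and transversion proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def jc_on_k2p(d, kappa):
    """JC distance evaluated on the expected (infinite-length) K2P data."""
    P, Q = expected_k2p_proportions(d, kappa)
    return jc_distance(P + Q)


def additivity_gap(d1, d2, kappa):
    """Hypothetical helper: how far JC is from additive along a path.

    Compares the sum of JC distances over two consecutive edges of true
    lengths d1 and d2 with the JC distance over the whole path.
    """
    return jc_on_k2p(d1, kappa) + jc_on_k2p(d2, kappa) - jc_on_k2p(d1 + d2, kappa)


if __name__ == "__main__":
    d, kappa = 0.5, 4.0
    print("true distance:", d)
    print("K2P estimate :", k2p_distance(*expected_k2p_proportions(d, kappa)))
    print("JC estimate  :", jc_on_k2p(d, kappa))
    print("additivity gap for two edges of 0.25:", additivity_gap(0.25, 0.25, kappa))
```

With a transition-transversion bias (`kappa` different from 1), the K2P formula recovers the true distance from the expected proportions, while the JC formula underestimates it and no longer adds up along a path. This sketch covers only the systematic (modeling) error; the stochastic error from finite sequence length would have to be studied by simulation.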
Keywords
rate functions;
Substitution models;
Phylogenetic reconstructions;
Additive substitution
Publication year
2012
Journal title
Algorithms for Molecular Biology
Volume
7
Issue
1
Article no.
22
ISSN
1748-7188
Funding information
Open access publication costs were funded by the Deutsche Forschungsgemeinschaft and Bielefeld University.
Page URI
https://pub.uni-bielefeld.de/record/2560353
Cite
Dörr D, Gronau I, Moran S, Yavneh I. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 2012;7(1): 22.
Dörr, D., Gronau, I., Moran, S., & Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology, 7(1), 22. doi:10.1186/1748-7188-7-22
Dörr, Daniel, Gronau, Ilan, Moran, Shlomo, and Yavneh, Irad. 2012. “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”. Algorithms for Molecular Biology 7 (1): 22.
Dörr, D., Gronau, I., Moran, S., and Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology 7:22.
Dörr, D., et al., 2012. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology, 7(1): 22.
D. Dörr, et al., “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”, Algorithms for Molecular Biology, vol. 7, 2012, 22.
Dörr, D., Gronau, I., Moran, S., Yavneh, I.: Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 7, 22 (2012).
Dörr, Daniel, Gronau, Ilan, Moran, Shlomo, and Yavneh, Irad. “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”. Algorithms for Molecular Biology 7.1 (2012): 22.
All files are available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-06T09:18:11Z
MD5 checksum
d7e3a4b4a6d756e63258e209d24c1e9e
Data provided by the European Bioinformatics Institute (EBI)
1 citation in Europe PMC
Data provided by Europe PubMed Central.
Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti.
Copetti D, Búrquez A, Bustamante E, Charboneau JLM, Childs KL, Eguiarte LE, Lee S, Liu TL, McMahon MM, Whiteman NK, Wing RA, Wojciechowski MF, Sanderson MJ., Proc Natl Acad Sci U S A 114(45), 2017
PMID: 29078296
Sources
PMID: 22938153