Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

Dörr D, Gronau I, Moran S, Yavneh I (2012)
Algorithms for Molecular Biology 7(1): 22.

Journal article | Published | English
Full text available for this record
Abstract / Note
Background: Distance-based phylogenetic reconstruction methods use evolutionary distances between species to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method.

Results: This paper continues the line of research that attempts to fit, to each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees.

Conclusions: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
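The systematic error described in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the function names, the parametrization by a transition/transversion ratio `R`, and the quantity `additivity_gap` are assumptions made here for illustration, and `additivity_gap` is a simple stand-in that need not coincide with the paper's formal definition of deviation from additivity. The sketch uses the standard closed-form Jukes-Cantor and Kimura two-parameter (K2P) distance formulas on the expected (infinite-sequence) site-difference proportions, so it shows only the modeling error and not the stochastic error from short sequences.

```python
import math


def k2p_proportions(d, R):
    """Expected transition (P) and transversion (Q) proportions under K2P.

    d: expected substitutions per site along the path (hypothetical input).
    R: transition/transversion ratio alpha / (2 * beta); R = 0.5 is Jukes-Cantor.
    """
    beta_t = d / (2.0 * (R + 1.0))      # beta * t
    alpha_t = d * R / (R + 1.0)         # alpha * t
    e_tv = math.exp(-4.0 * beta_t)
    e_mix = math.exp(-2.0 * (alpha_t + beta_t))
    P = 0.25 + 0.25 * e_tv - 0.5 * e_mix
    Q = 0.5 - 0.5 * e_tv
    return P, Q


def jc_distance(P, Q):
    """Jukes-Cantor distance; it only sees the total difference P + Q."""
    return -0.75 * math.log(1.0 - 4.0 * (P + Q) / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def additivity_gap(distance, d1, d2, R):
    """Sum of the distances of two adjacent path segments minus the distance
    of the joined path, all evaluated on K2P-generated expected proportions.
    An additive distance function gives a gap of zero."""
    parts = distance(*k2p_proportions(d1, R)) + distance(*k2p_proportions(d2, R))
    whole = distance(*k2p_proportions(d1 + d2, R))
    return parts - whole


if __name__ == "__main__":
    for R in (0.5, 2.0, 10.0):
        print(
            f"R={R}: JC gap={additivity_gap(jc_distance, 0.2, 0.3, R):.6f}, "
            f"K2P gap={additivity_gap(k2p_distance, 0.2, 0.3, R):.6f}"
        )
```

Under these assumptions the K2P distance stays additive for any `R`, while the Jukes-Cantor distance is additive only when `R = 0.5` and underestimates long paths relative to the sum of their parts once a transition bias is present. Whether that bias helps or hurts tree reconstruction depends on the trade-off with sampling variance that the paper analyzes.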
Year of publication
2012
Journal title
Algorithms for Molecular Biology
Volume
7
Issue
1
Page(s)
22
ISSN
Funding information
Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.
PUB-ID

Cite

Dörr D, Gronau I, Moran S, Yavneh I. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 2012;7(1):22.
Dörr, D., Gronau, I., Moran, S., & Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology, 7(1), 22. doi:10.1186/1748-7188-7-22
Dörr, D., Gronau, I., Moran, S., and Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology 7, 22.
Dörr, D., et al., 2012. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology, 7(1), p 22.
D. Dörr, et al., “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”, Algorithms for Molecular Biology, vol. 7, 2012, pp. 22.
Dörr, D., Gronau, I., Moran, S., Yavneh, I.: Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 7, 22 (2012).
Dörr, Daniel, Gronau, Ilan, Moran, Shlomo, and Yavneh, Irad. “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”. Algorithms for Molecular Biology 7.1 (2012): 22.
All files are available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2016-12-12T10:24:37Z

1 citation in Europe PMC

Data provided by Europe PubMed Central.

Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti.
Copetti D, Búrquez A, Bustamante E, Charboneau JLM, Childs KL, Eguiarte LE, Lee S, Liu TL, McMahon MM, Whiteman NK, Wing RA, Wojciechowski MF, Sanderson MJ., Proc Natl Acad Sci U S A 114(45), 2017
PMID: 29078296



Sources

PMID: 22938153