Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

Dörr D, Gronau I, Moran S, Yavneh I (2012)
Algorithms for Molecular Biology 7(1).

Journal Article | Published | English
Abstract
Background: Distance-based phylogenetic reconstruction methods use evolutionary distances between species to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method.
Results: This paper continues the line of research that attempts to adjust, to each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error, based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees.
Conclusions: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
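The mismatch described in the abstract can be illustrated with the textbook closed-form distances. The sketch below is not the paper's framework or code. It only shows, in the limit of infinitely long sequences (no stochastic error), that the Jukes-Cantor distance stops being additive along a path when the data follow Kimura's two-parameter model with a transition-transversion bias. The function names and the parameterization `kappa` (transition rate divided by the rate of each transversion type, with the total rate normalized to 1) are assumptions made for this illustration.

```python
import math


def k2p_expected_proportions(d, kappa):
    """Expected transition (P) and transversion (Q) proportions under K2P.

    d     -- true evolutionary distance (expected substitutions per site)
    kappa -- assumed parameterization: alpha / beta, where alpha is the
             transition rate and beta the rate of each transversion type,
             normalized so that alpha + 2 * beta = 1
    """
    beta = 1.0 / (kappa + 2.0)
    alpha = kappa * beta
    P = 0.25 + 0.25 * math.exp(-4.0 * beta * d) - 0.5 * math.exp(-2.0 * (alpha + beta) * d)
    Q = 0.5 - 0.5 * math.exp(-4.0 * beta * d)
    return P, Q


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def jc_on_k2p(d, kappa):
    """JC distance computed from the expected K2P proportions at true distance d."""
    P, Q = k2p_expected_proportions(d, kappa)
    return jc_distance(P + Q)


if __name__ == "__main__":
    kappa = 10.0           # strong transition bias (illustrative value)
    d_ab, d_bc = 0.5, 0.5  # two consecutive path segments (illustrative values)

    # The correct model recovers the true distance, so it is additive on the path.
    print("K2P on path:", k2p_distance(*k2p_expected_proportions(d_ab + d_bc, kappa)))

    # The oversimplified model underestimates long distances more than short ones,
    # so the sum of the parts exceeds the estimate for the whole path.
    whole = jc_on_k2p(d_ab + d_bc, kappa)
    parts = jc_on_k2p(d_ab, kappa) + jc_on_k2p(d_bc, kappa)
    print("JC whole path:", whole)
    print("JC sum of segments:", parts)
    print("gap (sum - whole):", parts - whole)
```

With `kappa = 1` the two models coincide and the gap vanishes. With a bias, the gap is a purely systematic effect, separate from the stochastic error that short sequences would add on top.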
Financial disclosure
Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

Cite this

Dörr D, Gronau I, Moran S, Yavneh I. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 2012;7(1).
Dörr, D., Gronau, I., Moran, S., & Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology, 7(1).
Dörr, D., Gronau, I., Moran, S., and Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology 7.
Dörr, D., et al., 2012. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology, 7(1).
D. Dörr, et al., “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”, Algorithms for Molecular Biology, vol. 7, 2012.
Dörr, D., Gronau, I., Moran, S., Yavneh, I.: Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 7, (2012).
Dörr, Daniel, Gronau, Ilan, Moran, Shlomo, and Yavneh, Irad. “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”. Algorithms for Molecular Biology 7.1 (2012).
Main File(s)
Access Level
OA Open Access
Last Uploaded
2016-11-18T14:31:26Z




Sources

PMID: 22938153