# Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

Dörr D, Gronau I, Moran S, Yavneh I (2012)

Algorithms for Molecular Biology 7(1).

*Journal Article* | *Published* | *English*

### Abstract

Background: Distance-based phylogenetic reconstruction methods use evolutionary distances between species to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method.

Results: This paper continues a line of research that attempts to tailor, to each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees.

Conclusions: We demonstrate both analytically and experimentally that, by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
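The setting studied in the abstract (the Jukes-Cantor distance applied to data generated under Kimura's two-parameter model, evaluated on quartet trees) can be illustrated with a small Python sketch. It is not the authors' code. It assumes the textbook JC69 and K2P distance formulas, uses expected (infinite-sequence) site proportions so that only the modeling error is visible, and reconstructs the quartet with the classical four-point method. All function names, the edge lengths, and the `additivity_gap` quantity are hypothetical illustrations; in particular, `additivity_gap` is a simple measure of non-additivity and is not necessarily the paper's definition of deviation from additivity.

```python
import math

# Sketch (not the paper's code): JC69 applied to data generated under K2P,
# using expected (infinite-sequence) site proportions, so only the modeling
# error is visible; stochastic error from short sequences is ignored.


def jc69_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # saturated: the JC formula is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def k2p_expected_proportions(d, kappa):
    """Expected (P, Q) between two sequences d expected substitutions per site apart
    under K2P with transition/transversion rate ratio kappa = alpha / beta."""
    beta = 1.0 / (kappa + 2.0)  # rates normalised so that alpha + 2 * beta = 1
    alpha = kappa * beta
    e_tv = math.exp(-4.0 * beta * d)
    e_ts = math.exp(-2.0 * (alpha + beta) * d)
    return 0.25 + 0.25 * e_tv - 0.5 * e_ts, 0.5 - 0.5 * e_tv


PAIRS = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]


def quartet_path_lengths(edges, internal):
    """Additive tree distances for the quartet ab|cd (hypothetical edge lengths)."""
    D = {}
    for x, y in PAIRS:
        same_side = {x, y} in ({"a", "b"}, {"c", "d"})
        D[(x, y)] = edges[x] + edges[y] + (0.0 if same_side else internal)
    return D


def four_point_sums(D):
    """The three pairwise sums compared by the four-point condition."""
    return {
        "ab|cd": D[("a", "b")] + D[("c", "d")],
        "ac|bd": D[("a", "c")] + D[("b", "d")],
        "ad|bc": D[("a", "d")] + D[("b", "c")],
    }


def four_point_topology(D):
    """Four-point method: choose the split with the smallest pairwise sum."""
    sums = four_point_sums(D)
    return min(sums, key=sums.get)


def additivity_gap(D):
    """Illustrative measure: gap between the two largest four-point sums (0 if additive)."""
    s = sorted(four_point_sums(D).values())
    return s[2] - s[1]


if __name__ == "__main__":
    # Hypothetical quartet: long pendant edges on a and c, short ones on b and d.
    edges = {"a": 0.6, "b": 0.05, "c": 0.6, "d": 0.05}
    true_D = quartet_path_lengths(edges, internal=0.1)
    for kappa in (1.0, 4.0, 10.0):
        jc_D = {pair: jc69_distance(sum(k2p_expected_proportions(t, kappa)))
                for pair, t in true_D.items()}
        print(f"kappa={kappa}: {four_point_topology(jc_D)}, gap={additivity_gap(jc_D):.4f}")
```

With `kappa = 1` the K2P data satisfy the JC model, so the JC distances are additive and the gap is zero up to rounding; for `kappa > 1` the JC distances are a nonlinear transform of the true path lengths, and the gap is typically nonzero. Adding sampling noise from finite sequences would be needed to study the stochastic error as well.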

### Financial disclosure

Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

### Cite this

Dörr D, Gronau I, Moran S, Yavneh I. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. *Algorithms for Molecular Biology*. 2012;7(1).

Dörr, D., Gronau, I., Moran, S., & Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. *Algorithms for Molecular Biology*, *7*(1).

Dörr, D., Gronau, I., Moran, S., and Yavneh, I. (2012). Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. *Algorithms for Molecular Biology* 7.

Dörr, D., et al., 2012. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. *Algorithms for Molecular Biology*, 7(1).

D. Dörr, et al., “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”, *Algorithms for Molecular Biology*, vol. 7, 2012.

Dörr, D., Gronau, I., Moran, S., Yavneh, I.: Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms for Molecular Biology. 7, (2012).

Dörr, Daniel, Gronau, Ilan, Moran, Shlomo, and Yavneh, Irad. “Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions”. *Algorithms for Molecular Biology* 7.1 (2012).

**Main File(s)**

Access Level: Open Access

Last Uploaded: 2016-11-18T14:31:26Z


### 47 References

Data provided by Europe PubMed Central.

A few logs suffice to build (almost) all trees (I). Author unknown, 1999.

A few logs suffice to build (almost) all trees (II). Author unknown, 1999.

Title and author unknown, 1977.

A note on the delta method. Author unknown, 1992.

Constructing a tree on the basis of a set of distances between the hanging vertices. Author unknown, 1965.

Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. *Mol. Biol. Evol.* 4(4), 1987. PMID: 3447015

Studier JA, Keppler KJ. A note on the neighbor-joining algorithm of Saitou and Nei. *Mol. Biol. Evol.* 5(6), 1988. PMID: 3221794

Comparison of phylogenetic trees. Author unknown, 1981.

Rambaut A, Grassly NC. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. *Comput. Appl. Biosci.* 13(3), 1997. PMID: 9183526

PHYLIP - Phylogeny Inference Package (Version 3.2). Author unknown, 1989.

Recovering a tree from the leaf colourations it generates under a Markov model. Author unknown, 1994.

Lockhart PJ, Steel MA, Hendy MD, Penny D. Recovering evolutionary trees under a more realistic model of sequence evolution. *Mol. Biol. Evol.* 11(4), 1994. PMID: 19391266

Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. *Science* 311(5765), 2006. PMID: 16513982

von Mering C, Hugenholtz P, Raes J, Tringe SG, Doerks T, Jensen LJ, Ward N, Bork P. Quantitative phylogenetic assessment of microbial communities in diverse environments. *Science* 315(5815), 2007. PMID: 17272687

Title and author unknown, 1999.

Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. *Syst. Biol.* 56(4), 2007. PMID: 17654362

Yarza P, Ludwig W, Euzeby J, Amann R, Schleifer KH, Glockner FO, Rossello-Mora R. Update of the All-Species Living Tree Project based on 16S and 23S rRNA sequence analyses. *Syst. Appl. Microbiol.* 33(6), 2010. PMID: 20817437

Gascuel O. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. *Mol. Biol. Evol.* 14(7), 1997. PMID: 9254330

Rodriguez F, Oliver JL, Marin A, Medina JR. The general stochastic model of nucleotide substitution. *J. Theor. Biol.* 142(4), 1990. PMID: 2338834

Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. *Syst. Biol.* 52(5), 2003. PMID: 14530136

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. *Mol. Biol. Evol.* 28(10), 2011. PMID: 21546353

Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions. Author unknown, 2011.

### Sources

PMID: 22938153

PubMed | Europe PMC