Statistical tests of models of DNA substitution

@article{Goldman2004StatisticalTO,
  title={Statistical tests of models of DNA substitution},
  author={Nick Goldman},
  journal={Journal of Molecular Evolution},
  year={1993},
  volume={36},
  pages={182-198}
}
  • N. Goldman
  • Published 1 February 1993
  • Biology
  • Journal of Molecular Evolution
Summary: Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models is of such great importance because the results of all phylogenetic analyses depend on their underlying models; to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test…
Differences in Performance among Test Statistics for Assessing Phylogenomic Model Adequacy
TLDR
New thresholds for assessing substitution model adequacy are proposed and shown to lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate.
Simple diagnostic statistical tests of models for DNA substitution
  • N. Goldman
  • Biology
    Journal of Molecular Evolution
  • 1993
TLDR
Three statistics which may give useful diagnostic information on departures from models' predictions are described, the statistical distributions of these statistics are discussed and simple significance tests are derived.
Testing adequacy for DNA substitution models
TLDR
A simple, general, powerful and robust model adequacy testing method based on Pearson’s goodness-of-fit test and binning of site patterns to assess reliability of conclusions after model selection and model fitting have already been applied.
How Well Does Your Phylogenetic Model Fit Your Data?
TLDR
Proposed and existing methods in both the maximum likelihood and Bayesian framework will be discussed here, whilst highlighting their strengths and limitations for assessing goodness of fit.
Measuring Fit of Sequence Data to Phylogenetic Model: Gain of Power Using Marginal Tests
TLDR
Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this, and readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods report.
Selecting models of evolution THEORY
TLDR
Although this chapter focuses on models of nucleotide substitution, all the points made herein can be applied directly to models of amino-acid replacement, and maximum likelihood provides a framework in this chapter.
Substitution Model Adequacy and Assessing the Reliability of Estimates of Virus Evolutionary Rates and Time Scales.
TLDR
The results partly explain the lack of consensus over estimates of the long-term evolutionary time scale of these viruses, and indicate that assessing the adequacy of substitution models should be routinely used to determine whether estimates are reliable.
The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support.
TLDR
The relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) is examined for six data sets for which the true relationships are known with a high degree of certainty, raising several issues regarding the process of model selection.
Selecting models of evolution
TLDR
This chapter discusses phylogenetic methods, which are based on a number of assumptions about the evolutionary process, and the models that describe the different probabilities of change from one nucleotide or amino acid to another along a phylogenetic tree.
Computational statistics in molecular phylogenetics
TLDR
A new, portable and flexible application, named INDELible, is implemented, which can be used to generate nucleotide, amino acid and codon sequence data by simulating indels (under several models of indel length distribution) as well as substitutions (under a rich repertoire of substitution models).

References

SHOWING 1-10 OF 80 REFERENCES
Evolutionary trees from nucleic acid and protein sequences
  • M. Bishop, A. Friday
  • Biology
    Proceedings of the Royal Society of London. Series B. Biological Sciences
  • 1985
TLDR
This account examines methods for the estimation of phylogenetic trees on the basis of probabilistic models, discusses weaknesses of the current stochastic models, and points out ways in which accumulating experimental information may lead to their refinement or refutation.
Estimating the reliability of evolutionary trees.
TLDR
Six protein sequences from the same 11 mammalian taxa were used to estimate the accuracy and reliability of phylogenetic trees using real, rather than simulated, data and it was concluded that it is possible to give a reasonable estimate of the reliability of the final tree, at least when several sequences are combined.
Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants.
TLDR
The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method, which may be used to determine the feasibility of any tree for which the maximum likelihood can be computed.
Lineage effects and the index of dispersion of molecular evolution.
TLDR
A method for correcting for lineage effects in the estimation of R(t) is presented for trees made up of three species. Computer simulations are presented to give confidence in the estimate for replacement substitutions, but also to demonstrate that the estimate for silent substitutions is sensitive to corrections for multiple substitutions and is not as reliable.
Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea
TLDR
A new method for estimating the variance of the difference between the log likelihoods of different tree topologies is developed by expressing it explicitly, in order to evaluate the maximum likelihood branching order among Hominoidea.
MAXIMUM LIKELIHOOD INFERENCE OF PHYLOGENETIC TREES, WITH SPECIAL REFERENCE TO A POISSON PROCESS MODEL OF DNA SUBSTITUTION AND TO PARSIMONY ANALYSES
TLDR
From the elucidation of implicit models underlying traditional "parsimony" and "compatibility" analyses, it is seen that Poisson process analysis gives a statistically consistent estimate of phylogeny, and that parsimony methods do indeed have a maximum likelihood foundation but give potentially incorrect estimates of phylogenies.
Converting distance to time: application to human evolution.
Maximum Likelihood and Minimum-Steps Methods for Estimating Evolutionary Trees from Data on Discrete Characters
TLDR
The application of maximum likelihood methods to discrete characters is examined, and it is shown that parsimony methods are not maximum likelihood methods under the assumptions made by Farris, and an algorithm which enables rapid calculation of the likelihood of a phylogeny is described.
A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony.
  • J. Lake
  • Biology
    Molecular biology and evolution
  • 1987
TLDR
The method of evolutionary parsimony accurately predicts the tree, even when substitution rates differ greatly in neighboring peripheral branches (conditions under which parsimony will consistently fail). As the number of substitutions in peripheral branches becomes fewer, the parsimony and the evolutionary-parsimony solutions converge.