Estimating the pattern of nucleotide substitution

@article{Yang2004EstimatingTP,
  title={Estimating the pattern of nucleotide substitution},
  author={Ziheng Yang},
  journal={Journal of Molecular Evolution},
  year={1994},
  volume={39},
  pages={105-111}
}
  • Ziheng Yang
  • Published 1 July 1994
  • Biology
  • Journal of Molecular Evolution
Knowledge of the pattern of nucleotide substitution is important both to our understanding of molecular sequence evolution and to reliable estimation of phylogenetic relationships. The method of parsimony analysis, which has been used to estimate substitution patterns in real sequences, has serious drawbacks and leads to results difficult to interpret. In this paper a model-based maximum likelihood approach is proposed for estimating substitution patterns in real sequences. Nucleotide… 
Evolutionary distance estimation under heterogeneous substitution pattern among lineages.
TLDR
This work presents a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences and shows that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site.
Estimation of Phylogeny Using a General Markov Model
TLDR
The non-homogeneous model of nucleotide substitution proposed by Barry and Hartigan is the most general model of DNA evolution assuming an independent and identical process at each site, and the most likely tree under the three models is determined.
Phylogenetic analysis using parsimony and likelihood methods
TLDR
Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter, and its performance relative to that of the likelihood method was especially noted.
The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support.
TLDR
The relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) is examined for six data sets for which the true relationships are known with a high degree of certainty, raising several issues regarding the process of model selection.
Maximum-likelihood models for combined analyses of multiple sequence data
Models of nucleotide substitution were constructed for combined analyses of heterogeneous sequence data (such as those of multiple genes) from the same set of species. The models account for
An empirical examination of the utility of codon-substitution models in phylogeny reconstruction.
TLDR
Although computational burden makes codon models unfeasible for tree search in large data sets, it is suggested that they may be useful for comparing candidate trees, and caution is urged against use of overly complex substitution models.
Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution
TLDR
Methods to correct for systematic biases are implemented and their performance is evaluated by computer simulation when the substitution process is nonstationary; it is suggested that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
Should We Use Model-Based Methods for Phylogenetic Inference When We Know That Assumptions About Among-Site Rate Variation and Nucleotide Substitution Pattern Are Violated?
TLDR
These results suggest that application of increasingly general and complex models would sometimes lead to decreased efficiency, despite the fact that the more complex models almost always provide significantly better access to real data than the simpler models.
Statistical comparison of nucleotide, amino acid, and codon substitution models for evolutionary analysis of protein-coding sequences.
TLDR
By analyzing divergent and conserved interspecific mammalian sequences and intraspecific human population data, the superiority of the codon substitution models is shown and the advantages and disadvantages of the models of the 3 types are discussed.

References

SHOWING 1-10 OF 47 REFERENCES
Estimation of evolutionary distances between homologous nucleotide sequences.
  • M. Kimura
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1981
TLDR
It is pointed out that the rates of synonymous base substitutions not only are very high but also are roughly equal to each other between genes even when amino acid-altering substitution rates are quite different and that this is consistent with the neutral mutation-random drift hypothesis of molecular evolution.
A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences
  • M. Kimura
  • Biology
    Journal of Molecular Evolution
  • 1980
TLDR
Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution.
Statistical tests of models of DNA substitution
  • N. Goldman
  • Biology
    Journal of Molecular Evolution
  • 1993
TLDR
A test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein.
Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites.
  • Z. Yang
  • Biology
    Molecular biology and evolution
  • 1993
TLDR
Felsenstein's maximum-likelihood approach for inferring phylogeny from DNA sequences is extended to the case where substitution rates over sites are described by the gamma distribution and a numerical example is presented to show that the method fits the data better than do previous models.
Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotide
Summary: A formal mathematical analysis of Kimura's (1981) six-parameter model of nucleotide substitution for the case of unequal substitution rates among different pairs of nucleotides is conducted,
Patterns of nucleotide substitution in pseudogenes and functional genes
TLDR
The pattern obtained suggests that transition mutations occur somewhat more frequently than transversion mutations and that mutations result more often in A or T than in G or C.
Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants.
TLDR
The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method, which may be used to determine the feasibility of any tree for which the maximum likelihood can be computed.
Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.
TLDR
A new mathematical method for estimating the number of transitional and transversional substitutions per site, as well as the total number of nucleotide substitutions, is presented; it suggested that the transition/transversion ratio for the entire control region was approximately 15 and nearly the same for the two species.
Maximum likelihood inference of protein phylogeny and the origin of chloroplasts
TLDR
A maximum likelihood method based on a Markov model that takes into account the unequal transition probabilities among pairs of amino acids and does not assume constancy of rate among different lineages is expected to be powerful in inferring phylogeny among distantly related proteins.
A new method for calculating evolutionary substitution rates
TLDR
It is found that the method applies satisfactorily to the three former species, while the last appears to be outside the scope of the present approach.