Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies.

@article{Halpern1998EvolutionaryDF,
  title={Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies.},
  author={Anne L. Halpern and William J. Bruno},
  journal={Molecular Biology and Evolution},
  year={1998},
  volume={15},
  number={7},
  pages={910--917}
}
Estimation of evolutionary distances from coding sequences must take into account protein-level selection to avoid relative underestimation of longer evolutionary distances. Current modeling of selection via site-to-site rate heterogeneity generally neglects another aspect of selection, namely position-specific amino acid frequencies. These frequencies determine the maximum dissimilarity expected for highly diverged but functionally and structurally conserved sequences, and hence are crucial… 
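The claim that position-specific frequencies set the maximum expected dissimilarity can be illustrated with a small calculation. The following is a minimal sketch, not the paper's method: it assumes two sequences so diverged that the residues at a site are independent draws from that site's stationary amino acid frequencies, in which case the expected mismatch probability is 1 minus the sum of squared frequencies. The function name and the example frequency vectors are hypothetical and chosen only for illustration.

```python
def saturation_dissimilarity(freqs):
    """Expected mismatch probability at one site for two fully diverged sequences.

    Assumes each sequence's residue is an independent draw from `freqs`,
    the site's stationary amino acid frequencies (an illustrative assumption).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    # P(identical) = sum of p_i^2, so P(different) = 1 - sum of p_i^2.
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical site with no constraint: all 20 amino acids equally likely.
uniform = [1 / 20] * 20

# Hypothetical constrained site: only three amino acids are tolerated.
constrained = [0.7, 0.2, 0.1] + [0.0] * 17

print(f"{saturation_dissimilarity(uniform):.2f}")      # prints 0.95
print(f"{saturation_dissimilarity(constrained):.2f}")  # prints 0.46
```

Under these assumptions the constrained site saturates at about 46% dissimilarity rather than 95%, so a model that ignores site-specific frequencies would read an observed 46% difference at such a site as a moderate distance when it is in fact compatible with very deep divergence.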

Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates
TLDR
Codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences, and the relationship between Rate4Site and dN/dS is elucidated.
Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles
TLDR
A probabilistic model is proposed that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene and is applied to a dozen real protein-coding gene alignments and finds it to produce biologically plausible inferences.
Detecting Adaptation in Protein-Coding Genes Using a Bayesian Site-Heterogeneous Mutation-Selection Codon Substitution Model
TLDR
The use of a mutation–selection framework that includes a Dirichlet process approach to account for across-codon-site variation in amino acid fitness profiles as a null model for the detection of adaptation is studied.
Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence
TLDR
It is found that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times, but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral divergence
TLDR
This work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times—but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Population Genetics Based Phylogenetics Under Stabilizing Selection for an Optimal Amino Acid Sequence: A Nested Modeling Approach
TLDR
A new phylogenetic approach SelAC (Selection on Amino acids and Codons), whose substitution rates are based on a nested model linking protein expression to population genetics, indicates there is great potential for more accurate inference of phylogenetic trees and branch lengths from already existing data through the use of nested, mechanistic models.
Theory of measurement for site-specific evolutionary rates in amino-acid sequences
TLDR
This work develops a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model and uses misspecification as a deliberate strategy to result in robust and meaningful parameter inference.
Site-Specific Amino Acid Preferences Are Mostly Conserved in Two Closely Related Protein Homologs
TLDR
It is found that site-specific evolutionary models informed by the experiments greatly outperformed nonsite-specific alternatives in fitting phylogenies of nucleoproteins from human, swine, equine, and avian influenza.
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes
TLDR
A codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene, is developed, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences.
An Improved Codon Modeling Approach for Accurate Estimation of the Mutation Bias
TLDR
An improved codon modeling approach where the fixation rate is not seen as a scalar anymore, but as a tensor unfolding along multiple directions, which gives an accurate representation of how mutation and selection oppose each other at equilibrium.

References

A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome.
TLDR
Simulations help confirm previous suggestions that silent sites are saturated, leaving no evidence of heterogeneity in synonymous substitution rates, and confirm previous findings that substitution rates in the chloroplast genome are subject to both lineage-specific and locus-specific effects.
A codon-based model of nucleotide substitution for protein-coding DNA sequences.
TLDR
Analyses of two data sets suggest that the new codon-based model can provide a better fit to data than can nucleotide-based models and can produce more reliable estimates of certain biologically important measures such as the transition/transversion rate ratio and the synonymous/nonsynonymous substitution rate ratio.
Codon substitution in evolution and the "saturation" of synonymous changes.
TLDR
A mathematical model for codon substitution is presented, taking into account unequal mutation rates among different nucleotides and purifying selection, and it is shown that, when the mutation rates are not equal, the estimate of synonymous substitutions obtained by Perler et al. increases nonlinearly, although the true number of synonymous substitutions increases linearly.
Estimation of Reversible Substitution Matrices from Multiple Pairs of Sequences
TLDR
A weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches and resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.
Using substitution probabilities to improve position-specific scoring matrices
TLDR
This work introduces a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities and was a substantial improvement over the traditional average score method used for constructing profiles.
A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data.
  • P. Lewis
  • Biology
    Molecular biology and evolution
  • 1998
TLDR
The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
Combining protein evolution and secondary structure.
An evolutionary model that combines protein secondary structure and amino acid replacement is introduced. It allows likelihood analysis of aligned protein sequences and does not require the
Amino acid substitution matrices from protein blocks.
  • S. Henikoff, J. Henikoff
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1992
TLDR
This work has derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins, leading to marked improvements in alignments and in searches using queries from each of the groups.
A Hidden Markov Model approach to variation among sites in rate of evolution.
TLDR
The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences, and it is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability.
Hidden Markov models in computational biology. Applications to protein modeling.
TLDR
The results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionarily preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels, which play an important role in excitation-contraction coupling.