FUBAR: a fast, unconstrained Bayesian approximation for inferring selection.

Authors: B. Murrell, Sasha Moola, Amandla Mabona, Thomas Weighill, Daniel J. Sheward, Sergei L. Kosakovsky Pond, Konrad Scheffler
Journal: Molecular Biology and Evolution, volume 30, issue 5

Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large… 
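
The site-class idea described above can be illustrated with a minimal sketch. It is not the FUBAR implementation: the function name, the three dN/dS classes, the class weights, and the per-site likelihood values are all hypothetical, chosen only to show how a site's posterior class membership follows from class weights and per-class site likelihoods.

```python
import numpy as np

def site_class_posteriors(site_likelihoods, class_weights):
    """Posterior probability of each site class at each site.

    site_likelihoods: (n_sites, n_classes) array of P(site data | class).
    class_weights:    (n_classes,) prior class weights that sum to 1.
    """
    lik = np.asarray(site_likelihoods, dtype=float)
    w = np.asarray(class_weights, dtype=float)
    joint = lik * w  # prior weight times likelihood, broadcast over sites
    return joint / joint.sum(axis=1, keepdims=True)

# Hypothetical example: three classes with dN/dS = 0.1, 1.0, 3.0.
omega = np.array([0.1, 1.0, 3.0])
weights = np.array([0.6, 0.3, 0.1])
lik = np.array([
    [0.020, 0.010, 0.001],  # site 1: data favor the conserved class
    [0.001, 0.004, 0.030],  # site 2: data favor the dN/dS > 1 class
])

post = site_class_posteriors(lik, weights)
# Posterior probability of positive selection: total mass on classes with dN/dS > 1.
prob_positive = post[:, omega > 1.0].sum(axis=1)
print(prob_positive)
```

With only a few classes, every site is forced onto one of a handful of dN/dS values, which is the constraint the abstract identifies as a source of model misspecification.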

Contrast-FEL—A Test for Differences in Selective Pressures at Individual Sites among Clades and Sets of Branches

Contrast-FEL is a simple extension of a popular fixed effects likelihood method for codon-based phylogenetic maximum likelihood testing. It is suitable for identifying individual alignment sites where any among the K ≥ 2 sets of branches in a phylogenetic tree have detectably different dN/dS ratios, indicative of different selective regimes.

Gene-wide identification of episodic selection.

A new approach to identifying gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate, and a computationally inexpensive evidence metric for identifying sites subject to episodic positive selection on any foreground branches.

Synonymous Site-to-Site Substitution Rate Variation Dramatically Inflates False Positive Rates of Selection Analyses: Ignore at Your Own Peril

It is found that failing to account for even moderate levels of SRV in selection testing is likely to produce intolerably high false positive rates. These findings add to a growing literature establishing that tests of selection are much more sensitive to certain model assumptions than previously believed.

On the Validity of Evolutionary Models with Site-Specific Parameters

A simulation study is presented providing empirical evidence that a simple version of the models in question does exhibit sensible convergence behavior and that additional taxa, despite not being independent of each other, lead to improved parameter estimates.

A Bayesian Mutation–Selection Framework for Detecting Site-Specific Adaptive Evolution in Protein-Coding Genes

A Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment is presented and it is suggested that the new approach shows greater sensitivity than traditional methods.

One-rate models outperform two-rate models in site-specific dN/dS estimation

It is found that one-rate inference models universally outperform two-rate models for estimating reliable site-specific dN/dS ratios and high levels of divergence among sequences are more critical for obtaining precise point estimates than the number of sequences in the alignment.

Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models

A new approach is described that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab that identifies sites of adaptive substitutions in four genes far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions.

A Comparison of One-Rate and Two-Rate Inference Frameworks for Site-Specific dN/dS Estimation

It is found that one-rate frameworks generally infer more accurate dN/dS point estimates, even when dS varies among sites, and that high levels of divergence among sequences, rather than the number of sequences in the alignment, are more critical for obtaining precise point estimates.

Limited utility of residue masking for positive-selection inference.

It is found that no filter, including original Guidance, consistently benefitted positive-selection inferences, and all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences.

The relationship between dN/dS and scaled selection coefficients.

Establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.



A random effects branch-site model for detecting episodic diversifying selection.

Felsenstein's pruning algorithm is extended to allow efficient likelihood computations for models in which variation over branches (and not just sites) is described in the random effects likelihood framework, and this model treats the selective class of every branch at a particular site as an unobserved state that is chosen independently of that at any other branch.
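
For orientation, the standard pruning algorithm that this paper extends can be sketched as follows. This is the textbook recursion only, not the branch-site random effects extension. The nested-list tree encoding, the function names, and the use of the JC69 nucleotide model in place of a codon model are assumptions made to keep the example short.

```python
import numpy as np

STATES = {"A": 0, "C": 1, "G": 2, "T": 3}

def jc69_transition_matrix(t):
    """JC69 transition probabilities for branch length t (expected substitutions per site)."""
    e = np.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return np.full((4, 4), diff) + np.eye(4) * (same - diff)

def partial_likelihood(node, site):
    """Conditional likelihood vector at a node (hypothetical tree encoding).

    node: a leaf name (str), or a list of (child, branch_length) pairs.
    site: dict mapping each leaf name to its observed nucleotide.
    """
    if isinstance(node, str):
        vec = np.zeros(4)
        vec[STATES[site[node]]] = 1.0  # observed state has probability 1
        return vec
    result = np.ones(4)
    for child, t in node:
        # Sum over the child's states, weighted by transition probabilities.
        result *= jc69_transition_matrix(t) @ partial_likelihood(child, site)
    return result

def site_likelihood(tree, site):
    """Likelihood of one alignment column, with uniform root frequencies (JC69)."""
    return float(np.full(4, 0.25) @ partial_likelihood(tree, site))

# Hypothetical three-taxon tree: ((a:0.1, b:0.2):0.05, c:0.3)
tree = [([("a", 0.1), ("b", 0.2)], 0.05), ("c", 0.3)]
print(site_likelihood(tree, {"a": "A", "b": "A", "c": "G"}))
```

Each internal node is visited once, so the cost per site grows linearly with the number of branches rather than exponentially with the number of unobserved ancestral states.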

Conjugate Gibbs Sampling for Bayesian Phylogenetic Models

The conjugate Gibbs formalism allows one to propose efficient implementations of complex models, for instance assuming site-specific substitution processes, that would not be accessible to standard MCMC methods.

Uniformization for sampling realizations of Markov processes: applications to Bayesian implementations of codon substitution models

A general method, based on a uniformization technique, which can be utilized to generate realizations of a Markovian substitution process conditional on an alignment of character states and a given tree topology is described.

Taking Variation of Evolutionary Rates Between Sites into Account in Inferring Phylogenies

A model based on population genetics is presented predicting how the rates of evolution might vary from locus to locus, and Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models.

Not so different after all: a comparison of methods for detecting amino acid sites under selection.

Three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection are considered, suggesting that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets.

Bayes empirical Bayes inference of amino acid sites under positive selection.

A Bayes empirical Bayes (BEB) approach for codon-based substitution models is developed, which assigns a prior to the model parameters and integrates over their uncertainties; the results suggest that in small data sets the new BEB method does not generate false positives as the old NEB approach did.

Phylogenetics, likelihood, evolution and complexity

Phylogenetics, likelihood, evolution and complexity (PLEX) is a flexible and fast Bayesian Markov chain Monte Carlo software program for large-scale analysis of nucleotide and amino acid data.

A Dirichlet process model for detecting positive selection in protein-coding DNA sequences.

This work describes an approach to modeling variation in the nonsynonymous rate of substitution by using a Dirichlet process mixture model, which allows there to be a countably infinite number of nonsynonymous rate classes and is very flexible in accommodating different potential distributions.

Detecting Amino Acid Sites Under Positive Selection and Purifying Selection

It is shown that the SLR method can be more powerful than currently published methods for detecting the location of positive selection, especially in difficult cases where the strength of selection is low.

Detecting Individual Sites Subject to Episodic Diversifying Selection

It is found that episodic selection is widespread and it is concluded that the number of sites experiencing positive selection may have been vastly underestimated.