PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses.

  title={PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses.},
  author={Robert Lanfear and Brett Calcott and Simon Y. W. Ho and St{\'e}phane Guindon},
  journal={Molecular Biology and Evolution},
  volume={29},
  number={6},
In phylogenetic analyses of molecular sequence data, partitioning involves estimating independent models of molecular evolution for different sets of sites in a sequence alignment. Choosing an appropriate partitioning scheme is an important step in most analyses because it can affect the accuracy of phylogenetic reconstruction. Despite this, partitioning schemes are often chosen without explicit statistical justification. Here, we describe two new objective methods for the combined selection of… 
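
One common way to give a partitioning scheme explicit statistical justification is to score candidate schemes with an information criterion such as AIC, AICc, or BIC and keep the lowest-scoring one. The sketch below illustrates only that comparison step; it is not PartitionFinder's implementation. The scheme names, log-likelihoods, and parameter counts are hypothetical values chosen for illustration, and the alignment length is used as the sample size, which is an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood


def aicc(log_likelihood, n_params, n_sites):
    """AIC with the small-sample correction 2k(k + 1) / (n - k - 1)."""
    correction = 2 * n_params * (n_params + 1) / (n_sites - n_params - 1)
    return aic(log_likelihood, n_params) + correction


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def score(log_likelihood, n_params, n_sites, criterion="bic"):
    """Score one scheme under the named criterion (lower is better)."""
    if criterion == "aic":
        return aic(log_likelihood, n_params)
    if criterion == "aicc":
        return aicc(log_likelihood, n_params, n_sites)
    if criterion == "bic":
        return bic(log_likelihood, n_params, n_sites)
    raise ValueError(f"unknown criterion: {criterion}")


def best_scheme(schemes, n_sites, criterion="bic"):
    """Return the name of the scheme with the lowest score.

    `schemes` maps a scheme name to (log_likelihood, n_params).
    """
    return min(
        schemes,
        key=lambda name: score(*schemes[name], n_sites, criterion),
    )


# Hypothetical candidates: (log-likelihood, number of free parameters).
example_schemes = {
    "unpartitioned": (-10500.0, 10),
    "by_codon_position": (-10200.0, 30),
    "by_gene_and_codon_position": (-10190.0, 90),
}
example_n_sites = 1500  # hypothetical alignment length

chosen = best_scheme(example_schemes, example_n_sites, criterion="bic")
```

With these made-up numbers, the most heavily partitioned scheme has the highest likelihood but also the most parameters, so a criterion that penalizes parameter count can prefer an intermediate scheme.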

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates

A new algorithm is presented that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution, and it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores.

Selecting optimal partitioning schemes for phylogenomic datasets

Two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets are developed: strict and relaxed hierarchical clustering, which provide the best current approaches to inferring partitions on very large datasets.

The impact of partitioning on phylogenomic accuracy

Overall, the results suggest that both statistical partitioning and the a priori assignment of independent GTR+G models maximize phylogenomic performance.

A simple method for data partitioning based on relative evolutionary rates

A new method of partitioning phylogenetic datasets without using any prior knowledge is developed and can be applied to DNA sequences (protein-coding, introns, ultra-conserved elements), protein sequences, as well as morphological characters.

The effects of partitioning on phylogenetic inference.

The choice of partitioning scheme often affects tree topology, particularly when partitioning is omitted; branch lengths and bootstrap support are also affected, sometimes dramatically.

mPartition: A Model-Based Method for Partitioning Alignments.

mPartition, a method that partitions alignments using both the evolutionary rates and the substitution models of sites, is proposed; it may lead to increased accuracy of ML-based phylogenetic inference, especially for multiple loci or whole genome datasets.

A protein alignment partitioning method for protein phylogenetic inference

  • Thu Kim Le, Vinh Sy Le
  • Biology, Computer Science
  • 2020 RIVF International Conference on Computing and Communication Technologies (RIVF)
  • 2020
A new algorithm that automatically determines a partitioning scheme based on the best-fit model of sites (sites that share the same best-fit model are classified into the same group) will significantly improve protein phylogenetic inference from multiple gene or whole genome datasets.

An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times

While the use of many partitions is an important approach to reducing the uncertainty in posterior time estimates, this work does not recommend its general use for the present, given the limitations of current models of rate drift for partitioned data and the challenges of interpreting the fossil evidence to construct accurate and informative calibrations.

An Introduction to Supertree Construction (and Partitioned Phylogenetic Analyses) with a View Toward the Distinction Between Gene Trees and Species Trees

This chapter argues that a combined, “global congruence” approach in which data sets are analyzed under both a supermatrix (unpartitioned) and supertree (partitioned) framework represents the best strategy in the authors' attempts to uncover the Tree of Life.



Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards.

A criterion, based on the Bayes factor, for selecting among competing partitioning strategies is proposed and tested, and it is demonstrated that how one partitions the data is a greater concern than simply the overall number of partitions.

Bayesian phylogenetic analysis of combined data.

A Bayesian MCMC approach to the analysis of combined data sets was developed and its utility in inferring relationships among gall wasps based on data from morphology and four genes was explored, supporting the utility of morphological data in multigene analyses.

Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences.

This work investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model, and determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes.

Selecting the best-fit model of nucleotide substitution.

It is shown here that a best-fit model can be readily identified and should be routine in any phylogenetic analysis that uses models of evolution.

Optimal data partitioning and a test case for ray-finned fishes (Actinopterygii) based on ten nuclear loci.

This work proposes a new method, based on cluster analysis, to find an optimal partitioning strategy for multilocus protein-coding data sets, and shows that a model based on only 10 partitions defined by cluster analysis performed better than partitioning by both gene and codon position.

BEAST: Bayesian evolutionary analysis by sampling trees

BEAST is a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree that provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions.

A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data.

A general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data that simplifies to a homogeneous model or a rate-variability model as special cases and always performs at least as well as these two approaches, and often considerably improves upon them.

Assessment of substitution model adequacy using frequentist and Bayesian methods.

It is shown that tests of model adequacy based on the multinomial likelihood often fail to reject simple substitution models, especially when the models incorporate among-site rate variation (ASRV), and normally fail to reject models less complex than those chosen by model selection methods.

Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified

It is demonstrated that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.

MISFITS: evaluating the goodness of fit between a phylogenetic model and an alignment.

This work presents MISFITS, an approach to evaluating goodness of fit that introduces a minimum number of "extra substitutions" on the inferred tree to provide a biologically motivated explanation of why the alignment may deviate from expectation.