Misleading results of likelihood‐based phylogenetic analyses in the presence of missing data

@article{Simmons2012MisleadingRO,
  title={Misleading results of likelihood‐based phylogenetic analyses in the presence of missing data},
  author={M. Simmons},
  journal={Cladistics},
  year={2012},
  volume={28}
}
The amount of missing data in many contemporary phylogenetic analyses has substantially increased relative to previous norms, particularly in supermatrix studies that compile characters from multiple previous analyses. In such cases the missing data are non‐randomly distributed and usually present in all partitions (i.e. groups of characters) sampled. Parametric methods often provide greater resolution and support than parsimony in such cases, yet this may be caused by extrapolation of branch…
Limitations of locally sampled characters in phylogenetic analyses of sparse supermatrices.
  • M. Simmons
  • Biology, Medicine
  • Molecular phylogenetics and evolution
  • 2014
Empirical and simulated examples were used to demonstrate the following four points in the context of sparse supermatrices. First, locally sampled characters, when analyzed with low quality heuristic…
Radical instability and spurious branch support by likelihood when applied to matrices with non-random distributions of missing data.
  • M. Simmons
  • Biology, Medicine
  • Molecular phylogenetics and evolution
  • 2012
TLDR
Contrived examples were used to demonstrate that non-random distributions of missing data, even without rate heterogeneity among characters and a well fitting model, can provide misleading likelihood-based topologies and branch-support values that are radically unstable based on slight modifications to character sampling.
Disparate parametric branch-support values from ambiguous characters.
TLDR
The information content in "redundant" terminals is described as well as a novel approach to help identify clades that cannot be unequivocally supported by synapomorphies in empirical matrices to examine how Bayesian MCMC, maximum likelihood, and parsimony methods interpret ambiguous optimization of character states.
A confounding effect of missing data on character conflict in maximum likelihood and Bayesian MCMC phylogenetic analyses.
  • M. Simmons
  • Biology, Medicine
  • Molecular phylogenetics and evolution
  • 2014
TLDR
The range of conditions in which maximum likelihood and Bayesian MCMC methods are biased in favor of phylogenetic signal present in globally sampled characters over that present in conflicting locally sampled characters was quantified.
The Impact of Missing Data on Species Tree Estimation.
TLDR
It is demonstrated that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed and a sufficiently large number of genes are sampled.
Do missing data influence the accuracy of divergence-time estimation with BEAST?
TLDR
Overall, missing data (and even numbers of genes sampled) may have only minor impacts on the accuracy of divergence dating with BEAST, relative to the dramatic effects of fossil calibrations.
Differences between hard and soft phylogenetic data
  • R. Sansom, M. Wills
  • Biology, Medicine
  • Proceedings of the Royal Society B: Biological Sciences
  • 2017
When building the tree of life, variability of phylogenetic signal is often accounted for by partitioning gene sequences and testing for differences. The same considerations, however, are rarely…
Phylogenetic inference using discrete characters: performance of ordered and unordered parsimony and of three-item statements
TLDR
The results suggest that the hierarchical character representation not only results in the greatest resolving power, but also in the highest artefactual resolution, both with the simulated and empirical data.
Divergence and support among slightly suboptimal likelihood gene trees
Contemporary phylogenomic studies frequently incorporate two‐step coalescent analyses wherein the first step is to infer individual‐gene trees, generally using maximum‐likelihood implemented in the…
The effects of subsampling gene trees on coalescent methods applied to ancient divergences.
TLDR
This method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination.

References

Missing data in phylogenetic analysis: reconciling results from simulations and empirical data.
TLDR
Previous simulation and empirical studies showing that taxa with extensive missing data can be accurately placed in phylogenetic analyses and that adding characters with missing data can be beneficial (at least under some conditions) are confirmed.
Missing data and the design of phylogenetic analyses
  • J. Wiens
  • Biology, Medicine
  • J. Biomed. Informatics
  • 2006
TLDR
The effects of missing data on phylogenetic analyses are reviewed to allow researchers to design studies that can reconstruct large phylogenies quickly, economically, and accurately.
Does Adding Characters with Missing Data Increase or Decrease Phylogenetic Accuracy?
Missing data are a widely recognized nuisance factor in phylogenetic analyses, and the fear of missing data may deter systematists from including characters that are highly incomplete. In this…
PROBLEMS DUE TO MISSING DATA IN PHYLOGENETIC ANALYSES INCLUDING FOSSILS: A CRITICAL REVIEW
TLDR
Missing data simply represent the unknown and should not be viewed as an impediment to considering all available evidence in phylogenetic analyses, nor used as justification for excluding specific taxa or characters.
Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous
TLDR
It is shown that maximum likelihood and BMCMC can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change non-identically over time.
Missing data, incomplete taxa, and phylogenetic accuracy.
  • J. Wiens
  • Medicine, Biology
  • Systematic biology
  • 2003
TLDR
In this study, simulations are used to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells, and suggest a more effective strategy for dealing with incomplete taxa.
Efficiently resolving the basal clades of a phylogenetic tree using Bayesian and parsimony approaches: a case study using mitogenomic data from 100 higher teleost fishes.
TLDR
For this empirical study, the most efficient of the six approaches considered to resolve the basal clades when adding nucleotides to a dataset that consists of a single gene sampled for a small, but representative, number of taxa, is to increase character sampling and analyze the characters using the Bayesian method.
The Effect of Ambiguous Data on Phylogenetic Estimates Obtained by Maximum Likelihood and Bayesian Inference
TLDR
The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.
Effects of data incompleteness on the relative performance of parsimony and Bayesian approaches in a supermatrix phylogenetic reconstruction of Mustelidae and Procyonidae (Carnivora)
TLDR
Parsimony and Bayesian analyses on a mustelid–procyonid molecular supermatrix found no compelling evidence in support of a relationship between the inferior performance of parsimony and taxon incompleteness, and the relatively good performance of the analyses may be related to the large number of sampled characters.
Quantification of the success of phylogenetic inference in simulations
For phylogenetic simulation studies, the accuracy of topological reconstruction obtained from different data matrices or different methods of phylogenetic inference generally needs to be quantified.