The impact of missing data on real morphological phylogenies: influence of the number and distribution of missing entries

@article{Prevosti2009TheIO,
  title={The impact of missing data on real morphological phylogenies: influence of the number and distribution of missing entries},
  author={F. Prevosti and M. A. Chemisquy},
  journal={Cladistics},
  year={2009},
  volume={26}
}
Here we explore the effect of missing data in phylogenetic analyses using a large number of real morphological matrices. Different percentages and patterns of missing entries were added to each matrix, and their influence was evaluated by comparing the accuracy and error of most parsimonious trees. The relationships between accuracy and error and different parameters (e.g. the number of taxa and characters, homoplasy, support) were also evaluated. Our findings, based on real matrices, agree…
Exploring the impact of unstable terminals on branch support values in paleontological data
TLDR
The results suggest that increasing character sampling and using extended implied weighting decrease the impact of wildcard terminals, and provide insights for designing future research dealing with unstable terminals, a typical problem of paleontological data.
Bias and Sensitivity in the Placement of Fossil Taxa Resulting from Interpretations of Missing Data
  • R. Sansom
  • Biology, Medicine
  • Systematic biology
  • 2015
TLDR
Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist.
JOINED AT THE HIP: LINKED CHARACTERS AND THE PROBLEM OF MISSING DATA IN STUDIES OF DISPARITY
TLDR
An algorithm is developed that assesses the distribution of missing characters in extinct taxa and simulates data loss by applying that distribution to extant taxa; a new disparity method that uses the linkage algorithm to correct for the bias caused by missing data is presented and tested.
Reassessing the role of morphology in bryophyte phylogenetics: combined data improves phylogenetic inference despite character conflict.
TLDR
The results indicate that adding morphology may contribute to the inference of phylogenetic relationships of bryophytes despite character conflict, and suggest that analyses of combined data may provide conservative assessments of data conflict and, eventually, lead to an improved sampling of morphological characters in large-scale analyses of bryophytes.
Death is on Our Side: Paleontological Data Drastically Modify Phylogenetic Hypotheses.
TLDR
Predictive models are developed that demonstrate that the possession of distinctive character state combinations is the primary predictor of the degree of induced topological change, and that the relative impact of taxa can be predicted to some extent before any phylogenetic analysis.
Evaluating the clade size effect in alternative measures of branch support
TLDR
The role of homoplasy is corroborated as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium-sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.
The challenges and potential utility of phenotypic specimen-level phylogeny based on maximum parsimony
  • E. Tschopp, P. Upchurch
  • Geology, Computer Science
  • Earth and Environmental Science Transactions of the Royal Society of Edinburgh
  • 2018
TLDR
Although time-consuming and methodologically challenging, specimen-level phylogenetic analysis is a highly useful tool to assess intraspecific variability and provide the basis for a more informed and accurate creation of species-level operational taxonomic units in large-scale systematic studies.
Phylogeny, paleontology, and primates: do incomplete fossils bias the tree of life?
TLDR
The results support the interpretation that Darwinius is a strepsirhine, not a haplorhine, suggest that paleontological datasets are reliable in primate phylogeny reconstruction, and show a positive correlation between fossil completeness and topological congruence.
Death is on Our Side: Paleontological Data Drastically Modify Phylogenetic Hypotheses
TLDR
Predictive models are developed that demonstrate that the possession of distinctive character state combinations is the primary predictor of the degree of induced topological change, and that the relative impact of taxa (fossil and extant) can be predicted to some extent before any analysis.
A new phylogeny of ichthyosaurs (Reptilia: Diapsida)
TLDR
The largest phylogenetic analysis of ichthyosaurs to date is presented, with 114 ingroup taxa coded at species level and the Bayesian inference tree with gamma-distribution rate prior selected as the best tree based on recent analyses showing improved accuracy.

References

Does Adding Characters with Missing Data Increase or Decrease Phylogenetic Accuracy?
Missing data are a widely recognized nuisance factor in phylogenetic analyses, and the fear of missing data may deter systematists from including characters that are highly incomplete. In this…
Missing data and the accuracy of Bayesian phylogenetics
TLDR
Simulation results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses, as long as the overall number of characters in the analysis is large.
MISSING ENTRY REPLACEMENT DATA ANALYSIS: A REPLACEMENT APPROACH TO DEALING WITH MISSING DATA IN PALEONTOLOGICAL AND TOTAL EVIDENCE DATA SETS
TLDR
The MERDA value is the frequency with which a particular clade is recovered in replicated analyses where missing observations are replaced randomly with observable states, and a technique to de-resolve missing data-dependent clades is proposed.
Missing data and the design of phylogenetic analyses
  • J. Wiens
  • Biology, Medicine
  • J. Biomed. Informatics
  • 2006
TLDR
The effects of missing data on phylogenetic analyses are reviewed to allow researchers to design studies that can reconstruct large phylogenies quickly, economically, and accurately.
Missing data, incomplete taxa, and phylogenetic accuracy.
  • J. Wiens
  • Medicine, Biology
  • Systematic biology
  • 2003
TLDR
In this study, simulations are used to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells, and to suggest a more effective strategy for dealing with incomplete taxa.
INCOMPLETE TAXA, INCOMPLETE CHARACTERS, AND PHYLOGENETIC ACCURACY: IS THERE A MISSING DATA PROBLEM?
TLDR
Results suggest that analyses which combine data from fossils and molecular data sets can be successful, despite large amounts of missing data, which is defined here as the success of a method at reconstructing the true phylogeny.
Is it better to add taxa or characters to a difficult phylogenetic problem?
TLDR
The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation using a four-taxon tree representing a difficult phylogenetic problem with an extreme situation of long branch attraction.
PROBLEMS DUE TO MISSING DATA IN PHYLOGENETIC ANALYSES INCLUDING FOSSILS: A CRITICAL REVIEW
TLDR
Missing data simply represent the unknown and should not be viewed as an impediment to considering all available evidence in phylogenetic analyses, nor used as justification for excluding specific taxa or characters.
Missing Data versus Missing Characters in Phylogenetic Analysis
As noted by Nixon and Davis (1991) and Platnick et al. (1991), this particular use of the missing data coding is but one of several. A missing data entry in a phylogenetic data matrix might mean that…
Coping with Abundant Missing Entries in Phylogenetic Inference Using Parsimony
When cladistic data sets include taxa with abundant missing entries, parsimony analysis may yield multiple equally optimal trees and necessitate the use of consensus methods, to summarize…