Confidence Sets for Phylogenetic Trees

  title={Confidence Sets for Phylogenetic Trees},
  author={Amy D. Willis},
  journal={Journal of the American Statistical Association},
  pages={235 - 244}
  • A. Willis
  • Published 2016
  • Mathematics, Biology
  • Journal of the American Statistical Association
ABSTRACT Inferring evolutionary histories (phylogenetic trees) has important applications in biology, criminology, and public health. However, phylogenetic trees are complex mathematical objects that reside in a non-Euclidean space, which complicates their analysis. While our mathematical, algorithmic, and probabilistic understanding of phylogenies in their metric space is mature, rigorous inferential infrastructure is as yet undeveloped. In this manuscript, we unify recent computational and…
Uncertainty in Phylogenetic Tree Estimates
The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, so uncertainty information cannot be discarded, and presents a method that allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.
How trustworthy is your tree? Bayesian phylogenetic effective sample size through the lens of Monte Carlo error
The results indicate that common post-MCMC workflows are insufficient to capture the inherent Monte Carlo error of the tree, and highlight the need for both within-chain mixing and between-chain convergence assessments.
Statistical summaries of unlabelled evolutionary trees and ranked hierarchical clustering trees
An efficient combinatorial optimization algorithm is provided for computing the Fréchet mean from a sample of, or distribution on, unlabelled ranked tree shapes and unlabelled ranked genealogies. The summary statistics are shown to be applicable to studying popular tree distributions and to comparing SARS-CoV-2 evolutionary trees across different locations during the COVID-19 epidemic in 2020.
Geometric comparison of phylogenetic trees with different leaf sets
This paper describes how to apply a combinatorial algorithm to define and search a space of possible supertrees and, for a collection of tree fragments with different leaf sets, to measure their compatibility.
Robustness of Phylogenetic Inference to Model Misspecification Caused by Pairwise Epistasis
A simulation study is presented demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled, and an alignment-based test statistic is introduced that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.
A Metric Space of Ranked Tree Shapes and Ranked Genealogies
This work proposes a metric space on ranked genealogies for both isochronously sampled and time-stamped heterochronously sampled lineages, and shows the utility of the metrics via simulations and an application in infectious diseases.
Robustness of phylogenetic inference to model misspecification caused by pairwise epistasis
A simulation study is presented demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled, and an alignment-based test statistic is introduced that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.
The isometry group of phylogenetic tree space is $S_n$
This largely combinatorial paper shows that the isometry group of this space is the symmetric group on n elements, relevant to distance-based analyses of phylogenetic tree sets.
Convergence of random walks to Brownian motion in phylogenetic tree-space
It is proved that as the number of steps tends to infinity and the step-size tends to zero, the distribution determined by the transition kernel of the random walk converges to that corresponding to Brownian motion.
Mean and Variance of Phylogenetic Trees.
The Fréchet mean and variance are more theoretically justified, and more robust, than previous estimates of this type, and can be estimated reasonably efficiently, providing a foundation for building more advanced statistical methods and leading to applications such as mean hypothesis testing and outlier detection.


Point estimates in phylogenetic reconstructions
Motivation: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space…
Kdetrees: Non-parametric Estimation of Phylogenetic Tree Distributions
Kdetrees, a non-parametric method for estimating tree distributions and identifying outlying trees, is proposed and implemented, with the goal of identifying trees that are significantly different from the rest of the trees in the sample.
Principal components analysis in the space of phylogenetic trees
A novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA is described, illustrated by application to simulated sets of trees and to a set of gene trees from metazoan (animal) species.
Consistency of a phylogenetic tree maximum likelihood estimator
Abstract Phylogenetic trees represent the order and extent of genetic divergence of a fixed collection of organisms. Order of divergence is represented via the tree structure, and extent of…
An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace
  • T. M. Nye
  • Mathematics, Medicine
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • 2014
A stochastic algorithm for constructing a principal geodesic or line through treespace which is analogous to the first principal component in standard principal components analysis, though convergence to locally optimal geodesics is possible.
Bayesian Inference of Species Trees from Multilocus Data
It is demonstrated that both BEST and the new Bayesian Markov chain Monte Carlo method for the multispecies coalescent have much better estimation accuracy for species tree topology than concatenation, and the method outperforms BEST in divergence time and population size estimation.
This chapter reviews statistical testing involving phylogenies. We present both the classical framework with the use of sampling distributions involving the bootstrap and permutation tests and the…
Normalizing Kernels in the Billera-Holmes-Vogtmann Treespace
An improvement to the kdetrees algorithm is described, an adaptation of classical kernel density estimation to the metric space of phylogenetic trees (Billera-Holmes-Vogtmann treespace), whereby the kernel normalizing constants are estimated through the use of novel holonomic gradient methods.
Statistics for phylogenetic trees.
  • S. Holmes
  • Mathematics, Medicine
  • Theoretical population biology
  • 2003
This paper poses the problem of estimating and validating phylogenetic trees in statistical terms, using distances and measures on a natural space of trees, and suggests some coherent ways of tackling them.
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees
A geometric object for tree space similar to the kth principal component in Euclidean space is proposed: the locus of the weighted Fréchet mean of $k+1$ vertex trees when the weights vary over the k‐simplex.