# Confidence Sets for Phylogenetic Trees

@article{Willis2016ConfidenceSF, title={Confidence Sets for Phylogenetic Trees}, author={Amy D. Willis}, journal={Journal of the American Statistical Association}, year={2016}, volume={114}, pages={235 - 244} }

ABSTRACT Inferring evolutionary histories (phylogenetic trees) has important applications in biology, criminology, and public health. However, phylogenetic trees are complex mathematical objects that reside in a non-Euclidean space, which complicates their analysis. While our mathematical, algorithmic, and probabilistic understanding of phylogenies in their metric space is mature, rigorous inferential infrastructure is as yet undeveloped. In this manuscript, we unify recent computational and…

#### 20 Citations

Uncertainty in Phylogenetic Tree Estimates

- Mathematics, Biology
- 2016

The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, resulting in uncertainty information that cannot be discarded, and presents a method that allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.

How trustworthy is your tree? Bayesian phylogenetic effective sample size through the lens of Monte Carlo error

- Mathematics, Biology
- 2021

The results indicate that common post-MCMC workflows are insufficient to capture the inherent Monte Carlo error of the tree, and highlight the need for both within-chain mixing and between-chain convergence assessments.

Statistical summaries of unlabelled evolutionary trees and ranked hierarchical clustering trees

- Mathematics, Biology
- 2021

An efficient combinatorial optimization algorithm is provided for computing the Fréchet mean from a sample of, or distribution on, unlabelled ranked tree shapes and unlabelled ranked genealogies. The summary statistics are shown to be applicable to studying popular tree distributions and to comparing SARS-CoV-2 evolutionary trees across different locations during the COVID-19 epidemic in 2020.

Geometric comparison of phylogenetic trees with different leaf sets

- Computer Science, Biology
- ArXiv
- 2018

This paper describes how to apply a combinatorial algorithm to define and search a space of possible supertrees and, for a collection of tree fragments with different leaf sets, to measure their compatibility.

Robustness of Phylogenetic Inference to Model Misspecification Caused by Pairwise Epistasis

- Medicine
- Molecular biology and evolution
- 2021

A simulation study is presented demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled, and an alignment-based test statistic is introduced that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.

A Metric Space of Ranked Tree Shapes and Ranked Genealogies

- Biology, Mathematics
- 2018

This work proposes a metric space on ranked genealogies for lineages sampled under both isochronous and time-stamped heterochronous sampling, and shows the utility of the metrics via simulations and an application in infectious diseases.

Robustness of phylogenetic inference to model misspecification caused by pairwise epistasis

- Biology, Computer Science
- 2020

A simulation study is presented demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled, and an alignment-based test statistic is introduced that is a diagnostic for pair-wise epistasis and can be used in posterior predictive checks.

The isometry group of phylogenetic tree space is $S_n$

- Mathematics, Biology
- 2019

This largely combinatorial paper shows that the isometry group of phylogenetic tree space is the symmetric group on n elements, a result relevant to distance-based analyses of phylogenetic tree sets.

Convergence of random walks to Brownian motion in phylogenetic tree-space

- Mathematics, Biology
- 2015

It is proved that as the number of steps tends to infinity and the step-size tends to zero, the distribution determined by the transition kernel of the random walk converges to that corresponding to Brownian motion.

Mean and Variance of Phylogenetic Trees.

- Biology, Medicine
- Systematic biology
- 2019

The Fréchet mean and variance are more theoretically justified, and more robust, than previous estimates of this type, and can be estimated reasonably efficiently, providing a foundation for building more advanced statistical methods and leading to applications such as mean hypothesis testing and outlier detection.

#### References

Point estimates in phylogenetic reconstructions

- Mathematics, Biology
- Bioinform.
- 2014

Motivation: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space…

Kdetrees: Non-parametric Estimation of Phylogenetic Tree Distributions

- Biology, Medicine
- Bioinform.
- 2014

Kdetrees, a non-parametric method for estimating tree distributions and identifying outlying trees, is proposed and implemented, with the goal of identifying trees that are significantly different from the rest of the trees in the sample.

Principal components analysis in the space of phylogenetic trees

- Mathematics, Biology
- 2011

A novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA is described, illustrated by application to simulated sets of trees and to a set of gene trees from metazoan (animal) species.

Consistency of a phylogenetic tree maximum likelihood estimator

- Mathematics
- 2015

Abstract Phylogenetic trees represent the order and extent of genetic divergence of a fixed collection of organisms. Order of divergence is represented via the tree structure, and extent of…

An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace

- Mathematics, Medicine
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2014

A stochastic algorithm is presented for constructing a principal geodesic, or line, through treespace that is analogous to the first principal component in standard principal components analysis, though convergence to locally optimal geodesics is possible.

Bayesian Inference of Species Trees from Multilocus Data

- Biology, Medicine
- Molecular biology and evolution
- 2010

It is demonstrated that both BEST and the new Bayesian Markov chain Monte Carlo method for the multispecies coalescent have much better estimation accuracy for species tree topology than concatenation, and the method outperforms BEST in divergence time and population size estimation.

STATISTICAL APPROACH TO TESTS INVOLVING PHYLOGENIES

- 2004

This chapter reviews statistical testing involving phylogenies. We present both the classical framework with the use of sampling distributions involving the bootstrap and permutation tests and the…

Normalizing Kernels in the Billera-Holmes-Vogtmann Treespace

- Computer Science, Biology
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2017

An improvement to the kdetrees algorithm is described. Kdetrees is an adaptation of classical kernel density estimation to the metric space of phylogenetic trees (Billera-Holmes-Vogtmann treespace); the improvement estimates the kernel normalizing constants using novel holonomic gradient methods.

Statistics for phylogenetic trees.

- Mathematics, Medicine
- Theoretical population biology
- 2003

This paper poses the problem of estimating and validating phylogenetic trees in statistical terms, using distances and measures on a natural space of trees, and suggests some coherent ways of tackling them.

Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

- Mathematics, Medicine
- Biometrika
- 2017

A geometric object for tree space similar to the $k$th principal component in Euclidean space is proposed: the locus of the weighted Fréchet mean of $k+1$ vertex trees when the weights vary over the $k$-simplex.