Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

@article{Nye2017PrincipalCA,
  title={Principal component analysis and the locus of the Fr{\'e}chet mean in the space of phylogenetic trees},
  author={Tom M. W. Nye and Xiaoxian Tang and Grady Weyenberg and Ruriko Yoshida},
  journal={Biometrika},
  year={2017},
  volume={104},
  pages={901--922}
}
Summary. Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi‐dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high‐dimensional data to a low‐dimensional representation that preserves much of the sample's structure…

Tropical principal component analysis on the space of phylogenetic trees

This work develops a stochastic optimization method to estimate tropical PCs over the space of phylogenetic trees using a Markov chain Monte Carlo (MCMC) approach; the method performs well in simulation studies and is applied to three empirical datasets.

Foundations of the Wald Space for Phylogenetic Trees

Evolutionary relationships between species are represented by phylogenetic trees, but these relationships are subject to uncertainty due to the random nature of evolution. A geometry for the space

Tropical Principal Component Analysis and Its Application to Phylogenetics

This work defines and analyzes two analogues of principal component analysis in the setting of tropical geometry and gives approximative algorithms for both approaches and applies them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.

Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective

A novel framework to study sets of phylogenetic trees based on tropical geometry is proposed and studied, which exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state-of-the-art.

Tropical principal component analysis on the space of ultrametrics

In 2019, Yoshida et al. introduced a notion of tropical principal component analysis (PCA). The output is a tropical polytope with a fixed number of vertices that best fits the data. We here apply

Confidence Sets for Phylogenetic Trees

  • A. Willis
  • Biology
    Journal of the American Statistical Association
  • 2018
This manuscript unifies recent computational and probabilistic advances to construct tree-valued confidence sets, identifying the best supported most recent ancestor of the Zika virus, and formally testing the hypothesis that a Floridian dentist with AIDS infected two of his patients with HIV.

Confidence procedures for phylogenetic trees

The inferential method is a confidence set for the Fréchet mean of a distribution with support on the metric space of phylogenetic trees, and two exploratory methods are proposed for visualizing collections of trees, which rely on similar tools to the confidence set procedure.

The space of equidistant phylogenetic cactuses

It is shown that equidistant-cactus space is a CAT(0)-metric space, which implies, for example, that there are unique geodesic paths between points, and an encoding of ranked, rooted X-trees in terms of partitions of X provides an alternative proof that the space of ultrametric trees on X is CAT(0).

Statistics for Data with Geometric Structure

Statistics for data with geometric structure is an active and diverse topic of research. Applications include manifold spaces in directional data or symmetric positive definite matrices and some

Statistics on stratified spaces

References

Principal components analysis in the space of phylogenetic trees

A novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA is described, illustrated by application to simulated sets of trees and to a set of gene trees from metazoan (animal) species.

Geometry of the Space of Phylogenetic Trees

We consider a continuous space which models the set of all phylogenetic trees having a fixed set of leaves. This space has a natural metric of nonpositive curvature, giving a way of measuring

Central limit theorems for Fréchet means in the space of phylogenetic trees

This paper studies the characterisation, and the limiting distributions, of Fréchet means in the space of phylogenetic trees. This space is topologically stratified, as well as being a CAT(0) space.

Tree-Space Statistics and Approximations for Large-Scale Analysis of Anatomical Trees

This paper takes advantage of a very large dataset (N=8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well and illustrates how the structure and geometry of airway trees vary across a population.

A Fast Algorithm for Computing Geodesic Distances in Tree Space

An important open problem is to find a polynomial time algorithm for finding geodesics in tree space, which starts with a simple initial path and moves through a series of successively shorter paths until the geodesic is attained.

Barycentric subspace analysis on manifolds

  • X. Pennec
  • Mathematics
    The Annals of Statistics
  • 2018
This paper investigates the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. We first propose a new and more general type of family of subspaces in manifolds that we call

Kdetrees: Non-parametric Estimation of Phylogenetic Tree Distributions

Kdetrees, a non-parametric method for estimating tree distributions and identifying outlying trees, is proposed and implemented, with the goal of identifying trees that are significantly different from the rest of the trees in the sample.

Statistics in the Billera-Holmes-Vogtmann treespace

This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees, giving a more general framework for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees.

Barycentric Subspaces and Affine Spans in Manifolds

Barycentric subspaces are implicitly defined as the locus of points which are weighted means of \(k+1\) reference points, which contains the Fréchet mean, and it is shown that this definition locally defines a submanifold of dimension k and that it generalizes, in some sense, geodesic subspaces.

Clustering Genes of Common Evolutionary History

A large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history finds that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward’s method.