Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

@article{Nye2016PrincipalCA,
  title={Principal component analysis and the locus of the Fr{\'e}chet mean in the space of phylogenetic trees},
  author={Tom M. W. Nye and Xiaoxian Tang and Grady Weyenberg and Ruriko Yoshida},
  journal={Biometrika},
  year={2016},
  volume={104},
  pages={901--922}
}
Summary: Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure… 
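For reference, the Euclidean PCA that the summary takes as its starting point can be sketched in a few lines of NumPy: centre the data, take a singular value decomposition, and project onto the leading right singular vectors. This is a minimal sketch of standard Euclidean PCA only; the function name `pca` is illustrative, and nothing here implements the tree-space construction developed in the paper.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X (n samples x p features) onto the first k principal components.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature at its sample mean
    # Rows of Vt are orthonormal principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T  # coordinates of each sample in the k-dimensional subspace
    var = s**2 / (len(X) - 1)  # variance captured along each direction
    return scores, components, var[:k] / var.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 200 points lying close to a line in R^3
    t = rng.normal(size=(200, 1))
    X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))
    scores, comps, ratio = pca(X, 1)
    print(comps, ratio)
```

The sign of each principal direction is arbitrary (an SVD property), so only the line it spans is meaningful.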

Tropical principal component analysis on the space of phylogenetic trees

This work develops a stochastic optimization method, based on Markov chain Monte Carlo (MCMC), to estimate tropical PCs over the space of phylogenetic trees; the method performs well in simulation studies and is applied to three empirical datasets.

Foundations of the Wald Space for Phylogenetic Trees

Evolutionary relationships between species are represented by phylogenetic trees, but these relationships are subject to uncertainty due to the random nature of evolution. A geometry for the space

Tropical Principal Component Analysis and Its Application to Phylogenetics

This work defines and analyzes two analogues of principal component analysis in the setting of tropical geometry, gives approximate algorithms for both approaches, and applies them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.

Information geometry for phylogenetic trees

A gradient descent algorithm is derived to project from the ambient space of covariance matrices to Wald space, and it is shown numerically that the two information geometries (discrete and continuous) are very similar.

Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective

A novel framework to study sets of phylogenetic trees based on tropical geometry is proposed and studied, which exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state-of-the-art.
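As a concrete illustration of the tropical setting, the tropical (generalized Hilbert projective) metric commonly used in this literature compares two trees through the vectors of their pairwise leaf distances: d_tr(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i). The sketch below assumes each tree is already given as a symmetric leaf-to-leaf distance matrix with the leaves in a common order; the helper names `pairwise_vector` and `tropical_distance` are hypothetical.

```python
from itertools import combinations


def pairwise_vector(M):
    """Entries M[i][j] with i < j of a symmetric leaf distance matrix, in a fixed pair order."""
    n = len(M)
    return [M[i][j] for i, j in combinations(range(n), 2)]


def tropical_distance(u, v):
    """Tropical metric on R^e / R(1, ..., 1): max_i (u_i - v_i) - min_i (u_i - v_i)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


if __name__ == "__main__":
    # Two ultrametric trees on leaves (a, b, c): ((a,b),c) and ((a,c),b), both of height 2.
    A = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
    B = [[0, 4, 2], [4, 0, 4], [2, 4, 0]]
    print(tropical_distance(pairwise_vector(A), pairwise_vector(B)))
```

Because adding the same constant to every coordinate leaves d_tr unchanged, the metric is well defined on the tropical projective torus, where vectors differing by a multiple of (1, ..., 1) are identified.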

Confidence Sets for Phylogenetic Trees

  • A. Willis
  • Journal of the American Statistical Association
  • 2018
This manuscript unifies recent computational and probabilistic advances to construct tree-valued confidence sets, identifying the best-supported most recent ancestor of the Zika virus, and formally testing the hypothesis that a Floridian dentist with AIDS infected two of his patients with HIV.

Confidence procedures for phylogenetic trees

The inferential method is a confidence set for the Fréchet mean of a distribution with support on the metric space of phylogenetic trees, and two exploratory methods are proposed for visualizing collections of trees, which rely on similar tools to the confidence set procedure.

The space of equidistant phylogenetic cactuses

It is shown that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points, and an encoding of ranked, rooted X-trees in terms of partitions of X provides an alternative proof that the space of ultrametric trees on X is CAT(0).

Statistics for Data with Geometric Structure

Statistics for data with geometric structure is an active and diverse topic of research. Applications include manifold spaces in directional data or symmetric positive definite matrices and some

References

Principal components analysis in the space of phylogenetic trees

A novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA is described, illustrated by application to simulated sets of trees and to a set of gene trees from metazoan (animal) species.

An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace

  • T. M. Nye
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • 2014
A stochastic algorithm is given for constructing a principal geodesic, or line, through treespace; it is analogous to the first principal component in standard principal components analysis, though it can converge to locally optimal geodesics.

Geometry of the Space of Phylogenetic Trees

We consider a continuous space which models the set of all phylogenetic trees having a fixed set of leaves. This space has a natural metric of nonpositive curvature, giving a way of measuring

Polyhedral computational geometry for averaging metric phylogenetic trees

Central limit theorems for Fréchet means in the space of phylogenetic trees

This paper studies the characterisation, and the limiting distributions, of Fréchet means in the space of phylogenetic trees. This space is topologically stratified, as well as being a CAT(0) space.
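Fréchet means in a CAT(0) space such as tree space are often approximated with Sturm's inductive algorithm: start at a randomly drawn sample point and repeatedly move a fraction 1/(k+1) of the way along the geodesic towards a new randomly drawn sample point. The sketch below is generic over the space; the `geodesic_point(a, b, t)` callback is a hypothetical parameter that must return the point a fraction `t` along the geodesic from `a` to `b`. Only a Euclidean geodesic is supplied here, for testing; a tree-space geodesic (for example from the Owen-Provan algorithm) would have to be provided separately.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")


def sturm_mean(
    points: Sequence[P],
    geodesic_point: Callable[[P, P, float], P],
    n_iter: int = 10_000,
    seed: int = 0,
) -> P:
    """Approximate the Fréchet mean of `points` by Sturm's inductive (stochastic) algorithm."""
    rng = random.Random(seed)
    mu = rng.choice(points)  # initial estimate: one randomly drawn sample point
    for k in range(1, n_iter):
        x = rng.choice(points)
        # Move a fraction 1/(k+1) of the way from the current estimate towards x.
        mu = geodesic_point(mu, x, 1.0 / (k + 1))
    return mu


def euclidean_geodesic_point(a: Tuple[float, ...], b: Tuple[float, ...], t: float) -> Tuple[float, ...]:
    """Point a fraction t along the straight line from a to b (Euclidean test case only)."""
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
```

In Euclidean space each step is exactly a running-average update, so the estimate is the mean of the drawn points and converges to the arithmetic mean, which is the Fréchet mean there; in CAT(0) spaces the same iteration converges to the Fréchet mean under Sturm's theory.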

Tree-Space Statistics and Approximations for Large-Scale Analysis of Anatomical Trees

This paper takes advantage of a very large dataset (N=8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well, and illustrates how the structure and geometry of airway trees vary across a population.

Analysis and visualization of tree space.

The use of multidimensional scaling of tree-to-tree pairwise distances to visualize the relationships among sets of phylogenetic trees is explored and found to be useful for exploring "tree islands", for comparing sets of trees obtained from bootstrapping and Bayesian sampling, and for comparing multiple Bayesian analyses.
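Classical (Torgerson) multidimensional scaling, one standard form of MDS, turns a matrix of pairwise tree-to-tree distances into low-dimensional coordinates for plotting. A minimal sketch, assuming the distance matrix has already been computed with whatever tree metric is of interest; the name `classical_mds` is illustrative, and the cited work's own software and choice of MDS variant may differ.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in R^k from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (D**2) @ J  # double-centred squared distances (a Gram matrix if D is Euclidean)
    evals, evecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest eigenvalues
    # Negative eigenvalues (D not exactly Euclidean) are clipped to zero.
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return evecs[:, order] * scale  # n x k coordinates
```

Tree-space distances such as the BHV metric are generally not Euclidean, so the double-centred matrix can have negative eigenvalues; clipping them, as above, is one common convention, and the resulting picture only approximates the true distances.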

A Fast Algorithm for Computing Geodesic Distances in Tree Space

An important open problem has been to find a polynomial-time algorithm for computing geodesics in tree space; this paper gives such an algorithm, which starts with a simple initial path and moves through a series of successively shorter paths until the geodesic is attained.

Barycentric subspace analysis on manifolds

  • X. Pennec
  • The Annals of Statistics
  • 2018
This paper investigates the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. We first propose a new and more general type of family of subspaces in manifolds that we call

Kdetrees: Non-parametric Estimation of Phylogenetic Tree Distributions

Kdetrees, a non-parametric method for estimating tree distributions and identifying outlying trees, is proposed and implemented, with the goal of identifying trees that are significantly different from the rest of the trees in the sample.
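The idea of scoring each tree by a kernel density estimate built from its distances to the other trees, and flagging trees with unusually low scores, can be sketched as follows. This is a hypothetical illustration in the spirit of the summary, not the kdetrees implementation: the Gaussian kernel, the fixed `bandwidth`, the leave-one-out averaging, and the Tukey-style cutoff `Q1 - k * IQR` are all assumptions, and the pairwise tree distance matrix is taken as given.

```python
import numpy as np


def density_scores(D, bandwidth):
    """Leave-one-out Gaussian-kernel density score for each tree from a pairwise distance matrix D."""
    D = np.asarray(D, dtype=float)
    K = np.exp(-0.5 * (D / bandwidth) ** 2)
    np.fill_diagonal(K, 0.0)  # exclude each tree from its own estimate
    return K.sum(axis=1) / (len(D) - 1)


def flag_outliers(scores, k=1.5):
    """Boolean mask of scores below the lower Tukey fence Q1 - k * IQR."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    return scores < q1 - k * (q3 - q1)
```

A low score means few other trees lie nearby under the chosen metric; the bandwidth controls what "nearby" means, so in practice it would need to be tuned to the scale of the tree distances.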