Corpus ID: 211677585

Tropical Support Vector Machine and its Applications to Phylogenomics

Authors: Xiaoxian Tang, Houjie Wang, Ruriko Yoshida
Venue: arXiv: Combinatorics
Most data in genome-wide phylogenetic analysis (phylogenomics) is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Also, we cannot directly apply statistical learning models in data science to a set of phylogenetic trees, since the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic…
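To make the max-plus setting concrete, here is a minimal sketch. It assumes the standard conventions:

- tropical addition is a ⊕ b = max(a, b);
- tropical multiplication is a ⊙ b = a + b;
- an equidistant tree on n leaves is encoded by its vector of pairwise leaf distances (an ultrametric), treated as a point of the tropical projective torus R^e/R1 with e = n(n-1)/2.

The distance shown is the tropical metric d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The function names and the two example trees are hypothetical illustrations, not code from the paper.

```python
from typing import Sequence


def tropical_add(a: float, b: float) -> float:
    """Max-plus 'addition': a (+) b = max(a, b)."""
    return max(a, b)


def tropical_mul(a: float, b: float) -> float:
    """Max-plus 'multiplication': a (.) b = a + b."""
    return a + b


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical metric on R^e / R1: max_i(u_i - v_i) - min_i(u_i - v_i).

    Adding the same constant to every coordinate of u (or v) leaves the
    value unchanged, which is why points are taken modulo R1.
    """
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [ui - vi for ui, vi in zip(u, v)]
    return max(diffs) - min(diffs)


# Hypothetical equidistant trees on leaves {1, 2, 3, 4}, each encoded by its
# ultrametric in the coordinate order (d12, d13, d14, d23, d24, d34).
t1 = [2.0, 4.0, 4.0, 4.0, 4.0, 2.0]  # cherries {1,2} and {3,4}
t2 = [4.0, 2.0, 4.0, 4.0, 2.0, 4.0]  # cherries {1,3} and {2,4}

print(tropical_distance(t1, t2))                       # 4.0
print(tropical_distance([x + 5 for x in t1], t2))      # 4.0 (shift-invariant)
print(tropical_add(2.0, 4.0), tropical_mul(2.0, 4.0))  # 4.0 6.0
```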


Tropical Support Vector Machines: Evaluations and Extension to Function Spaces
It is shown theoretically by extreme value statistics that the tropical SVMs for classifying data points from two Gaussian distributions as well as empirical data sets of different neuron types are fairly robust against the curse of dimensionality.
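The classifiers evaluated here separate classes with a tropical hyperplane instead of a Euclidean one. This minimal sketch assumes the max-plus convention: the hyperplane with normal vector w is the set where max_i(w_i + x_i) is attained at least twice. Each point then falls in the sector i where w_i + x_i is strictly largest. Its tropical distance to the hyperplane is the gap between the largest and second-largest entries of w + x. The sector-to-class map, the vector w, and the points are all hypothetical, and fitting w is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def sector_and_gap(x: Sequence[float], w: Sequence[float]) -> Tuple[int, float]:
    """Return (sector index, tropical distance to the hyperplane H_w).

    Assumes the max-plus convention: H_w is where max_i(w_i + x_i) is attained
    at least twice. A gap of 0 means x lies on H_w and its sector is ambiguous.
    """
    scores = [wi + xi for wi, xi in zip(w, x)]
    # Indices sorted by score, largest first (ties keep index order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top, second = order[0], order[1]
    return top, scores[top] - scores[second]


def classify(points: Sequence[Sequence[float]], w: Sequence[float],
             sector_labels: Dict[int, str]) -> List[Tuple[str, float]]:
    """Label each point by its sector; sector_labels is a hypothetical map."""
    labelled = []
    for x in points:
        sector, gap = sector_and_gap(x, w)
        labelled.append((sector_labels[sector], gap))
    return labelled


def margin(points: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Smallest distance from any point to H_w (what a hard margin seeks to maximize)."""
    return min(sector_and_gap(x, w)[1] for x in points)


w = [0.0, 0.0, 0.0]                # hypothetical normal vector
labels = {0: "A", 1: "B", 2: "B"}  # hypothetical sector-to-class map
points = [[3.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
print(classify(points, w, labels))  # [('A', 2.0), ('B', 2.0)]
print(margin(points, w))            # 2.0
```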
Tropical Data Science
This paper surveys some new developments in machine learning models that use tropical geometry to analyze a set of phylogenetic trees over a tree space.
Tropical Geometric Variation of Phylogenetic Tree Shapes
We study the behavior of phylogenetic tree shapes in the tropical geometric interpretation of tree space. Tree shapes are formally referred to as tree topologies; a tree topology can also be thought…
Tree Topologies along a Tropical Line Segment
This paper focuses on the combinatorics of tree topologies along a tropical line segment (an intrinsic geodesic with respect to the tropical metric) between two phylogenetic trees over the tree space. It is shown that if two given trees differ by only one nearest neighbor interchange (NNI) move, then the tree topology of any tree on the tropical line segment between them is the same as the tree topology of one of the two given trees, possibly with zero branch lengths.
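A tropical line segment can be traced directly from its definition. This minimal sketch assumes the max-plus convention, under which the segment between u and v is the set of coordinate-wise maxima max(t + u, v) over real t. Each coordinate bends only where t equals some v_i - u_i, so evaluating at those values gives the segment's breakpoints. The function name and example points are hypothetical. The example also checks that the pieces add up to d_tr(u, v), as a geodesic should.

```python
from typing import List, Sequence, Tuple


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [ui - vi for ui, vi in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_segment_vertices(u: Sequence[float],
                              v: Sequence[float]) -> List[Tuple[float, ...]]:
    """Breakpoints of the max-plus tropical line segment, ordered from u to v.

    Points of the segment are max(t + u, v) coordinate-wise. For the largest
    breakpoint the point is t + u (equivalent to u modulo R1); for the
    smallest it is v. Each vertex is normalized so its first coordinate is 0.
    """
    ts = sorted({vi - ui for ui, vi in zip(u, v)}, reverse=True)
    vertices = []
    for t in ts:
        p = [max(t + ui, vi) for ui, vi in zip(u, v)]
        vertices.append(tuple(pi - p[0] for pi in p))
    return vertices


u = [0.0, 0.0, 0.0]
v = [0.0, 1.0, 3.0]
path = tropical_segment_vertices(u, v)
print(path)  # [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (0.0, 1.0, 3.0)]
pieces = sum(tropical_distance(a, b) for a, b in zip(path, path[1:]))
print(pieces, tropical_distance(u, v))  # 3.0 3.0
```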
Tropical optimal transport and Wasserstein distances
We study the problem of optimal transport in tropical geometry and define the Wasserstein-p distances in the continuous metric measure space setting of the tropical projective torus. We specify the…
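The continuous setting of this paper is beyond a short example, but a discrete special case is easy to state. This minimal sketch makes two assumptions:

- both measures are uniform empirical measures with the same number of atoms;
- the ground cost is the tropical metric.

Under these assumptions, an optimal coupling can be taken to be a permutation (by the Birkhoff-von Neumann theorem). W_p is then the minimum over permutations of the averaged p-th powers of distances, raised to 1/p. The brute-force search is exact but only practical for a handful of points. The names and data are hypothetical.

```python
from itertools import permutations
from typing import Sequence


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [ui - vi for ui, vi in zip(u, v)]
    return max(diffs) - min(diffs)


def empirical_tropical_wasserstein(xs: Sequence[Sequence[float]],
                                   ys: Sequence[Sequence[float]],
                                   p: float = 1.0) -> float:
    """W_p between two equal-size uniform empirical measures, d_tr as ground cost.

    Brute force over all matchings (permutations): exact, but O(n!) time.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty point sets of equal size")
    best = min(
        sum(tropical_distance(xs[i], ys[j]) ** p for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


xs = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # hypothetical sample 1
ys = [[0.0, 0.0, 1.0], [0.0, 2.0, 1.0]]  # hypothetical sample 2
print(empirical_tropical_wasserstein(xs, ys, p=1))  # 1.0
print(empirical_tropical_wasserstein(xs, ys, p=2))  # 1.0
```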
Tropical linear regression and mean payoff games: or, how to measure the distance to equilibria
A strong duality theorem is established, showing that the value of the problem of finding the best approximation of a set of points by a tropical hyperplane coincides with the maximal radius of a Hilbert ball included in a tropical polyhedron.
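The distance that enters the best-approximation problem has a closed form. This sketch assumes the max-plus convention, where the hyperplane H_w is the set on which max_i(w_i + x_i) is attained at least twice. Under that convention, the tropical distance from x to H_w is the largest entry of w + x minus the second largest. The sketch evaluates that distance and a worst-case fit for one fixed, hypothetical w. It assumes the maximum distance over the points as the fit criterion, and it does not show the optimization over w.

```python
import heapq
from typing import Sequence


def distance_to_hyperplane(x: Sequence[float], w: Sequence[float]) -> float:
    """Tropical distance from x to H_w: top entry of w + x minus the runner-up."""
    top_two = heapq.nlargest(2, (wi + xi for wi, xi in zip(w, x)))
    return top_two[0] - top_two[1]


def worst_case_distance(points: Sequence[Sequence[float]],
                        w: Sequence[float]) -> float:
    """Largest distance from any point to H_w (assumed fit criterion)."""
    return max(distance_to_hyperplane(x, w) for x in points)


w = [0.0, 0.0, 0.0]  # hypothetical normal vector
print(distance_to_hyperplane([3.0, 0.0, 1.0], w))  # 2.0
print(distance_to_hyperplane([1.0, 1.0, 0.0], w))  # 0.0 (on the hyperplane)
print(worst_case_distance([[3.0, 0.0, 1.0], [1.0, 1.0, 0.0]], w))  # 2.0
```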


Tropical Principal Component Analysis and Its Application to Phylogenetics
This work defines and analyzes two analogues of principal component analysis in the setting of tropical geometry, gives approximation algorithms for both approaches, and applies them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
A novel framework based on tropical geometry is proposed for studying sets of phylogenetic trees. It exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state of the art.
Tropical principal component analysis on the space of ultrametrics
In 2019, Yoshida et al. introduced a notion of tropical principal component analysis (PCA). The output is a tropical polytope with a fixed number of vertices that best fits the data. We here apply…
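Fitting a tropical polytope requires projecting each data point onto it. Under the max-plus convention assumed here, the projection of x onto the tropical polytope spanned by vertices D_1, ..., D_s is max_l(lambda_l + D_l), taken coordinate-wise, where lambda_l = min_j(x_j - D_l[j]). The sketch computes that projection and one plausible fit criterion: the sum of tropical distances between the points and their projections. The criterion, names, and data are assumptions for illustration, and choosing the vertices is not shown.

```python
from typing import List, Sequence


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [ui - vi for ui, vi in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_projection(x: Sequence[float],
                        vertices: Sequence[Sequence[float]]) -> List[float]:
    """Max-plus projection of x onto the tropical polytope spanned by vertices.

    Each lambda_l + D_l is <= x coordinate-wise by construction, so the
    projection is too; a vertex projects to itself.
    """
    lambdas = [min(xj - dj for xj, dj in zip(x, d)) for d in vertices]
    return [max(lam + d[j] for lam, d in zip(lambdas, vertices))
            for j in range(len(x))]


def fit_cost(points: Sequence[Sequence[float]],
             vertices: Sequence[Sequence[float]]) -> float:
    """Sum of tropical distances from points to their projections (assumed criterion)."""
    return sum(tropical_distance(x, tropical_projection(x, vertices))
               for x in points)


V = [[0.0, 0.0, 0.0], [0.0, 3.0, 1.0]]  # hypothetical polytope vertices
print(tropical_projection([0.0, 3.0, 1.0], V))  # [0.0, 3.0, 1.0]
print(tropical_projection([0.0, 1.0, 5.0], V))  # [0.0, 1.0, 0.0]
print(fit_cost([[0.0, 3.0, 1.0], [0.0, 1.0, 5.0]], V))  # 5.0
```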
From Gene Trees to Species Trees
This paper studies various algorithmic issues in reconstructing a species tree from gene trees under the duplication and mutation cost models. It proposes a heuristic method that performs significantly better than the existing program in Page's GeneTree 1.0, which starts the search from a random tree.
ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R
Efforts have been made to improve efficiency, flexibility, support for 'big data' (R's long vectors), ease of use, and quality checks before a new release of ape.
Bayesian estimation of concordance among gene trees.
A novel two-stage Markov chain Monte Carlo (MCMC) method is presented. It first obtains independent Bayesian posterior probability distributions for individual genes using standard methods, and it introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees.
Convexity in Tree Spaces
The geometry of metrics and convexity structures on the space of phylogenetic trees is studied; this space is realized here as the tropical linear space of all ultrametrics. The CAT(0) metric of Billera, Holmes, and Vogtmann arises from the theory of orthant spaces.
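Realizing tree space as the set of ultrametrics rests on the three-point condition: for every triple of leaves, the largest of the three pairwise distances is attained at least twice. This is the same "maximum attained at least twice" pattern that defines tropical vanishing in max-plus arithmetic. Below is a minimal sketch of a checker, assuming distances are given as a square matrix. The function name, tolerance, and matrices are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def is_ultrametric(d: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check that d is a symmetric, non-negative matrix with zero diagonal
    satisfying the three-point (ultrametric) condition."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            if d[i][j] < -tol or abs(d[i][j] - d[j][i]) > tol:
                return False
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances; the two largest must agree.
        _, mid, top = sorted((d[i][j], d[i][k], d[j][k]))
        if top - mid > tol:
            return False
    return True


# Hypothetical matrix for the tree with cherries {1,2} and {3,4}.
d_tree = [[0, 2, 4, 4],
          [2, 0, 4, 4],
          [4, 4, 0, 2],
          [4, 4, 2, 0]]
# Hypothetical non-ultrametric: the maximum 3 is attained only once.
d_bad = [[0, 1, 2],
         [1, 0, 3],
         [2, 3, 0]]
print(is_ultrametric(d_tree), is_ultrametric(d_bad))  # True False
```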
AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics
A simple tool is presented that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths.
Data Mining and Analysis: Fundamental Concepts and Algorithms
This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics.
ggplot2 - Elegant Graphics for Data Analysis
This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data…