Tropical Support Vector Machine and its Applications to Phylogenomics
@article{Tang2020TropicalSV, title={Tropical Support Vector Machine and its Applications to Phylogenomics}, author={Xiaoxian Tang and Houjie Wang and Ruriko Yoshida}, journal={arXiv: Combinatorics}, year={2020} }
Most data in genome-wide phylogenetic analysis (phylogenomics) is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Also, we cannot directly apply statistical learning models in data science to a set of phylogenetic trees since the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic…
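To make the max-plus setting concrete, here is a minimal Python sketch (not taken from the paper) of the two max-plus operations and of the tropical metric commonly used on tree space: the largest coordinate difference minus the smallest. The function names are illustrative assumptions, and points are treated as vectors modulo adding a constant to every coordinate.

```python
def tropical_add(a, b):
    """Max-plus 'addition': the maximum of the two numbers."""
    return max(a, b)


def tropical_mul(a, b):
    """Max-plus 'multiplication': ordinary addition."""
    return a + b


def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i).

    The value does not change if a constant is added to every
    coordinate of u (or of v), so it is well defined on vectors
    taken modulo scalar shifts.
    """
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


if __name__ == "__main__":
    u = [0, 1, 3]
    v = [0, 2, 1]
    print(tropical_distance(u, v))                   # 3
    print(tropical_distance([x + 5 for x in u], v))  # 3 (shift invariant)
```

Because the metric ignores constant shifts, it is not a Euclidean distance, which is one way to see why standard Euclidean classifiers do not transfer directly.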
6 Citations
Tropical Support Vector Machines: Evaluations and Extension to Function Spaces
- Computer Science
- ArXiv
- 2021
It is shown theoretically by extreme value statistics that the tropical SVMs for classifying data points from two Gaussian distributions as well as empirical data sets of different neuron types are fairly robust against the curse of dimensionality.
Tropical Data Science
- Biology
- ArXiv
- 2020
This paper surveys some new developments of machine learning models using tropical geometry to analyze a set of phylogenetic trees over a tree space.
Tropical Geometric Variation of Phylogenetic Tree Shapes
- Mathematics
- 2020
We study the behavior of phylogenetic tree shapes in the tropical geometric interpretation of tree space. Tree shapes are formally referred to as tree topologies; a tree topology can also be thought…
Tree Topologies along a Tropical Line Segment
- Mathematics
- Vietnam Journal of Mathematics
- 2022
This paper focuses on the combinatorics of tree topologies along a tropical line segment, an intrinsic geodesic with respect to the tropical metric, between two phylogenetic trees in tree space. It is shown that if the two given trees differ by only one nearest neighbor interchange (NNI) move, then the topology of any tree on the tropical line segment between them is the same as the topology of one of the two given trees, possibly with zero branch lengths.
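As a rough illustration of what a tropical line segment looks like, the following Python sketch computes its breakpoints under the max-plus convention. It assumes the standard description of the segment between u and v as the points max(λ + u, v) taken coordinatewise for real λ, with bends only at λ = v_i - u_i. The helper names are hypothetical, and the input vectors are generic points rather than actual tree metrics.

```python
def normalize(p):
    """Pick the representative whose first coordinate is 0."""
    return [x - p[0] for x in p]


def tropical_segment(u, v):
    """Breakpoints of the max-plus tropical line segment from v to u.

    Each point has the form max(lam + u, v) coordinatewise. The
    segment can only bend where lam equals some v_i - u_i, so it is
    enough to evaluate at those values in increasing order.
    """
    lambdas = sorted(set(b - a for a, b in zip(u, v)))
    points = []
    for lam in lambdas:
        p = normalize([max(lam + a, b) for a, b in zip(u, v)])
        if not points or points[-1] != p:
            points.append(p)
    return points


if __name__ == "__main__":
    u = [0, 0, 0]
    v = [0, 1, 3]
    print(tropical_segment(u, v))
    # [[0, 1, 3], [0, 0, 2], [0, 0, 0]]
```

The first breakpoint is v and the last is u (up to a constant shift); the points in between are where the segment changes direction.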
Tropical optimal transport and Wasserstein distances
- Mathematics
- Information Geometry
- 2021
We study the problem of optimal transport in tropical geometry and define the Wasserstein-p distances in the continuous metric measure space setting of the tropical projective torus. We specify the…
Tropical linear regression and mean payoff games: or, how to measure the distance to equilibria
- Mathematics, Economics
- ArXiv
- 2021
A strong duality theorem is established, showing that the value of the problem of finding the best approximation of a set of points by a tropical hyperplane coincides with the maximal radius of a Hilbert's ball included in a tropical polyhedron.
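For intuition about fitting a tropical hyperplane to points, here is a small Python sketch of the tropical distance from a single point to a tropical hyperplane. It assumes the max-plus convention, in which the hyperplane with normal vector omega is the set of points x where max_i(omega_i + x_i) is attained at least twice; under that assumption the distance is the gap between the largest and second-largest values of omega_i + x_i. The function name is an illustrative assumption, and this is not the algorithm of the cited paper.

```python
def distance_to_tropical_hyperplane(x, omega):
    """Tropical distance from x to the max-plus hyperplane H_omega.

    H_omega is the set of points where max_i(omega_i + x_i) is
    attained at least twice. The distance is the largest value of
    omega_i + x_i minus the second largest.
    """
    if len(x) != len(omega) or len(x) < 2:
        raise ValueError("x and omega need equal length of at least 2")
    values = sorted((w + xi for w, xi in zip(omega, x)), reverse=True)
    return values[0] - values[1]


if __name__ == "__main__":
    omega = [0, 0, 0]
    print(distance_to_tropical_hyperplane([0, 1, 5], omega))  # 4
    print(distance_to_tropical_hyperplane([2, 2, 0], omega))  # 0 (on the hyperplane)
```

A best-fit hyperplane in this sense would choose omega to make the largest such distance over all data points as small as possible.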
References
Tropical Principal Component Analysis and Its Application to Phylogenetics
- Biology
- Bulletin of Mathematical Biology
- 2019
This work defines and analyzes two analogues of principal component analysis in the setting of tropical geometry and gives approximative algorithms for both approaches and applies them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
- Biology
- 2018
A novel framework to study sets of phylogenetic trees based on tropical geometry is proposed and studied, which exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state-of-the-art.
Tropical principal component analysis on the space of ultrametrics
- Environmental Science
- 2019
In 2019, Yoshida et al. introduced a notion of tropical principal component analysis (PCA). The output is a tropical polytope with a fixed number of vertices that best fits the data. We here apply…
From Gene Trees to Species Trees
- Computer Science
- SIAM J. Comput.
- 2000
This paper studies various algorithmic issues in reconstructing a species tree from gene trees under the duplication and the mutation cost model and proposes a heuristic method that is significantly better than the existing program in Page's GeneTree 1.0 that starts the search from a random tree.
ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R
- Biology
- Bioinform.
- 2019
Efforts have been made to improve efficiency, flexibility, support for 'big data' (R's long vectors), ease of use, and quality checks before a new release of ape.
Bayesian estimation of concordance among gene trees.
- Biology
- Molecular Biology and Evolution
- 2007
A novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods and introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees.
Convexity in Tree Spaces
- Mathematics
- SIAM J. Discret. Math.
- 2017
The geometry of metrics and convexity structures on the space of phylogenetic trees is studied, which is here realized as the tropical linear space of all ultrametrics and the tropical metric arises from the theory of orthant spaces.
AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics
- Biology
- Bioinform.
- 2008
A simple tool is presented that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths.
Data Mining and Analysis: Fundamental Concepts and Algorithms
- Computer Science
- 2014
This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics.
ggplot2 - Elegant Graphics for Data Analysis
- Computer Science
- Use R
- 2009
This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data…