Topological Methods for Unsupervised Learning

  • Leland McInnes
Unsupervised learning is a broad topic in machine learning with many diverse sub-disciplines. Within the field of unsupervised learning we will consider three major topics: dimension reduction; clustering; and anomaly detection. We seek to use the languages of topology and category theory to provide a unified mathematical approach to these three major problems in unsupervised learning. 
Functorial Clustering via Simplicial Complexes
We adapt previous research on topological unsupervised learning to characterize a class of hierarchical overlapping clustering algorithms as functors that factor through a category of simplicial...
Category Theory in Machine Learning
This work aims to document the motivations, goals and common themes across these applications of category theory in machine learning, touching on gradient-based learning, probability, and equivariant learning.


Estimating the Cluster Tree of a Density by Analyzing the Minimal Spanning Tree of a Sample
  • W. Stuetzle
  • Mathematics, Computer Science
  • J. Classif.
  • 2003
This work introduces runt pruning, a new clustering method that attempts to find modes of a density by analyzing the minimal spanning tree of a sample, exploiting the connection between the minimal spanning tree and nearest neighbor density estimation.
Rates of convergence for the cluster tree
Finite-sample convergence rates for the algorithm and lower bounds on the sample complexity of this estimation problem are given.
Density-Based Clustering Based on Hierarchical Density Estimates
This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.
Accelerated Hierarchical Density Based Clustering
  • Leland McInnes, John Healy
  • Mathematics, Computer Science
  • 2017 IEEE International Conference on Data Mining Workshops (ICDMW)
  • 2017
The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter epsilon, making it the default choice for density-based clustering.
Classifying Clustering Schemes
A framework is constructed for studying what happens when one imposes various structural conditions on the clustering schemes, under the general heading of functoriality, and it is shown that, within this framework, one can prove a theorem analogous to one of Kleinberg (Becker et al).
Shift-invariant similarities circumvent distance concentration in stochastic neighbor embedding and variants
This work details why the phenomenon of distance concentration is an impediment to efficient dimensionality reduction, and how SNE and its variants circumvent this difficulty by using similarities that are invariant to shifts with respect to squared distances.
Consistency of Single Linkage for High-Density Clusters
High-density clusters are defined on a population with density f in r dimensions to be the maximal connected sets of form {x | f(x) ≥ c}. Single-linkage clustering is evaluated for...
UMAP: Uniform Manifold Approximation and Projection
Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.
Simplicial Homotopy Theory
Simplicial sets, model categories, and cosimplicial spaces: applications for homotopy coherence, results and constructions, and more.
The relation between the categories of Fuzzy Sets and that of Sheaves is explored and the precise connection between them is explicated. In particular, it is shown that if the notion of fuzzy sets...