Wasserstein t-SNE

Fynn Bachmann, Philipp Hennig, Dmitry Kobak
Scientific datasets often have hierarchical structure: for example, in surveys, individual participants (samples) might be grouped at a higher level (units) such as their geographical region. In these settings, the interest is often in exploring the structure on the unit level rather than on the sample level. Units can be compared based on the distance between their means; however, this ignores the within-unit distribution of samples. Here we develop an approach for exploratory analysis of…
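The limitation of comparing units by their means can be illustrated with a small sketch. It is not taken from the paper, whose abstract is truncated here. It assumes each unit holds one-dimensional samples and is summarized by a fitted Gaussian, and it uses the closed-form 2-Wasserstein distance between two 1-D Gaussians, sqrt((m1 - m2)^2 + (s1 - s2)^2). The function names `mean_distance` and `gaussian_w2` are hypothetical.

```python
import math
import statistics


def mean_distance(a, b):
    """Distance between the sample means of two units."""
    return abs(statistics.fmean(a) - statistics.fmean(b))


def gaussian_w2(a, b):
    """2-Wasserstein distance between 1-D Gaussians fitted to a and b.

    Assumption: each unit is approximated by a Gaussian with the sample
    mean and the population standard deviation of its samples.
    """
    mean_gap = statistics.fmean(a) - statistics.fmean(b)
    std_gap = statistics.pstdev(a) - statistics.pstdev(b)
    return math.sqrt(mean_gap ** 2 + std_gap ** 2)


# Two units with the same mean but different within-unit spread.
unit_a = [-1.0, 0.0, 1.0]
unit_b = [-3.0, 0.0, 3.0]

print(mean_distance(unit_a, unit_b))  # the means coincide
print(gaussian_w2(unit_a, unit_b))    # the spreads differ
```

In this toy case the mean-based distance is zero even though the two units are clearly different, while the distribution-aware distance is positive.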



Visualizing Data using t-SNE

This paper presents t-SNE, a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Wasserstein discriminant analysis

It is shown that WDA leverages a mechanism that induces neighborhood preservation, and the optimization problem of WDA can be tackled using automatic differentiation of Sinkhorn’s fixed-point iterations.

openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding

This paper introduces openTSNE, a modular Python library that implements the core t-SNE algorithm and its extensions and is orders of magnitude faster than existing popular implementations, including those from scikit-learn.

Initialization is critical for preserving global data structure in both t-SNE and UMAP.

It is argued that there is currently no evidence that the UMAP algorithm per se has any advantage over t-SNE in terms of preserving global structure, and it is contended that these algorithms should always use informative initialization by default.

From Louvain to Leiden: guaranteeing well-connected communities

The Leiden algorithm is found to be faster than the Louvain algorithm and to uncover better partitions, in addition to providing explicit guarantees that communities are connected.

Gromov–Wasserstein Distances and the Metric Approach to Object Matching

F. Mémoli. Found. Comput. Math., 2011. Computer Science.
This paper discusses certain modifications of the ideas concerning the Gromov–Hausdorff distance which have the goal of modeling and tackling the practical problems of object matching and comparison by proving explicit lower bounds for the proposed distance that involve many of the invariants previously reported by researchers.

Wasserstein Generative Adversarial Networks

This work introduces a new algorithm named WGAN, an alternative to traditional GAN training that can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches.

Objective Criteria for the Evaluation of Clustering Methods

This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.

The Fréchet distance between multivariate normal distributions