• Corpus ID: 246431117

A Probabilistic Graph Coupling View of Dimension Reduction

Hugues van Assel, Thibault Espinasse, Julien Chiquet, Franck Picard

Most popular dimension reduction (DR) methods, such as t-SNE and UMAP, are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations that would enable a full understanding of their properties and limitations. To that end, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in…
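
The "cost between input and latent pairwise similarities" can be illustrated with a minimal sketch. Input similarities P and latent similarities Q are each normalized over all pairs, and the cross entropy between them is computed. This is not the authors' graph-coupling model. The fixed Gaussian bandwidth `sigma`, the Student-t latent kernel (the one t-SNE uses) and all helper names are assumptions made for illustration. Because the entropy of P does not depend on the embedding Z, minimizing this cross entropy over Z is equivalent to minimizing KL(P || Q).

```python
import numpy as np


def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows of X
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def input_affinities(X, sigma=1.0):
    # Gaussian kernel with one fixed bandwidth (assumption: t-SNE instead
    # calibrates a bandwidth per point from a perplexity target)
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W / W.sum()        # joint distribution over pairs (i, j)


def latent_affinities(Z):
    # Heavy-tailed Student-t kernel, as used for t-SNE's latent similarities
    W = 1.0 / (1.0 + pairwise_sq_dists(Z))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def cross_entropy(P, Q, eps=1e-12):
    # -sum_{i != j} P_ij log Q_ij; eps guards against log(0)
    off_diag = ~np.eye(P.shape[0], dtype=bool)
    return -np.sum(P[off_diag] * np.log(Q[off_diag] + eps))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # toy input data
Z = rng.normal(size=(50, 2))   # toy 2-D embedding
P, Q = input_affinities(X), latent_affinities(Z)
print(cross_entropy(P, Q))
```

A DR method in this family would then move Z (for example by gradient descent) to lower this value.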

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.

Visualizing Large-scale and High-dimensional Data

LargeVis is proposed: a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space. It easily scales to millions of high-dimensional data points.
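
The graph-construction step can be sketched as follows. This is a stand-in under stated assumptions: it uses exact brute-force search in place of LargeVis's approximate K-nearest neighbor construction, so it needs O(n²) memory and only suits small data. The function name `knn_graph` is hypothetical, and the layout step is not shown.

```python
import numpy as np


def knn_graph(X, k):
    # Exact brute-force k-nearest-neighbor search (LargeVis approximates
    # this step to scale up; this sketch does not)
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :k]      # indices of the k closest points
    n = X.shape[0]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return A | A.T                           # symmetrize: undirected graph


X = np.array([[0.0], [1.0], [3.0], [10.0]])
print(knn_graph(X, k=1).astype(int))
```

On the four points 0, 1, 3 and 10 with k = 1, each point links to its closest neighbor and symmetrization yields the path graph 0-1-2-3 (by index).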

Visualizing high-dimensional data using t-SNE

  • Journal of Machine Learning Research,
  • 2008

Gaussian Markov Random Fields: Theory and Applications

This volume is essential reading for statisticians working in spatial theory and its applications, as well as quantitative researchers in a wide range of science fields where spatial data analysis is important.

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation

This work proposes a geometrically motivated algorithm for representing high-dimensional data. It provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering.
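
A minimal sketch of the Laplacian Eigenmaps recipe follows. It assumes a symmetrized k-nearest-neighbor graph with heat-kernel weights exp(-||x_i - x_j||² / t) and solves the generalized eigenproblem L y = λ D y, keeping the eigenvectors with the smallest nonzero eigenvalues. The median-based default for `t` and the function name are assumptions, not details taken from the summary above.

```python
import numpy as np
from scipy.linalg import eigh


def laplacian_eigenmaps(X, k=10, dim=2, t=None):
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Step 1: symmetrized k-nearest-neighbor adjacency (self excluded)
    nbrs = np.argsort(D2 + np.diag(np.full(n, np.inf)), axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    A = A | A.T

    # Step 2: heat-kernel weights on edges; t defaults to the median
    # squared edge length (an assumed heuristic)
    if t is None:
        t = np.median(D2[A])
    W = np.where(A, np.exp(-D2 / t), 0.0)

    # Step 3: solve L y = lambda D y with L = D - W (eigenvalues ascending)
    Deg = np.diag(W.sum(axis=1))
    _, vecs = eigh(Deg - W, Deg)

    # Drop the trivial eigenvector (eigenvalue 0, constant on a connected graph)
    return vecs[:, 1:dim + 1]


rng = np.random.default_rng(1)
Y = laplacian_eigenmaps(rng.normal(size=(60, 5)), k=8)
print(Y.shape)
```

The dense solver keeps the sketch short. For large n one would use a sparse graph and a sparse eigensolver instead.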

Graph Laplacians and their Convergence on Random Neighborhood Graphs

This paper determines the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. It shows that, for a uniform measure on the submanifold, all graph Laplacians have the same limit up to constants.
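
The summary does not name the three Laplacians. Assuming they are the three in common use (the unnormalized L = D - W, the random-walk L_rw = I - D^-1 W and the symmetric normalized L_sym = I - D^-1/2 W D^-1/2), the sketch below builds all three from a symmetric weight matrix W with positive degrees. It does not reproduce the paper's convergence analysis, and the function name is hypothetical.

```python
import numpy as np


def graph_laplacians(W):
    # W: symmetric nonnegative weight matrix with strictly positive degrees
    d = W.sum(axis=1)
    I = np.eye(W.shape[0])
    L_un = np.diag(d) - W                        # unnormalized: D - W
    L_rw = I - W / d[:, None]                    # random walk: I - D^-1 W
    s = 1.0 / np.sqrt(d)
    L_sym = I - s[:, None] * W * s[None, :]      # symmetric: I - D^-1/2 W D^-1/2
    return L_un, L_rw, L_sym


rng = np.random.default_rng(0)
B = rng.random((6, 6))
W = (B + B.T) / 2.0
np.fill_diagonal(W, 0.0)
L_un, L_rw, L_sym = graph_laplacians(W)
# Constant vectors lie in the null space of L_un and L_rw
print(np.allclose(L_un @ np.ones(6), 0.0), np.allclose(L_rw @ np.ones(6), 0.0))
```

The three operators differ only by degree scalings, which is why their limits can agree up to constants when the sampling density is uniform.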

Initialization is critical for preserving global data structure in both t-SNE and UMAP.

It is argued that there is currently no evidence that the UMAP algorithm per se has any advantage over t-SNE in terms of preserving global structure, and it is contended that these algorithms should always use informative initialization by default.
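
One way to obtain an informative initialization is to start from the leading principal components. The sketch below assumes PCA as that initialization and rescales the first coordinate to a standard deviation of 1e-4, a small scale commonly used for t-SNE. Both choices and the name `pca_init` are assumptions rather than a prescription from the work summarized above. scikit-learn's `TSNE`, for example, accepts such an array through its `init` parameter.

```python
import numpy as np


def pca_init(X, dim=2, first_std=1e-4):
    # Project centered data onto its top `dim` principal axes
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:dim].T
    # Rescale so the first coordinate has a small standard deviation
    # (assumed value); relative spread between components is preserved
    return Y / Y[:, 0].std() * first_std


rng = np.random.default_rng(0)
Y0 = pca_init(rng.normal(size=(100, 20)))
print(Y0.shape, Y0[:, 0].std())
```

Scaling the whole embedding by one factor keeps the global arrangement given by PCA while keeping the initial coordinates small for the subsequent optimization.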

Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization

This work provides several unexpected insights into what design choices to make and avoid when constructing DR algorithms, and designs a new algorithm, called Pairwise Controlled Manifold Approximation Projection (PaCMAP), which preserves both local and global structure.