• Corpus ID: 221970327

Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning

@article{Sainburg2020ParametricUL,
  title={Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning},
  author={Tim Sainburg and Leland McInnes and Timothy Q. Gentner},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.12981}
}
We propose Parametric UMAP, a parametric variation of the UMAP (Uniform Manifold Approximation and Projection) algorithm. UMAP is a non-parametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) Compute a graphical representation of a dataset (fuzzy simplicial complex), and (2) Through stochastic gradient descent, optimize a low-dimensional… 
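Step (1) of the description above can be illustrated with a minimal sketch, not the authors' implementation. It builds a weighted k-nearest-neighbour graph, which stands in here for the fuzzy simplicial complex. The function name `knn_fuzzy_graph`, the toy points, and the fixed bandwidth `sigma` are assumptions made for illustration; UMAP itself tunes a bandwidth per point, and that search is omitted to keep the sketch short.

```python
import math


def knn_fuzzy_graph(points, k=2, sigma=1.0):
    """Simplified step (1): symmetric weighted k-nearest-neighbour graph.

    Assumption: a single fixed bandwidth `sigma` replaces UMAP's
    per-point bandwidth search.
    """
    n = len(points)
    directed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        neighbours = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        rho = neighbours[0][0]  # distance to the nearest neighbour
        for d, j in neighbours:
            # Membership strength decays with distance beyond rho.
            directed[i][j] = math.exp(-max(0.0, d - rho) / sigma)
    # Symmetrise with the probabilistic union a + b - a*b.
    return [
        [
            directed[i][j] + directed[j][i] - directed[i][j] * directed[j][i]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_fuzzy_graph(points, k=1)
```

With `k=1`, each point is linked only to its nearest neighbour, so the graph connects the points within each pair and leaves the two pairs unconnected. Step (2) would then optimize low-dimensional coordinates against such a graph.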

Uniform Manifold Approximation and Projection (UMAP) and its Variants: Tutorial and Survey

This tutorial and survey paper explains the UMAP algorithm and its variants, covering neighborhood probabilities in the input and embedding spaces, optimization of the cost function, the training algorithm, derivation of gradients, and supervised and semi-supervised embedding by UMAP.

A Parametric UMAP’s sampling and effective loss function

  • Computer Science
  • 2022
In Parametric UMAP [17], the embeddings are not optimized directly; instead, a parametric function (a neural network) is trained via stochastic gradient descent to map the input points to the embedding space.

Using Genetic Programming to Find Functional Mappings for UMAP Embeddings

This work proposes utilising UMAP to create functional mappings with genetic programming-based manifold learning and compares two different approaches: one that uses the embedding produced by UMAP as the target for the functional mapping; and the other which directly optimises the UMAP cost function by using it as the fitness function.

Panoramic Manifold Projection (Panoramap) for Single-Cell Data Dimensionality Reduction and Visualization

It is shown that Panoramap excels at delineating the cell type lineage/hierarchy, can reveal rare cell types, and has the potential to aid in the early diagnosis of tumors.

UMAP does not reproduce high-dimensional similarities due to negative sampling

This work derives UMAP’s effective loss function in closed form and finds that it differs from the published one. It shows that UMAP does not aim to reproduce its theoretically motivated high-dimensional similarities, but instead tries to reproduce similarities that only encode the shared k nearest neighbor graph.

On UMAP's true loss function

It is shown that UMAP does not aim to reproduce its theoretically motivated high-dimensional similarities; instead, it tries to reproduce similarities that only encode the k nearest neighbor graph, thereby challenging the previous understanding of UMAP’s effectiveness.

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

The authors used four datasets: mouse retina transcriptomes, Fashion MNIST, EMNIST Letters, and CIFAR-10, which consist of PCA projections of single-cell transcriptome data collected from mouse retina and real-world images in 10 classes.

DeepVisualInsight: Time-Travelling Visualization for Spatio-Temporal Causality of Deep Classification Training

A time-travelling visualization solution that aims to reveal the spatio-temporal causality in training a deep learning image classifier. It reflects the characteristics of various training scenarios, showing good potential for DeepVisualInsight (DVI) as a debugging tool for analyzing deep learning training processes.

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

This study proposes Message Passing Adaptive Resonance Theory (MPART), a model that infers the class of unlabeled data and selects informative and representative samples through message passing between nodes on the topological graph and significantly outperforms the competitive models in online active learning environments.

Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology

A novel machine learning framework is introduced to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens, showing significant agreement between densities estimated by the EUNet model and by trained pathologists, and highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process.

References


DIMAL: Deep Isometric Manifold Learning Using Sparse Geodesic Sampling

This paper uses the Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling for generating maps that approximately preserve geodesic distances and shows a significantly improved local and nonlocal generalization of the isometric mapping.

Extendable and invertible manifold learning with geometry regularized autoencoders

This work presents a new method for integrating manifold learning and autoencoders by incorporating a geometric regularization term in the bottleneck of the autoencoder, based on the diffusion potential distances from the recently proposed PHATE visualization method.

Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

An unsupervised loss function is proposed that takes advantage of the stochastic nature of these methods and minimizes the difference between the predictions of multiple passes of a training sample through the network.

VAE-SNE: a deep generative model for simultaneous dimensionality reduction and clustering

It is found that VAE-SNE produces high-quality compressed representations with results that are on par with existing nonlinear dimensionality reduction algorithms, and can be used for unsupervised action recognition to detect and classify repeated motifs of stereotyped behavior in high-dimensional timeseries data.

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.

Connectivity-Optimized Representation Learning via Persistent Homology

This work controls the connectivity of an autoencoder's latent space via a novel, differentiable loss that operates on information from persistent homology, and presents a theoretical analysis of the properties induced by the loss.

Learning a Parametric Embedding by Preserving Local Structure

The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space, and evaluates the performance in experiments on three datasets.

Auto-Encoding Variational Bayes

A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
...