# Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning

@article{Sainburg2020ParametricUL, title={Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning}, author={Tim Sainburg and Leland McInnes and Timothy Q. Gentner}, journal={ArXiv}, year={2020}, volume={abs/2009.12981} }

We propose Parametric UMAP, a parametric variation of the UMAP (Uniform Manifold Approximation and Projection) algorithm. UMAP is a non-parametric graph-based dimensionality reduction algorithm that uses applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) compute a graphical representation of a dataset (a fuzzy simplicial complex), and (2) through stochastic gradient descent, optimize a low-dimensional embedding of the graph. …
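
The two steps above can be illustrated numerically. Below is a minimal NumPy sketch of the low-dimensional similarity curve and the fuzzy cross-entropy that step (2) minimizes; the curve parameters `a` and `b` are illustrative stand-ins, not the values UMAP fits from `min_dist`:

```python
import numpy as np

def low_dim_similarity(sq_dist, a=1.58, b=0.9):
    # UMAP's differentiable low-dimensional membership: q = 1 / (1 + a * d^(2b)).
    # a and b are illustrative; UMAP fits them from the min_dist parameter.
    return 1.0 / (1.0 + a * sq_dist ** b)

def fuzzy_cross_entropy(p, q, eps=1e-12):
    # Edge-wise fuzzy-set cross-entropy between high-dimensional memberships p
    # and low-dimensional memberships q, summed over candidate edges.
    q = np.clip(q, eps, 1.0 - eps)
    return float(np.sum(-p * np.log(q) - (1.0 - p) * np.log(1.0 - q)))

# A layout that keeps strongly connected pairs close scores a lower loss:
p = np.array([0.9, 0.1])                            # graph edge strengths
q_good = low_dim_similarity(np.array([0.1, 9.0]))   # strong pair near, weak pair far
q_bad = low_dim_similarity(np.array([9.0, 0.1]))    # reversed layout
```

In Parametric UMAP, gradients of this loss flow through a neural network rather than directly into the embedding coordinates.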

#### 12 Citations

Uniform Manifold Approximation and Projection (UMAP) and its Variants: Tutorial and Survey

- Computer Science
- ArXiv
- 2021

This tutorial and survey paper on UMAP and its variants explains the UMAP algorithm, the neighborhood probabilities in the input and embedding spaces, the optimization of the cost function, the training algorithm, the derivation of gradients, and supervised and semi-supervised embedding with UMAP.

Using Genetic Programming to Find Functional Mappings for UMAP Embeddings

- Computer Science
- 2021 IEEE Congress on Evolutionary Computation (CEC)
- 2021

This work proposes utilising UMAP to create functional mappings with genetic programming-based manifold learning, comparing two approaches: one that uses the embedding produced by UMAP as the target for the functional mapping, and one that directly optimises the UMAP cost function by using it as the fitness function.

UMAP does not reproduce high-dimensional similarities due to negative sampling

- Computer Science
- ArXiv
- 2021

This work derives UMAP’s effective loss function in closed form and finds that it differs from the published one; it shows that UMAP does not aim to reproduce its theoretically motivated high-dimensional similarities, but instead reproduces similarities that only encode the shared k-nearest-neighbor graph.

Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology

- Medicine
- International journal of molecular sciences
- 2021

A novel machine learning framework is introduced for the quantitative assessment of the immune content in neuroblastoma (NB) specimens; it shows significant agreement between densities estimated by the EUNet model and by trained pathologists, and highlights novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process.

Non-linear Dimensionality Reduction on Extracellular Waveforms Reveals Physiological, Functional, and Laminar Diversity in Premotor Cortex

- 2021

Cortical circuits involved in decision-making are thought to contain a large number of cell types—each with different physiological, functional, and laminar distribution properties—that coordinate…

Non-linear dimensionality reduction on extracellular waveforms reveals cell type diversity in premotor cortex

- Medicine
- eLife
- 2021

This work develops a new method (WaveMAP) that combines non-linear dimensionality reduction with graph clustering to identify putative cell types and robustly establishes eight waveform clusters that recapitulate previously identified narrow- and broad-spiking types while revealing previously unknown diversity within these subtypes.

Non-linear Dimensionality Reduction on Extracellular Waveforms Reveals Cell Type Diversity in Premotor Cortex

- Biology
- 2021

It is shown that non-linear dimensionality reduction with graph clustering applied to the entire extracellular waveform can delineate many different putative cell types and does so in an interpretable manner.

Bioinformatic Analysis of Temporal and Spatial Proteome Alternations During Infections

- Medicine
- Frontiers in Genetics
- 2021

This review details the steps of a typical temporal and spatial analysis, including data pre-processing steps, different statistical and machine learning approaches, validation, interpretation, and the extraction of biological information from mass spectrometry data.

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

- Computer Science
- ICML
- 2021

This study proposes Message Passing Adaptive Resonance Theory (MPART), a model that infers the class of unlabeled data and selects informative and representative samples through message passing between nodes on a topological graph; it significantly outperforms competitive models in online active learning environments.

#### References

Showing 1–10 of 51 references

DIMAL: Deep Isometric Manifold Learning Using Sparse Geodesic Sampling

- Computer Science
- 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2019

This paper uses a Siamese configuration to train a neural network to solve least-squares multidimensional scaling, generating maps that approximately preserve geodesic distances, and shows significantly improved local and non-local generalization of the isometric mapping.
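
The Siamese least-squares objective summarized above can be sketched in a few lines of NumPy. The linear projection `W` here is a hypothetical stand-in for the paper's actual network; the point is the pairwise stress term with a shared embedding function on both inputs:

```python
import numpy as np

# Shared ("Siamese") embedding applied to both points of a pair; this linear
# projection is a hypothetical stand-in for the paper's actual network.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
embed = lambda x: W @ x

def mds_stress(x_i, x_j, geodesic_ij):
    # Least-squares MDS loss for one sampled pair:
    # (||f(x_i) - f(x_j)|| - g_ij)^2, with the same f on both inputs.
    d = np.linalg.norm(embed(x_i) - embed(x_j))
    return (d - geodesic_ij) ** 2
```

Training on a sparse sample of such pairs, with `geodesic_ij` precomputed on the data graph, is what lets the learned map generalize beyond the training points.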

Laplacian Auto-Encoders: An explicit learning of nonlinear data manifold

- Mathematics, Computer Science
- Neurocomputing
- 2015

This paper proposes a novel unsupervised manifold learning method termed Laplacian Auto-Encoders (LAEs), which regularizes training of auto-encoders so that the learned encoding function has the locality-preserving property for data points on the manifold.

Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

- Computer Science
- NIPS
- 2016

An unsupervised loss function is proposed that takes advantage of the stochastic nature of these methods and minimizes the difference between the predictions of multiple passes of a training sample through the network.
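
As a rough illustration (not the paper's implementation), the consistency loss can be written as the squared difference between the predictions of two stochastic passes of the same sample; the dropout-style "network" below is a toy stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_forward(x, drop_p=0.5):
    # Toy stochastic "network": identity with dropout-style noise. Illustrative
    # only; the paper uses randomized augmentations, dropout, and pooling.
    mask = rng.random(x.shape) > drop_p
    return x * mask / (1.0 - drop_p)

def consistency_loss(pred_a, pred_b):
    # Unsupervised loss: squared difference between the predictions of two
    # stochastic passes of the same (possibly unlabeled) sample.
    return float(np.mean((pred_a - pred_b) ** 2))

x = np.ones(8)
loss = consistency_loss(stochastic_forward(x), stochastic_forward(x))
```

Because no label appears in the loss, it can be applied to unlabeled data alongside a standard supervised term, which is what makes it a semi-supervised regularizer.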

VAE-SNE: a deep generative model for simultaneous dimensionality reduction and clustering

- Biology, Computer Science
- 2020

It is found that VAE-SNE produces high-quality compressed representations on par with existing nonlinear dimensionality reduction algorithms, and can be used for unsupervised action recognition to detect and classify repeated motifs of stereotyped behavior in high-dimensional time-series data.

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

- Computer Science, Mathematics
- ArXiv
- 2018

The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.

Connectivity-Optimized Representation Learning via Persistent Homology

- Computer Science, Mathematics
- ICML
- 2019

This work controls the connectivity of an autoencoder's latent space via a novel type of loss operating on information from persistent homology; the loss is differentiable, and a theoretical analysis of the properties it induces is presented.

Learning a Parametric Embedding by Preserving Local Structure

- Computer Science, Mathematics
- AISTATS
- 2009

The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space, and evaluates its performance in experiments on three datasets.

Auto-Encoding Variational Bayes

- Mathematics, Computer Science
- ICLR
- 2014

This work introduces a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.

Parametric nonlinear dimensionality reduction using kernel t-SNE

- Mathematics, Computer Science
- Neurocomputing
- 2015

This work proposes an efficient extension of t-SNE to a parametric framework, kernel t-SNE, which preserves the flexibility of basic t-SNE but enables explicit out-of-sample extensions, and demonstrates that the technique also yields satisfactory results for large data sets.

Adam: A Method for Stochastic Optimization

- Computer Science, Mathematics
- ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results in the online convex optimization framework.
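
The adaptive-moment update the summary refers to can be sketched in a few lines; the hyperparameter defaults below are the commonly cited ones from the paper:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient (first
    # moment) and its square (second moment), bias-corrected by 1 - beta^t,
    # then a per-coordinate scaled gradient step.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimizing f(theta) = theta^2, whose gradient is 2 * theta:
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

The per-coordinate scaling by the second moment is what makes the effective step size roughly bounded by `lr`, which is one reason Adam is the optimizer of choice for training networks like the Parametric UMAP encoder.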