Bringing UMAP Closer to the Speed of Light with GPU Acceleration

Corey J. Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski, Joshua Patterson
The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning. While many algorithms can be ported to a GPU in a simple and direct fashion, such efforts have resulted in inefficient and inaccurate versions of UMAP. We show a number of techniques that can be used to make a faster and more faithful GPU version of UMAP, and obtain speedups of up…

Getting Passive Aggressive About False Positives: Patching Deployed Malware Detectors

False positives (FPs) have been an issue of extreme importance for anti-virus (AV) systems for decades. As more security vendors turn to machine learning, alert deluge has hit critical mass with over

COVID-19 Kaggle Literature Organization

This work describes an approach to organize and visualize the scientific literature on or related to COVID-19 using machine learning techniques so that papers on similar topics are grouped together and the navigation of topics and related papers is simplified.

GiDR-DUN; Gradient Dimensionality Reduction - Differences and Unification

GDR, which combines previously incompatible techniques from TSNE and UMAP and can replicate the results of either algorithm by changing the normalization, is implemented.

Accelerating single-cell genomic analysis with GPUs

The use of RAPIDS and GPU computing to accelerate single-cell genomic analysis workflows is reported and open-source examples that can be reused by the community are presented.

Visinity: Visual Spatial Neighborhood Analysis for Multiplexed Tissue Imaging Data

Visinity, a scalable visual analytics system to analyze cell interaction patterns across cohorts of whole-slide multiplexed tissue images, is presented, based on a fast regional neighborhood computation, leveraging unsupervised learning to quantify, compare, and group cells by their surrounding cellular neighborhood.

High Mass Resolution fs-LIMS Imaging and Manifold Learning Reveal Insight Into Chemical Diversity of the 1.88 Ga Gunflint Chert

Extraction of useful information from unstructured, large and complex mass spectrometric signals is a challenge in many application fields of mass spectrometry. Therefore, new data analysis

Uniform Manifold Approximation with Two-phase Optimization

Through quantitative experiments, it is found that UMATO outperformed widely used DR techniques in preserving the global structure while producing competitive accuracy in representing the local structure and is preferable in terms of robustness over diverse initialization methods, number of epochs, and subsampling techniques.

Tracking Discourse Influence in Darknet Forums

The main contribution is a joint visualisation of semantic and temporal features, generating insight into the supplied data on darknet cybercrime through the aspects of novelty, transience, and resonance, which describe the potential impact a message might have on the overall discourse in darknet communities.

Scalable semi-supervised dimensionality reduction with GPU-accelerated EmbedSOM

This paper describes an efficient, highly parallel GPU implementation of EmbedSOM designed to provide interactive results on large datasets, and presents BlosSOM, high-performance semi-supervised dimensionality reduction software for interactive user-steerable visualization of high-dimensional datasets with millions of individual data points.

GPU Semiring Primitives for Sparse Neighborhood Methods

This is the first work aiming to unify the computation of several critical distance measures on the GPU under a single flexible design paradigm and it is shown that this primitive is a foundational component for enabling many neighborhood-based information retrieval and machine learning algorithms to accept sparse input.



UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.

Billion-Scale Similarity Search with GPUs

This paper proposes a novel design for k-selection that enables the construction of a high accuracy, brute-force, approximate and compressed-domain search based on product quantization, and applies it in different similarity search scenarios.

Linear tSNE optimization for the Web

This work proposes to approximate the repulsion forces between data points using adaptive-resolution textures that are drawn at every iteration with WebGL, and to reformulate the tSNE minimization problem as a series of tensor operations that are computed with TensorFlow.js, a JavaScript library for scalable tensor computations.

Scaling Up Stochastic Dual Coordinate Ascent

An asynchronous parallel version of the SDCA algorithm is introduced, its convergence properties are analyzed, and a solution for primal-dual synchronization required to achieve convergence in practice is proposed.

Efficient Large-Scale Approximate Nearest Neighbor Search on the GPU

This work proposes a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal and includes a novel, highly parallelizable re-ranking method for candidate vectors that efficiently reuses already computed intermediate values.

GPGPU Linear Complexity t-SNE Optimization

This work presents a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity, and proposes to approximate the repulsive forces between data points by splatting kernel textures for each data point.

T-SNE-CUDA: GPU-Accelerated T-SNE and its Applications to Modern Data

T-SNE-CUDA, a GPU-accelerated implementation of t-distributed Stochastic Neighbor Embedding for visualizing datasets and models, is introduced; it significantly outperforms current implementations, with 50-700x speedups on the CIFAR-10 and MNIST datasets.

Linear Models with Many Cores and CPUs: A Stochastic Atomic Update Scheme

A Stochastic Atomic Update Scheme (SAUS) for training linear models on many-core machines is proposed; it is simple to implement, reduces the number of divergent cases, and obtains greater speedups by being able to effectively use an 80-core server.

CuPy : A NumPy-Compatible Library for NVIDIA GPU Calculations

CuPy is accelerated with the CUDA platform from NVIDIA and also uses CUDA-related libraries, including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, and NCCL, to make full use of the GPU architecture.

PyTorch: An Imperative Style, High-Performance Deep Learning Library

This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.