• Corpus ID: 5855042

Visualizing Data using t-SNE

Laurens van der Maaten, Geoffrey E. Hinton
Journal of Machine Learning Research
We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize and that produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many…
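
The abstract does not spell out how the crowding is reduced. In the published method, similarities between map points are computed with a Student-t kernel with one degree of freedom, whose heavy tail lets moderately dissimilar points sit far apart in the map. A minimal pure-Python sketch of that low-dimensional similarity follows; the function names and the toy map are illustrative assumptions, not taken from this page, and the sketch omits the high-dimensional similarities and the gradient-based optimization.

```python
import math


def student_t_similarities(points):
    """Pairwise map similarities q_ij from a Student-t kernel (one degree of freedom).

    `points` is a list of equal-length coordinate tuples (hypothetical input format).
    """
    n = len(points)
    # Unnormalized kernel (1 + squared distance)^-1 for every ordered pair i != j.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = 1.0 / (1.0 + d2)
    # Normalize over all pairs so the q_ij form one probability distribution.
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def gaussian_kernel(d):
    """Unnormalized Gaussian kernel, the light-tailed alternative used in plain SNE maps."""
    return math.exp(-d * d)


def student_t_kernel(d):
    """Unnormalized Student-t kernel with one degree of freedom."""
    return 1.0 / (1.0 + d * d)


# Toy 2-D map (assumed data): two nearby points and one distant point.
toy_map = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
q = student_t_similarities(toy_map)
```

At a map distance of 3, `student_t_kernel(3.0)` is 0.1 while `gaussian_kernel(3.0)` is about 0.0001, so under the heavy-tailed kernel distant pairs still carry some similarity mass and do not need to be squeezed toward the center of the map.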


Embedding Neighborhoods Simultaneously t-SNE (ENS-t-SNE)
An algorithm (ENS-t-SNE) that generalizes the t-Stochastic Neighborhood Embedding approach to visualize a dataset by embedding it in 3-dimensional Euclidean space, based on several given distances between the same pairs of datapoints.
Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings
The proposed approach facilitates identifying significant trends and changes, which supports monitoring the dynamics of high-dimensional datasets, and showed promising results on a real-world dataset.
Using global t-SNE to preserve inter-cluster data structure
Adding a global cost function to the t-SNE cost function is shown to make it possible to cluster the data while preserving global inter-cluster structure; the tradeoff of λ in representing the global structure of the data is also shown.
Examining Intermediate Data Reduction Algorithms for use with t-SNE
The research shows that no intermediate step in the visualization process is trivial, and application dependent knowledge should be utilized to ensure the best possible visualization in lower dimensional spaces.
q-SNE: Visualizing Data using q-Gaussian Distributed Stochastic Neighbor Embedding
The performance of q-SNE for visualization in 2-dimensional maps and for classification by a k-Nearest Neighbors (k-NN) classifier in the embedded space is compared with SNE, t-SNE, and UMAP on the datasets MNIST, COIL-20, OlivettiFaces, FashionMNIST, and Glove.
Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
This work introduces a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space, which is used to preserve the grouping properties of the data distribution on multiple levels. The method is competitive with the latest versions of t-SNE and UMAP in performance, visualization quality, and run-time.
Visualizing Data using GTSNE
The technique is a variation of t-SNE that produces better visualizations by capturing both the local neighborhood structure and the macro structure in the data, particularly for high-dimensional data that lie on continuous low-dimensional manifolds.
An Analysis of the t-SNE Algorithm for Data Visualization
This work gives a formal framework for the problem of data visualization: finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. It also gives theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met.
Visualizing High-Dimensional Data Using t-Distributed Stochastic Neighbor Embedding Algorithm
Data visualization is a powerful tool and widely adopted by organizations for its effectiveness to abstract the right information, understand, and interpret results clearly and easily. The real


Nonlinear Dimensionality Reduction
The purpose of the book is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction, which encompasses many of the recently developed methods.
Nonlinear dimensionality reduction of data manifolds with essential loops
Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization
S. Lafon, Ann B. Lee. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.
It is shown that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
Stochastic Neighbor Embedding
This probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images, which allows ambiguous objects, like the document count vector for the word "bank", to have versions close to the images of both "river" and "finance" without forcing the image of outdoor concepts to be located close to those of corporate concepts.
Visualizing Similarity Data with a Mixture of Maps
We show how to visualize a set of pairwise similarities between objects by using several different two-dimensional maps, each of which captures different aspects of the similarity structure. When the
Diffusion maps, spectral clustering and reaction coordinates of dynamical systems
A global geometric framework for nonlinear dimensionality reduction.
An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.
Learning a kernel matrix for nonlinear dimensionality reduction
This work investigates how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold and shows how to discover a mapping that "unfolds" the underlying manifold from which the data was sampled.
Nonlinear dimensionality reduction by locally linear embedding.
Locally linear embedding (LLE) is introduced: an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and learns the global structure of nonlinear manifolds.
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.