Random Projection Trees for Vector Quantization

@article{Dasgupta2009RandomPT,
  title={Random Projection Trees for Vector Quantization},
  author={Sanjoy Dasgupta and Yoav Freund},
  journal={IEEE Transactions on Information Theory},
  year={2009},
  volume={55},
  pages={3229--3242}
}
A simple and computationally efficient scheme for tree-structured vector quantization is presented. Unlike previous methods, its quantization error depends only on the intrinsic dimension of the data distribution, rather than the apparent dimension of the space in which the data happen to lie. 
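The core split described in the abstract is easy to sketch. The snippet below is a minimal illustration under my own naming, not the paper's exact rule (the analyzed rule perturbs the cut point randomly around the median, and a split-by-distance variant is also studied; both refinements are omitted here):

```python
import numpy as np

def rp_tree_split(points, rng):
    """One split of a random projection tree (a minimal sketch):
    project the points onto a random unit direction and cut at the
    median projection."""
    direction = rng.normal(size=points.shape[1])
    direction /= np.linalg.norm(direction)   # random unit vector
    proj = points @ direction                # 1-D projections
    threshold = np.median(proj)
    left = points[proj <= threshold]
    right = points[proj > threshold]
    return direction, threshold, left, right

# Toy usage: split 100 points lying in a 20-dimensional space.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 20))
_, _, left, right = rp_tree_split(pts, rng)
```

Because the split depends only on one-dimensional projections, recursing on `left` and `right` adapts to the data's intrinsic dimension rather than the ambient dimension, which is the point of the paper's analysis.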

Citations

Dimensionality Reduction for k-means Clustering
TLDR
Four randomized algorithms, two based on feature selection and two on feature extraction, are presented for effectively reducing the dimensionality of the k-means clustering problem.
Hopfield Networks for Vector Quantization
TLDR
This work considers the problem of finding representative prototypes within a data set and solves it with Hopfield networks that minimize the mean discrepancy between kernel density estimates of the data and prototype distributions, suggesting that vector quantization can also be accomplished via adiabatic quantum computing.
Extremely Fast Unsupervised Codebook Learning for Landmark Recognition
TLDR
This paper introduces a fast unsupervised codebook-learning method, the Extremely Random Projection Forest (ERPF), an ensemble of random projection trees with randomly chosen splitting directions that significantly outperforms other spatial tree methods and k-means.
Vector quantization: a review
  • Zebin Wu, Jun Yu
  • Computer Science
    Frontiers of Information Technology & Electronic Engineering
  • 2019
TLDR
Finding a vector quantization method that strikes a balance between speed and accuracy while consuming a moderate amount of memory remains an open problem.
Randomized Distribution Feature for Image Classification
TLDR
Randomized distribution features are proposed that represent the underlying distribution of local features in each image as a vectorial feature by utilizing random Fourier features, and the convergence of the similarity and distance based on these features is proved.
Fast nearest neighbor search through sparse random projections and voting
TLDR
This work proposes a method in which multiple random projection trees are combined by a novel voting scheme: the redundancy across candidate sets obtained from independently generated random projections is exploited to reduce the number of expensive exact distance evaluations.
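The voting idea can be illustrated with a simplified sketch. For brevity this uses flat random-hyperplane codes instead of full random projection trees, so every name below (`rp_code`, `knn_with_voting`, and all parameters) is a made-up illustration rather than the authors' implementation; the principle is the same: exact distances are computed only for points that land in the query's region in at least `min_votes` of the random partitions.

```python
import numpy as np

def rp_code(x, planes):
    # Bit code: which side of each random hyperplane the point falls on.
    return tuple(bool(b) for b in (planes @ x) > 0)

def knn_with_voting(data, query, k=3, n_trees=10, n_planes=4,
                    min_votes=2, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(data), dtype=int)
    for _ in range(n_trees):
        planes = rng.normal(size=(n_planes, data.shape[1]))
        q_code = rp_code(query, planes)
        for i, x in enumerate(data):
            if rp_code(x, planes) == q_code:
                votes[i] += 1            # point shares the query's region
    candidates = np.flatnonzero(votes >= min_votes)
    if candidates.size == 0:             # fall back to brute force
        candidates = np.arange(len(data))
    # Expensive exact distances only for the surviving candidates.
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

# Toy usage: the query is an exact copy of point 7, so point 7 shares its
# code in every partition and must come back as the nearest neighbor.
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 8))
neighbors = knn_with_voting(data, data[7])
```

Raising `min_votes` shrinks the candidate set (fewer exact distance evaluations) at the cost of recall, which is the speed-accuracy knob the voting scheme provides.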
Hierarchical Clustering with Performance Guarantees
TLDR
Two new algorithms for hierarchical clustering are described: one an alternative to complete linkage, the other an alternative to the k-d tree; both are shown to admit stronger performance guarantees than the classical schemes they replace.
Fast k-NN search
Rates of convergence for the cluster tree
TLDR
Finite-sample convergence rates for the algorithm and lower bounds on the sample complexity of this estimation problem are given.
Geodesic Forests
TLDR
Fast-BIC, a fast Bayesian Information Criterion statistic for Gaussian mixture models, is developed, and it is demonstrated that Geodesic Forests (GF) are robust to high-dimensional noise, whereas other methods, such as Isomap, UMAP, and FLANN, quickly deteriorate in such settings.

References

SHOWING 1-10 OF 35 REFERENCES
Quantization and the method of k-means
  • D. Pollard
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1982
Asymptotic results from the statistical theory of k-means clustering are applied to problems of vector quantization. The behavior of quantizers constructed from long training sequences of data is …
Quantization
TLDR
The key to a successful quantization is the selection of an error criterion – such as entropy and signal-to-noise ratio – and the development of optimal quantizers for this criterion.
Foundations of Quantization for Probability Distributions
General properties of quantization for probability distributions; asymptotic quantization for nonsingular probability distributions; asymptotic quantization for singular probability …
Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
TLDR
This work proposes a geometrically motivated algorithm for representing high-dimensional data, providing a computationally efficient approach to nonlinear dimensionality reduction with locality-preserving properties and a natural connection to clustering.
Optimal pruning with applications to tree-structured source coding and modeling
An algorithm introduced by L. Breiman et al. (1984) in the context of classification and regression trees is reinterpreted and extended to cover a variety of applications in source coding and …
Nonlinear dimensionality reduction by locally linear embedding.
TLDR
Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
A global geometric framework for nonlinear dimensionality reduction.
TLDR
An approach to dimensionality reduction that uses easily measured local metric information to learn the underlying global geometry of a data set; it efficiently computes a globally optimal solution and is guaranteed to converge asymptotically to the true structure.
Clustering Large Graphs via the Singular Value Decomposition
TLDR
This paper considers the problem of partitioning m points in n-dimensional Euclidean space into k clusters, and studies a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances of the m points to V. It argues that the relaxation provides a generalized clustering that is useful in its own right.
Probability: Theory and Examples
This book is an introduction to probability theory covering laws of large numbers, central limit theorems, random walks, martingales, Markov chains, ergodic theorems, and Brownian motion. It is a …
Least squares quantization in PCM
  • S. P. Lloyd
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1982
TLDR
The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.
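Lloyd's necessary conditions translate directly into the familiar two-step iteration. The sketch below is a generic k-means illustration under my own naming and random initialization, not Lloyd's original PCM formulation:

```python
import numpy as np

def lloyd_kmeans(data, k, iters=100, seed=0):
    """Minimal sketch of Lloyd's algorithm: alternate between assigning
    each point to its nearest centroid and moving each centroid to the
    mean of its assigned points, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :],
                               axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid = cluster mean (kept in place if empty).
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy usage: two well-separated Gaussian blobs should be recovered.
rng = np.random.default_rng(1)
blob_a = rng.normal(0.0, 0.1, size=(50, 2))
blob_b = rng.normal(100.0, 0.1, size=(50, 2))
data = np.vstack([blob_a, blob_b])
centroids, labels = lloyd_kmeans(data, k=2)
```

Each labeled centroid is exactly a quantization cell's reproduction point, which is why the k-means literature above and the vector quantization literature describe the same object.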