Corpus ID: 235829643

Optimality of the Johnson-Lindenstrauss Dimensionality Reduction for Practical Measures

@article{Bartal2021OptimalityOT,
  title={Optimality of the Johnson-Lindenstrauss Dimensionality Reduction for Practical Measures},
  author={Yair Bartal and Ora Nova Fandina and Kasper Green Larsen},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.06626}
}
It is well known that the Johnson-Lindenstrauss dimensionality reduction method is optimal for worst-case distortion. While in practice many other methods and heuristics are used, not much is known in terms of bounds on their performance. The question of whether the JL method is optimal for practical measures of distortion was recently raised in [BFN19] (NeurIPS’19). They provided upper bounds on its quality for a wide range of practical measures and showed that indeed these are best possible…
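For context, the JL method embeds n points into k = O(log n / ε²) dimensions via a random linear map. A minimal sketch of one standard instantiation (a Gaussian projection matrix; the constant in k below is illustrative, not the paper's):

import numpy as np

def jl_transform(X, eps=0.5, rng=None):
    # Project the n rows of X into k = O(log(n) / eps^2) dimensions
    # using a scaled random Gaussian matrix (one standard JL construction).
    rng = np.random.default_rng(rng)
    n, d = X.shape
    k = max(1, int(np.ceil(4.0 * np.log(n) / eps**2)))  # illustrative constant
    G = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ G

# Usage: with high probability all pairwise distances are preserved
# up to a multiplicative factor of 1 +/- eps.
X = np.random.default_rng(0).standard_normal((100, 1000))
Y = jl_transform(X, eps=0.5, rng=1)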

References

SHOWING 1-10 OF 32 REFERENCES
Dimensionality reduction: theoretical perspective on practical measures
TLDR
Provides a comprehensive theoretical framework addressing a line of research initiated by VL [NeurIPS'18], which set the goal of analyzing different distortion measurement criteria through the lens of machine learning applicability, from both theoretical and practical perspectives.
Advances in metric embedding theory
TLDR
It is proved that any metric space on n points embeds into L_p with distortion O(log n) in dimension O(log n), which provides an optimal bound on the dimension of the embedding.
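For comparison, the classical Bourgain construction, which this line of work refines, embeds an n-point metric by measuring distances to random subsets, using O(log² n) coordinates for O(log n) distortion. A minimal sketch, with illustrative repetition counts:

import numpy as np

def bourgain_embedding(D, rng=None):
    # Bourgain-style embedding of an n-point metric, given as a
    # distance matrix D, into O(log^2 n) coordinates: each coordinate
    # is the distance from a point to a random subset.
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    L = max(1, int(np.ceil(np.log2(n))))
    coords = []
    for i in range(1, L + 1):
        for _ in range(2 * L):                    # illustrative repetitions per scale
            S = rng.random(n) < 2.0 ** (-i)       # keep each point w.p. 2^-i
            if not S.any():
                S[rng.integers(n)] = True         # avoid an empty subset
            coords.append(D[:, S].min(axis=1))    # distance of every point to S
    return np.stack(coords, axis=1)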
Measures of distortion for machine learning
TLDR
It is shown that many of the existing distortion measures behave in an undesirable way when considered from a machine learning point of view, and a new measure of distortion, called $\sigma$-distortion, is suggested, which satisfies all desirable properties and is a better candidate for evaluating distortion in the context of machine learning.
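As a worked illustration, assuming the formulation of $\sigma$-distortion as the variance of the pairwise expansion ratios normalized by their mean (this formulation is an assumption here; the cited paper is the authoritative source for the exact definition):

import numpy as np

def sigma_distortion(D_orig, D_emb):
    # Assumed formulation: variance of rho_uv = d_emb(u,v) / d_orig(u,v)
    # after normalizing the ratios by their mean. An illustrative sketch,
    # not necessarily the paper's verbatim definition.
    iu = np.triu_indices_from(D_orig, k=1)
    rho = D_emb[iu] / D_orig[iu]
    return np.mean((rho / rho.mean() - 1.0) ** 2)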
On metric Ramsey-type phenomena
TLDR
This paper states that for any ε > 0, every n-point metric space contains a subspace of size at least n^{1-ε} which is embeddable in an ultrametric with distortion O(log(1/ε)/ε), which in turn provides a bound for embedding into Euclidean spaces.
Optimal Compression of Approximate Inner Products and Dimension Reduction
  • N. Alon, B. Klartag
  • 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017
TLDR
The proof is algorithmic and provides an efficient algorithm for computing a sketch of size O(f(n,k,ε)/n) for each point, so that the square of the distance between any two points can be computed from their sketches, up to an additive error, in time linear in the length of the sketches.
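A simplified illustration of the per-point sketching idea (a JL projection followed by coordinate quantization; this is a stand-in for exposition, not the Alon-Klartag construction, and it does not carry their optimality guarantee):

import numpy as np

def make_sketches(X, k=64, bits=8, rng=None):
    # Illustrative sketch: JL-project the rows of X to k dimensions,
    # then uniformly quantize each coordinate to `bits` bits.
    rng = np.random.default_rng(rng)
    G = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    Y = X @ G
    scale = np.abs(Y).max() / (2 ** (bits - 1))   # quantization step
    return np.round(Y / scale).astype(np.int32), scale

def sketch_sq_dist(qa, qb, scale):
    # Estimate the squared distance between two points from their
    # sketches, in time linear in the sketch length.
    diff = (qa - qb).astype(np.float64) * scale
    return float(diff @ diff)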
Terminal embeddings
TLDR
This paper studies terminal embeddings, in which one is given a finite metric (X, d_X) and a subset K ⊆ X of its points designated as terminals, and provides an Õ(√(log |K|))-approximation algorithm for sparsest-cut instances in which each demand is incident on one of the vertices of K.
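To illustrate the notion (a classical Fréchet-style worked example, not the paper's construction): mapping every point to its vector of distances to the terminals, x ↦ (d_X(x, k))_{k∈K}, is non-expansive in ℓ_∞ and, by the triangle inequality, preserves every point-to-terminal distance exactly.

import numpy as np

def frechet_terminal_embedding(D, terminals):
    # Map point i to the vector (D[i, k] for k in terminals).
    # Under the l_inf norm this map never expands any distance and
    # preserves every point-to-terminal distance exactly.
    return D[:, terminals]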
Approximate nearest neighbors: towards removing the curse of dimensionality
TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.
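One classical way to realize such sub-linear queries is locality-sensitive hashing; a minimal random-hyperplane LSH sketch (an illustrative scheme for cosine similarity, not necessarily the exact hash family of the paper):

import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    # Random-hyperplane LSH: nearby points (small angle) tend to land
    # in the same bucket, so near neighbors are found without a full scan.
    def __init__(self, dim, n_bits=16, rng=None):
        rng = np.random.default_rng(rng)
        self.H = rng.standard_normal((n_bits, dim))  # one hyperplane per bit
        self.buckets = defaultdict(list)

    def _key(self, x):
        return tuple(np.signbit(self.H @ x))

    def add(self, idx, x):
        self.buckets[self._key(x)].append(idx)

    def candidates(self, q):
        # Candidate near neighbors of q: the points sharing its bucket.
        return self.buckets.get(self._key(q), [])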
Bi-Lipschitz embeddings into low-dimensional Euclidean spaces
Let (X, d), (Y, ρ) be metric spaces and f : X → Y an injective mapping. We put $\|f\|_{\mathrm{Lip}} = \sup\{\rho(f(x), f(y))/d(x, y) : x, y \in X,\ x \neq y\}$ and $\mathrm{dist}(f) = \|f\|_{\mathrm{Lip}} \cdot \|f^{-1}\|_{\mathrm{Lip}}$ (the distortion of the mapping f)…
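For a finite embedding the definition above can be evaluated directly (a minimal sketch; D_orig and D_emb are assumed to be the pairwise-distance matrices before and after the mapping):

import numpy as np

def distortion(D_orig, D_emb):
    # dist(f) = ||f||_Lip * ||f^{-1}||_Lip over a finite metric:
    # the maximum expansion times the maximum contraction of any pair.
    iu = np.triu_indices_from(D_orig, k=1)
    ratios = D_emb[iu] / D_orig[iu]
    return ratios.max() / ratios.min()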
Global Optimization in Any Minkowski Metric: A Permutation-Translation Simulated Annealing Algorithm for Multidimensional Scaling
TLDR
The experimental results confirm the theoretical expectation that simulated annealing is a suitable strategy for tackling the optimization problems in multidimensional scaling on its own, in particular for the city-block, Euclidean, and infinity metrics.
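A minimal sketch of simulated annealing for the MDS stress objective (plain Metropolis acceptance with single-point perturbations; the schedule and step size are illustrative, and the paper's permutation-translation moves are not reproduced here):

import numpy as np

def sa_mds(D, dim=2, iters=20000, t0=1.0, cooling=0.9995, rng=None):
    # Minimize raw stress sum_{i<j} (||y_i - y_j|| - D_ij)^2 by
    # simulated annealing over the point coordinates Y.
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    Y = rng.standard_normal((n, dim))

    def stress(Y):
        E = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
        return ((E - D) ** 2).sum() / 2.0

    s, t = stress(Y), t0
    for _ in range(iters):
        cand = Y.copy()
        cand[rng.integers(n)] += rng.normal(scale=0.1, size=dim)  # perturb one point
        s_new = stress(cand)
        if s_new < s or rng.random() < np.exp((s - s_new) / t):   # Metropolis rule
            Y, s = cand, s_new
        t *= cooling                                              # geometric cooling
    return Y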
Visualizing Data using t-SNE
We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding…
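In practice the technique is available off the shelf; a minimal usage sketch with scikit-learn (choosing scikit-learn here is an assumption; any t-SNE implementation is used similarly):

import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).standard_normal((500, 50))  # placeholder data
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (500, 2): one 2-D map location per datapoint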