# Dimensionality's Blessing: Clustering Images by Underlying Distribution

@article{Lin2018DimensionalitysBC, title={Dimensionality's Blessing: Clustering Images by Underlying Distribution}, author={Wen-Yan Lin and Siying Liu and Jianhuang Lai and Yasuyuki Matsushita}, journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year={2018}, pages={5784-5793} }

Many high dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing. Re-deriving "contrast-loss" using the law of large numbers, we show it results in a distribution's instances concentrating on a thin "hyper-shell". The hollow center means apparently chaotically overlapping distributions are actually intrinsically separable. We…
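The hyper-shell concentration described in the abstract can be sketched numerically. The snippet below is an illustrative assumption (i.i.d. standard Gaussian data, not the paper's experiments): as dimension grows, sample norms cluster ever more tightly around sqrt(d), so the distribution hollows out into a thin shell.

```python
# Illustrative sketch of "contrast-loss" as shell concentration.
# Assumes i.i.d. standard Gaussian data (hypothetical, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 100, 10000]:
    x = rng.standard_normal((1000, d))          # 1000 samples in d dimensions
    norms = np.linalg.norm(x, axis=1)
    # Relative spread of norms shrinks like 1/sqrt(d): the "hyper-shell".
    print(f"d={d:>5}  mean norm ≈ {norms.mean():.2f}  "
          f"relative spread = {norms.std() / norms.mean():.4f}")
```

The relative spread (std/mean of the norms) drops by roughly an order of magnitude for each hundredfold increase in dimension, which is the concentration the paper exploits.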


#### 8 Citations

Deep Discriminative Clustering Analysis

- Computer Science, Mathematics
- ArXiv
- 2019

Deep Discriminative Clustering (DDC) is developed. It models the clustering task by investigating relationships between patterns with a deep neural network, and outputs a group of discriminative representations that can be used directly as clustering centers.

Hierarchical Models: Intrinsic Separability in High Dimensions

- Computer Science
- ArXiv
- 2020

It is demonstrated how the model implies that high dimensional data possess an innate separability that can be exploited for machine learning, leading to qualitative and quantitative improvements in performance.

Ensemble learning of high dimension datasets

- Computer Science
- 2020

This thesis proposes a method of building ensembles for Deep Neural Network image classifications using RS projections without needing to retrain the neural network, which showed improved accuracy and very good robustness to adversarial examples.

SOSNet: Second Order Similarity Regularization for Local Descriptor Learning

- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

This work proposes a novel regularization term, named Second Order Similarity Regularization (SOSR), that follows the intuition that a positive pair of matching points should exhibit similar distances with respect to other points in the embedding space, and demonstrates that SOSR can significantly boost the matching performance of the learned descriptor.

Deep Unsupervised Anomaly Detection

- Computer Science
- 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2021

Experimental results on several public benchmark datasets show that the proposed method outperforms state-of-the-art unsupervised techniques and is comparable to semi-supervised techniques in most cases.

GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence

- Computer Science
- International Journal of Computer Vision
- 2019

GMS is proposed, which incorporates the motion-smoothness constraint into a statistical framework for separating true from false matches. It uses a grid-based implementation for fast calculation and integrates into the well-known ORB-SLAM system for monocular initialization, resulting in a significant improvement.

GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence

- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017

GMS (Grid-based Motion Statistics), a simple means of encapsulating motion smoothness as the statistical likelihood of a certain number of matches in a region, enables translation of high match numbers into high match quality.

Multiview Feature Selection for Single-View Classification

- Computer Science, Medicine
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021

A multiview feature selection method is presented that leverages the knowledge of all views and uses it to guide the feature selection process in an individual view, reducing the classification error rate by 31% relative to the state of the art.

#### References

Showing 1-10 of 49 references

The Role of Hubness in Clustering High-Dimensional Data

- Mathematics, Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2014

This paper shows that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbor lists of other points, can be successfully exploited in clustering, and proposes several hubness-based clustering algorithms.

When Is ''Nearest Neighbor'' Meaningful?

- Computer Science
- ICDT
- 1999

The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
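The contrast-loss effect this reference establishes is easy to reproduce. A minimal sketch (assuming i.i.d. uniform data for illustration; the paper's conditions are broader):

```python
# Sketch of nearest/farthest distance convergence in high dimensions.
# Assumes i.i.d. uniform data in the unit cube (hypothetical setup).
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 100, 10000]:
    data = rng.random((500, d))                 # 500 points in [0,1]^d
    query = rng.random(d)
    dists = np.linalg.norm(data - query, axis=1)
    # As d grows, the nearest and farthest points become nearly equidistant.
    print(f"d={d:>5}  min/max distance ratio = {dists.min() / dists.max():.3f}")
```

In low dimensions the ratio is far from 1; by d = 10000 it is close to 1, which is exactly the loss of contrast that this reference analyzes and that the main paper reinterprets.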

Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2015

Evidence is provided that the common view of high dimensionality rendering distance-based methods useless is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings; novel insight is offered into the usefulness of reverse nearest-neighbor counts in unsupervised outlier detection.

Fast algorithms for projected clustering

- Computer Science
- SIGMOD '99
- 1999

An algorithmic framework for solving the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves, is developed and tested.

An Efficient k-Means Clustering Algorithm: Analysis and Implementation

- Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 2002

This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.

Subspace clustering for high dimensional data: a review

- Computer Science
- ACM SIGKDD Explorations Newsletter
- 2004

A survey of the various subspace clustering algorithms is presented, along with a hierarchy organizing the algorithms by their defining characteristics, comparing the two main approaches using empirical scalability and accuracy tests, and discussing some potential applications where subspace clustering could be particularly useful.

On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

- Computer Science
- ICDT
- 2001

This paper examines the behavior of the commonly used Lk norm and shows that the meaningfulness of distances in high dimensionality is sensitive to the value of k, with the Manhattan distance metric (L1) consistently preferable to the Euclidean distance metric (L2) for high dimensional data mining applications.

Sparse Subspace Clustering: Algorithm, Theory, and Applications

- Computer Science, Mathematics
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2013

This paper proposes and studies an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces, and demonstrates the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.

Self-Tuning Spectral Clustering

- Computer Science, Mathematics
- NIPS
- 2004

This work proposes that a 'local' scale should be used to compute the affinity between each pair of points and suggests exploiting the structure of the eigenvectors to infer automatically the number of groups.

k-means projective clustering

- Mathematics, Computer Science
- PODS '04
- 2004

An extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces is presented, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error.