Dimensionality's Blessing: Clustering Images by Underlying Distribution

  title={Dimensionality's Blessing: Clustering Images by Underlying Distribution},
  author={Wen-Yan Lin and Siying Liu and Jianhuang Lai and Yasuyuki Matsushita},
  journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
Many high-dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing. Re-deriving "contrast-loss" using the law of large numbers, we show that it causes a distribution's instances to concentrate on a thin "hyper-shell". The hollow center means that apparently chaotically overlapping distributions are actually intrinsically separable. We…
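A minimal Python sketch of the concentration effect described in the abstract, assuming isotropic Gaussian data (an illustrative choice, not the paper's construction). Sample norms cluster ever more tightly around σ√d as the dimension d grows. Two zero-mean Gaussians with different spreads (σ = 1.0 and σ = 1.5, assumed values) overlap in every coordinate, yet they land on disjoint shells. The helper names `gaussian_norms` and `shell_summary` are hypothetical.

```python
import math
import random
import statistics


def gaussian_norms(n, dim, sigma, rng):
    """Euclidean norms of n samples from an isotropic Gaussian N(0, sigma^2 I_dim).

    The Gaussian model is an illustrative assumption, not the paper's data.
    """
    return [
        math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dim)))
        for _ in range(n)
    ]


def shell_summary(norms, dim, sigma):
    """Return (mean norm / (sigma * sqrt(dim)), std / mean) for a list of norms."""
    mean = statistics.fmean(norms)
    return mean / (sigma * math.sqrt(dim)), statistics.pstdev(norms) / mean


rng = random.Random(0)
for dim in (2, 100, 1000):
    ratio, spread = shell_summary(gaussian_norms(200, dim, 1.0, rng), dim, 1.0)
    # ratio approaches 1 (a shell of radius sigma*sqrt(d)); spread shrinks with d.
    print(f"d={dim:4d}  mean/(sigma*sqrt(d))={ratio:.3f}  std/mean={spread:.3f}")

# Two zero-mean Gaussians (sigma = 1.0 and 1.5, assumed values) overlap in every
# coordinate, yet in high dimensions their samples sit on disjoint shells.
a = gaussian_norms(200, 1000, 1.0, rng)
b = gaussian_norms(200, 1000, 1.5, rng)
print("separable by norm:", max(a) < min(b))
```

A single norm threshold between the two shells separates the classes here only because both toy distributions share the same center; that is an assumption of this example.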
Deep Discriminative Clustering Analysis
Deep Discriminative Clustering (DDC) models the clustering task by investigating relationships between patterns with a deep neural network, and outputs a group of discriminative representations that can be treated as cluster centers for direct clustering.
Hierarchical Models: Intrinsic Separability in High Dimensions
It is demonstrated how the hierarchical model implies that high-dimensional data possess an innate separability that can be exploited for machine learning, leading to qualitative and quantitative improvements in performance.
Ensemble learning of high dimension datasets
This thesis proposes a method of building ensembles for deep neural network image classification using RS projections without needing to retrain the neural network; the ensembles showed improved accuracy and very good robustness to adversarial examples.
SOSNet: Second Order Similarity Regularization for Local Descriptor Learning
This work proposes a novel regularization term, named Second Order Similarity Regularization (SOSR), which follows the intuition that a positive pair of matching points should exhibit similar distances with respect to other points in the embedding space, and demonstrates that SOSR can significantly boost the matching performance of the learned descriptor.
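The second-order intuition can be written as a penalty on matching pairs (x_i, y_i): the distances from x_i to the other x_j should equal the distances from y_i to the other y_j. The Python sketch below computes a penalty of the form (1/N) Σ_i sqrt(Σ_{j∈c_i} (d(x_i, x_j) − d(y_i, y_j))²), which is our reading of SOSR. The neighbor set c_i (all j ≠ i, or the k nearest neighbors of x_i) is an assumed simplification, and `sos_regularizer` is a hypothetical name; no training or gradients are shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sos_regularizer(xs, ys, k=None):
    """Second-order similarity penalty for matching descriptor pairs (xs[i], ys[i]).

    For each pair i, compare the distances from xs[i] to the other xs[j] with
    the distances from ys[i] to the other ys[j]. If k is given, only the k
    nearest neighbors of xs[i] are used (an assumed neighbor set); otherwise
    all j != i are used.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if k is not None:
            others.sort(key=lambda j: euclidean(xs[i], xs[j]))
            others = others[:k]
        sq = sum(
            (euclidean(xs[i], xs[j]) - euclidean(ys[i], ys[j])) ** 2 for j in others
        )
        total += math.sqrt(sq)
    return total / n


anchors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
shifted = [[x + 0.5, y - 0.5] for x, y in anchors]  # rigid shift keeps all distances
noisy = [[0.0, 0.0], [1.2, 0.0], [0.0, 1.5], [3.0, 1.4]]
print(sos_regularizer(anchors, shifted))  # 0.0
print(sos_regularizer(anchors, noisy) > 0)  # True
```

A rigid shift of all descriptors leaves the penalty at zero, which is the property the regularizer is meant to reward; any distortion of the relative geometry makes it positive.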
Deep Unsupervised Anomaly Detection
Experimental results on several public benchmark datasets show that the proposed method outperforms state-of-the-art unsupervised techniques and is comparable to semi-supervised techniques in most cases.
GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence
GMS is proposed, which incorporates the smoothness constraint into a statistical framework for separating true from false matches, uses a grid-based implementation for fast computation, and is integrated into the well-known ORB-SLAM system for monocular initialization, resulting in a significant improvement.
GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence
GMS (Grid-based Motion Statistics), a simple means of encapsulating motion smoothness as the statistical likelihood of a certain number of matches in a region, enables translation of high match numbers into high match quality.
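A simplified Python sketch of the grid-based statistic, not the published GMS algorithm. Each putative match is assigned a pair of grid cells (one per image). A match is kept if enough other matches share its cell pair, using a threshold of the form alpha·√n, where n is the number of matches starting in the same cell. The published method also pools support over neighboring cell pairs, which is omitted here. `gms_filter`, `cell_index`, the 4×4 grid and alpha = 2.0 are hypothetical choices for this demo.

```python
import math
import random
from collections import Counter


def cell_index(point, width, height, grid):
    """Map an (x, y) pixel position to a single cell id in a grid x grid layout."""
    x, y = point
    cx = min(grid - 1, int(x * grid / width))
    cy = min(grid - 1, int(y * grid / height))
    return cy * grid + cx


def gms_filter(matches, size1, size2, grid=4, alpha=2.0):
    """Keep matches whose cell pair has more support than alpha * sqrt(n).

    matches: list of ((x1, y1), (x2, y2)) putative correspondences.
    n is the number of matches starting in the same image-1 cell.
    Simplification (assumed): no pooling over neighboring cells.
    """
    cells = [
        (cell_index(p, size1[0], size1[1], grid), cell_index(q, size2[0], size2[1], grid))
        for p, q in matches
    ]
    pair_support = Counter(cells)
    source_count = Counter(c1 for c1, _ in cells)
    return [
        m for m, (c1, c2) in zip(matches, cells)
        if pair_support[(c1, c2)] > alpha * math.sqrt(source_count[c1])
    ]


rng = random.Random(0)
size = (200, 200)
# Inliers: a smooth motion (shift of exactly one 50 px cell), so they share cell pairs.
inliers = []
for _ in range(200):
    p = (rng.uniform(0, 150), rng.uniform(0, 150))
    inliers.append((p, (p[0] + 50, p[1] + 50)))
# Outliers: unrelated random positions in the two images.
outliers = [
    ((rng.uniform(0, 200), rng.uniform(0, 200)), (rng.uniform(0, 200), rng.uniform(0, 200)))
    for _ in range(50)
]
kept = set(gms_filter(inliers + outliers, size, size))
inlier_kept = sum(m in kept for m in inliers)
outlier_kept = sum(m in kept for m in outliers)
print(f"kept {inlier_kept}/200 inliers and {outlier_kept}/50 outliers")
```

Because smooth motion sends neighboring matches to the same target region, true matches accumulate support in one cell pair while random matches scatter; the occasional outlier that lands in a well-supported cell pair survives, which is the kind of error the neighborhood pooling in the full method is designed to reduce.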
Multiview Feature Selection for Single-View Classification
A multiview feature selection method is presented that leverages the knowledge of all views and uses it to guide the feature selection process in an individual view; it reduces the classification error rate by 31% relative to the state-of-the-art error rate.


The Role of Hubness in Clustering High-Dimensional Data
This paper shows that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbor lists of other points, can be successfully exploited in clustering, and proposes several hubness-based clustering algorithms.
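Hubness is usually measured through the k-occurrence N_k(x), the number of times x appears in the k-nearest-neighbor lists of the other points, and through the skewness of the N_k distribution. A brute-force Python sketch, assuming i.i.d. Gaussian data (an illustrative choice); `k_occurrences` and `skewness` are hypothetical helper names, and no clustering algorithm from the paper is implemented.

```python
import math
import random
import statistics
from collections import Counter


def k_occurrences(points, k):
    """N_k(x): how many times each point appears in the k-NN lists of the others."""
    counts = Counter()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        counts.update(j for _, j in dists[:k])
    return [counts[i] for i in range(len(points))]


def skewness(values):
    """Third standardized moment (population form) of a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


rng = random.Random(0)
skews, totals = {}, {}
for dim in (2, 50):
    pts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
    nk = k_occurrences(pts, k=5)
    skews[dim], totals[dim] = skewness(nk), sum(nk)
    # Higher skewness means a few hubs collect many k-NN memberships.
    print(f"d={dim}: max N_5 = {max(nk)}, skewness = {skews[dim]:.2f}")
```

Each point contributes exactly k entries, so N_k always sums to n·k; what changes with dimension is how unevenly those entries are spread across points.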
When Is "Nearest Neighbor" Meaningful?
The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
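The effect can be checked numerically through the relative contrast (D_max − D_min) / D_min from a query to a dataset, which shrinks toward zero when the nearest and farthest distances converge. A minimal Python sketch, assuming points and queries drawn uniformly from [0, 1]^d (one of many distributions covered by the paper's conditions); `relative_contrast` and `mean_contrast` are hypothetical names.

```python
import math
import random


def relative_contrast(points, query):
    """(D_max - D_min) / D_min for Euclidean distances from query to points."""
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


def mean_contrast(dim, n_points=500, n_queries=20, seed=0):
    """Average relative contrast for uniform data in [0, 1]^dim (assumed data model)."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(n_queries):
        query = [rng.random() for _ in range(dim)]
        total += relative_contrast(points, query)
    return total / n_queries


for dim in (2, 10, 100, 1000):
    print(f"d={dim:4d}  mean relative contrast = {mean_contrast(dim):.3f}")
```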
Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection
The paper argues that the prevailing view, that distance concentration makes distance-based outlier scores indiscernible in high dimensions, is too simple: it demonstrates that distance-based methods can produce more contrasting outlier scores in high-dimensional settings, and offers novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
Fast algorithms for projected clustering
An algorithmic framework for solving the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves, is developed and tested.
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
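The filtering algorithm speeds up Lloyd's k-means by organizing the points in a kd-tree so that candidate centers can be pruned during assignment. The Python sketch below shows only the plain Lloyd iteration being accelerated (brute-force assignment, no kd-tree). Its deterministic farthest-first initialization is an assumed choice of this sketch, and `lloyd_kmeans` is a hypothetical name.

```python
import math
import random


def lloyd_kmeans(points, k, iters=100):
    """Plain Lloyd iterations with brute-force assignment (no kd-tree filtering).

    Initial centers use a deterministic farthest-first traversal (assumed choice).
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


rng = random.Random(0)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]
centers, labels = lloyd_kmeans(blob_a + blob_b, k=2)
print(sorted((round(x, 2), round(y, 2)) for x, y in centers))
```

The assignment step costs O(n·k) distance evaluations per iteration here; that inner loop is exactly what a kd-tree based filter aims to shrink.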
Subspace clustering for high dimensional data: a review
A survey of the various subspace clustering algorithms, along with a hierarchy organizing the algorithms by their defining characteristics, is presented; the two main approaches are compared using empirical scalability and accuracy tests, and some potential applications where subspace clustering could be particularly useful are discussed.
On the Surprising Behavior of Distance Metrics in High Dimensional Spaces
This paper examines the behavior of the commonly used L_k norm and shows that the meaningfulness of proximity in high dimensions is sensitive to the value of k; in particular, the Manhattan distance metric (L_1) is consistently preferable to the Euclidean distance metric (L_2) for high-dimensional data mining applications.
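The dependence on k can be observed by measuring the relative contrast (D_max − D_min) / D_min under different L_k distances on the same data. A minimal Python sketch, assuming uniform data in [0, 1]^100 and the illustrative values k ∈ {0.5, 1, 2, 4}; `minkowski` and `mean_contrast_lk` are hypothetical names. On this kind of data, smaller k tends to retain more contrast.

```python
import random


def minkowski(u, v, k):
    """L_k distance (sum |u_i - v_i|^k)^(1/k); k < 1 gives a 'fractional' distance."""
    return sum(abs(a - b) ** k for a, b in zip(u, v)) ** (1.0 / k)


def mean_contrast_lk(k, dim=100, n_points=200, n_queries=20, seed=0):
    """Average (D_max - D_min) / D_min under L_k for uniform data (assumed model).

    The fixed seed reuses the same points and queries for every k.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(n_queries):
        query = [rng.random() for _ in range(dim)]
        dists = [minkowski(query, p, k) for p in points]
        total += (max(dists) - min(dists)) / min(dists)
    return total / n_queries


contrasts = {k: mean_contrast_lk(k) for k in (0.5, 1, 2, 4)}
for k, c in contrasts.items():
    print(f"L_{k}: mean relative contrast = {c:.3f}")
```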
Sparse Subspace Clustering: Algorithm, Theory, and Applications
  • Ehsan Elhamifar, R. Vidal
  • Computer Science, Mathematics
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2013
This paper proposes and studies an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces, and demonstrates the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
Self-Tuning Spectral Clustering
This work proposes that a 'local' scale should be used to compute the affinity between each pair of points and suggests exploiting the structure of the eigenvectors to infer automatically the number of groups.
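A minimal Python sketch of a locally scaled affinity: each point i gets its own scale σ_i, the distance to its K-th nearest neighbor, and A_ij = exp(−d(i, j)² / (σ_i σ_j)) with a zero diagonal. K = 7 follows the value we recall from the paper, but treat it as an assumption. The eigenvector-based choice of the number of groups is not shown, and `local_scales` and `self_tuning_affinity` are hypothetical names.

```python
import math
import random


def local_scales(points, K=7):
    """sigma_i = distance from point i to its K-th nearest neighbor (K=7 assumed)."""
    scales = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(d[K - 1])
    return scales


def self_tuning_affinity(points, K=7):
    """A_ij = exp(-d(i, j)^2 / (sigma_i * sigma_j)) for i != j, with A_ii = 0."""
    sig = local_scales(points, K)
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = math.exp(-math.dist(points[i], points[j]) ** 2 / (sig[i] * sig[j]))
            A[i][j] = A[j][i] = a
    return A


rng = random.Random(0)
tight = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(30)]
sparse = [(rng.gauss(20, 2), rng.gauss(0, 2)) for _ in range(30)]
A = self_tuning_affinity(tight + sparse)


def mean_block(rows, cols):
    """Mean off-diagonal affinity between two index ranges of A."""
    vals = [A[i][j] for i in rows for j in cols if i != j]
    return sum(vals) / len(vals)


within_tight = mean_block(range(30), range(30))
within_sparse = mean_block(range(30, 60), range(30, 60))
cross = mean_block(range(30), range(30, 60))
print(f"within tight {within_tight:.3f}, within sparse {within_sparse:.3f}, cross {cross:.3g}")
```

Because σ_i adapts to local density, the sparse cluster still has non-negligible internal affinities while affinities between the two clusters vanish; a single global scale would have to trade one of these off against the other.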
k-means projective clustering
An extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces is presented, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error.