Corpus ID: 222378340

On Convergence of Nearest Neighbor Classifiers over Feature Transformations

  title={On Convergence of Nearest Neighbor Classifiers over Feature Transformations},
  author={Luka Rimanic and C{\'e}dric Renggli and Bo Li and Ce Zhang},
The k-Nearest Neighbors (kNN) classifier is a fundamental non-parametric machine learning algorithm. However, it is well known that it suffers from the curse of dimensionality, which is why in practice one often applies a kNN classifier on top of a (pre-trained) feature transformation. From a theoretical perspective, most, if not all theoretical results aimed at understanding the kNN classifier are derived for the raw feature space. This leads to an emerging gap between our theoretical… Expand

Figures and Tables from this paper

Evaluating Bayes Error Estimators on Read-World Datasets with FeeBee
FeeBee is proposed, the first principled framework for analyzing and comparing BER estimators on any modern real-world dataset with unknown probability distribution, and allows a thorough study of these estimators, clearly identifying strengths and weaknesses of each, whilst being easily deployable on any future BERN estimator. Expand
On Automatic Feasibility Study for Machine Learning Application Development with
This paper proposes practical "compromises" that enable the application of Bayes error estimators and develops an evaluation framework that compares different estimators empirically on real-world data. Expand
A Data Quality-Driven View of MLOps
By performing a joint analysis of the impact of well-known data quality dimensions and the downstream machine learning process, it is shown that different components of a typical MLOps pipeline can be efficiently designed, providing both a technical and theoretical perspective. Expand


Asymptotic Slowing Down of the Nearest-Neighbor Classifier
If patterns are drawn from an n-dimensional feature space according to a probability distribution that obeys a weak smoothness criterion, we show that the probability that a random input pattern isExpand
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
The DkNN algorithm is evaluated on several datasets, and it is shown the confidence estimates accurately identify inputs outside the model, and that the explanations provided by nearest neighbors are intuitive and useful in understanding model failures. Expand
Improving Generalization via Scalable Neighborhood Component Analysis
This work uses a deep neural network to learn the visual feature that preserves the neighborhood structure in the semantic space, based on the Neighborhood Component Analysis (NCA) criterion and devise a mechanism to use augmented memory to scale NCA for large datasets and very deep networks. Expand
Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
This work introduces a theoretical framework analogous to bias-variance theory for understanding why adversarial examples arise, and uses this framework to analyze the robustness of a canonical non-parametric classifier - the k-nearest neighbors. Expand
Rates of Convergence for Nearest Neighbor Classification
This work analyzes the behavior of nearest neighbor classification in metric spaces and provides finite-sample, distribution-dependent rates of convergence under minimal assumptions, and finds that under the Tsybakov margin condition the convergence rate of nearest neighbors matches recently established lower bounds for nonparametric classification. Expand
k-Nearest Neighbor Augmented Neural Networks for Text Classification
This work proposes to enhance neural network models by allowing them to leverage information from $k-nearest neighbor (kNN) of the input text by employing a neural network that encodes texts into text embeddings and utilizing it to capture instance-level information from the training set. Expand
Convergence of the nearest neighbor rule
  • T. Wagner
  • Mathematics, Computer Science
  • IEEE Trans. Inf. Theory
  • 1971
It is shown that when the samples lie in n -dimensional Euclidean space, the probability of error for the NNR conditioned on the n known samples converges to R with probability 1 for mild continuity and moment assumptions on the class densities. Expand
Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions
It is shown that the proposed CPClean approach built based on CP can often significantly outperform existing techniques in terms of classification accuracy with mild manual cleaning effort. Expand
Neighbourhood Components Analysis
A novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm that directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. Expand
Nearest neighbor pattern classification
The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor. Expand