# Comparison-Based Nearest Neighbor Search

@article{Haghiri2017ComparisonBasedNN, title={Comparison-Based Nearest Neighbor Search}, author={Siavash Haghiri and Debarghya Ghoshdastidar and Ulrike von Luxburg}, journal={ArXiv}, year={2017}, volume={abs/1704.01460} }

We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points $i$ and $j$ is smaller than the distance between the points $i$ and $k$. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm that recursively…
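The setting above can be illustrated with a minimal sketch, under stated assumptions. It is not the recursive algorithm the paper focuses on. The oracle is simulated with hidden Euclidean distances, and the hypothetical helpers `make_triplet_oracle` and `nearest_neighbor` find a query's nearest neighbor by a plain linear scan. The scan touches the data only through triplet questions of the form "is $d(i, j) < d(i, k)$?".

```python
import math

def make_triplet_oracle(points):
    """Simulated oracle (assumption): answers True iff d(i, j) < d(i, k).

    The Euclidean distance stays hidden inside the closure; callers only
    ever see the boolean answer to a triplet question.
    """
    def oracle(i, j, k):
        return math.dist(points[i], points[j]) < math.dist(points[i], points[k])
    return oracle

def nearest_neighbor(query, candidates, oracle):
    """Linear scan that keeps the best candidate so far, using only triplet queries."""
    best = candidates[0]
    for c in candidates[1:]:
        if oracle(query, c, best):  # c is strictly closer to the query than best
            best = c
    return best

# Toy data (hypothetical): the query is point 0.
points = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.5), (-3.0, 2.0)]
oracle = make_triplet_oracle(points)
print(nearest_neighbor(0, [1, 2, 3], oracle))  # → 2
```

The scan asks one question per remaining candidate, so it needs $n - 1$ questions for $n$ candidates. It serves only as a baseline for comparison-based data structures.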

## 26 Citations

Comparison-Based Random Forests

- Computer Science, Mathematics · ICML
- 2018

A novel random forest algorithm for regression and classification that relies only on triplet comparisons. It is efficient for both classification and regression, and it is even competitive with methods that have direct access to the metric representation of the data.

Scalable and Efficient Comparison-based Search without Features

- Computer Science · ICML
- 2020

A new Bayesian comparison-based search algorithm with noisy answers is proposed; it has low computational complexity yet is efficient in the number of queries and provides theoretical guarantees, deriving the form of the optimal query and proving almost sure convergence to the target object.

K-Nearest Neighbor Approximation Via the Friend-of-a-Friend Principle

- Mathematics
- 2019

Suppose $V$ is an $n$-element set where, for each $x \in V$, the elements of $V \setminus \{x\}$ are ranked by their similarity to $x$. The $K$-nearest neighbor graph $G := (V, E)$ is a directed graph…
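The abstract is cut off before $E$ is defined. The sketch below assumes the usual definition, in which $x$ has an edge to each of the $K$ elements it ranks as most similar. Under that assumption, the graph can be built directly from the rankings without any distances. The helper `knn_graph` and the toy rankings are hypothetical.

```python
def knn_graph(rankings: dict[str, list[str]], K: int) -> set[tuple[str, str]]:
    """Build the directed K-nearest neighbor graph from similarity rankings.

    rankings[x] lists every element of V except x, ordered from most to least
    similar to x (assumed input format). Edge (x, y) is added iff y is among
    the K elements that x ranks highest (assumed definition of E).
    """
    edges: set[tuple[str, str]] = set()
    for x, ranked in rankings.items():
        if x in ranked or len(ranked) != len(rankings) - 1:
            raise ValueError(f"ranking of {x!r} must list V without {x!r}")
        for y in ranked[:K]:
            edges.add((x, y))
    return edges

# Toy rankings on V = {a, b, c, d} (hypothetical data).
rankings = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["d", "b", "a"],
    "d": ["c", "a", "b"],
}
print(sorted(knn_graph(rankings, K=1)))
# → [('a', 'b'), ('b', 'a'), ('c', 'd'), ('d', 'c')]
```

Every vertex has out-degree exactly $K$, but in-degrees can differ. With $K = 2$ in the toy example, $c$ receives three edges and $d$ only one, so $G$ is generally not symmetric.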

Machine learning in a setting of ordinal distance information

- Mathematics
- 2017

In a typical machine learning scenario we are given numerical dissimilarity values between objects (or feature representations of objects, from which such dissimilarity values can readily be…

Lens Depth Function and k-Relative Neighborhood Graph: Versatile Tools for Ordinal Data Analysis

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2017

This paper proposes algorithms for medoid estimation, outlier identification, classification, and clustering when only ordinal data is available. The algorithms are based on estimating the lens depth function and the $k$-relative neighborhood graph of a data set.
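Lens depth fits the ordinal setting because membership in a lens reduces to triplet comparisons. The sketch below assumes the usual strict lens definition: $x$ lies in the lens of $y$ and $z$ iff $d(y, x) < d(y, z)$ and $d(z, x) < d(z, y)$. It also uses a simulated Euclidean oracle. Taking the point of maximal depth as the medoid is an assumed rule, and it may differ from the paper's estimators. All helper names are hypothetical.

```python
import itertools
import math

def make_oracle(points):
    """Simulated triplet oracle (assumption): True iff d(i, j) < d(i, k)."""
    def oracle(i, j, k):
        return math.dist(points[i], points[j]) < math.dist(points[i], points[k])
    return oracle

def lens_depth(x, n, oracle):
    """Empirical lens depth: fraction of pairs {y, z} (both != x) whose lens contains x.

    Each pair costs two triplet queries, one anchored at y and one at z.
    """
    others = [i for i in range(n) if i != x]
    pairs = list(itertools.combinations(others, 2))
    inside = sum(1 for y, z in pairs if oracle(y, x, z) and oracle(z, x, y))
    return inside / len(pairs)

def estimate_medoid(n, oracle):
    """Hypothetical rule: the medoid estimate is the point of maximal lens depth."""
    return max(range(n), key=lambda x: lens_depth(x, n, oracle))

# Toy data (hypothetical): a small cross around the origin plus one outlier.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (5, 5)]
oracle = make_oracle(points)
print(estimate_medoid(len(points), oracle))  # → 0
print(lens_depth(5, len(points), oracle))    # → 0.0
```

The outlier has depth zero because it never falls inside a lens spanned by two other points. This mirrors how small lens depth can flag outliers.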

Comparison Based Learning from Weak Oracles

- Computer Science, Mathematics · AISTATS
- 2018

This paper introduces a new weak oracle model, where a non-malicious user responds to a pairwise comparison query only when she is quite sure about the answer, and proposes two algorithms which provably locate the target object in a number of comparisons close to the entropy of the target distribution.

Foundations of Comparison-Based Hierarchical Clustering

- Computer Science, Mathematics · NeurIPS
- 2019

This work addresses the classical problem of hierarchical clustering in a framework where one has no access to a representation of the objects or their pairwise similarities. It develops variants of average, single, and complete linkage for this setting.

Near-Optimal Comparison Based Clustering

- Computer Science, Mathematics · NeurIPS
- 2020

This paper theoretically shows that the approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. It also empirically validates the theoretical findings and demonstrates the good behaviour of the method on real data.

Classification from Triplet Comparison Data

- Computer Science, Mathematics · Neural Computation
- 2020

This letter proposes an unbiased estimator for the classification risk under the empirical risk minimization framework, which inherently has the advantage that any surrogate loss function and any model, including neural networks, can be easily applied.

Relative distance comparisons with confidence judgements

- Computer Science, Mathematics · SDM
- 2019

This work discusses a variant of the distance comparison query where annotators may explicitly state their degree of confidence for each triplet. It proposes algorithms both for learning the underlying pairwise distances and for computing an embedding of the items from such triplets.

## 34 References

Disorder inequality: a combinatorial approach to nearest neighbor search

- Computer Science · WSDM '08
- 2008

A special property of the similarity function on a set $S$ that leads to efficient combinatorial algorithms for $S$ is introduced. For the Reuters corpus, it is shown that the average disorder is indeed quite small and that Ranwalk efficiently computes the nearest neighbor in most cases.

Randomized Partition Trees for Nearest Neighbor Search

- Mathematics, Computer Science · Algorithmica
- 2014

The $k$-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization…

Randomized Algorithms for Comparison-based Search

- Computer Science, Mathematics · NIPS
- 2011

A lower bound of $\Omega(D \log(n/D) + D^2)$ on the average number of questions in the search phase of any randomized algorithm is presented. This bound demonstrates the fundamental role of $D$ for worst-case behavior.

Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design

- Computer Science, Mathematics · SODA
- 2009

The technical contribution of the paper consists of three parts: the handling of "false positives" in data structures, an algorithmic technique called up-aside-down-filter, and the first known work-around for Navarro's impossibility result on generalizing Delaunay graphs.

Cover trees for nearest neighbor

- Computer Science, Mathematics · ICML
- 2006

A tree data structure for fast nearest neighbor operations in general $n$-point metric spaces (where the data set consists of $n$ points). On natural machine learning datasets, it achieves speedups over brute-force search of between one and several orders of magnitude.

An Investigation of Practical Approximate Nearest Neighbor Algorithms

- Computer Science · NIPS
- 2004

This paper asks whether earlier spatial data structure approaches to exact nearest neighbor search, such as metric trees, can be altered to provide approximate answers to proximity queries, and if so, how and why. It introduces a new kind of metric tree that allows overlap.

Finding nearest neighbors in growth-restricted metrics

- Mathematics, Computer Science · STOC '02
- 2002

This paper develops an efficient dynamic data structure for nearest neighbor queries in growth-constrained metrics. In such metrics, for any point $q$ and radius $r$, the ratio between the numbers of points in the balls of radius $2r$ and $r$ around $q$ is bounded by a constant.
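The growth condition can be probed on a finite sample. The sketch below makes several assumptions: Euclidean points, closed balls centered at data points, and a hand-picked list of radii. Because the definition quantifies over every $q$ and $r$, the reported value is only a lower bound on the true constant. The helpers `ball_size` and `empirical_expansion` are hypothetical.

```python
import math

def ball_size(points, q, r):
    """Number of data points in the closed ball of radius r around q."""
    return sum(1 for p in points if math.dist(q, p) <= r)

def empirical_expansion(points, radii):
    """Largest |B(q, 2r)| / |B(q, r)| over data points q and the given radii.

    Balls are centered at data points, so |B(q, r)| >= 1 and the ratio is
    always defined. Only finitely many radii are checked, so the result is a
    lower bound on the growth constant in the definition.
    """
    return max(
        ball_size(points, q, 2 * r) / ball_size(points, q, r)
        for q in points
        for r in radii
    )

# Evenly spaced points on a line (hypothetical data).
line = [(float(i),) for i in range(10)]
print(empirical_expansion(line, radii=[1.0, 2.0]))  # → 1.8
```

For evenly spaced points on a line, the ratio stays below 2. For points spread uniformly over a $d$-dimensional region, ratios near $2^d$ are expected if boundary effects are ignored, which is why the constant acts as a notion of intrinsic dimension.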

Rank-Based Similarity Search: Reducing the Dimensional Dependence

- Mathematics, Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015

A data structure for $k$-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on comparisons of similarity values. Other properties of the underlying space, such as the triangle inequality, are not used.

Which Space Partitioning Tree to Use for Search?

- Mathematics, Computer Science · NIPS
- 2013

Theoretical results are presented which imply that trees with better vector quantization performance have better search performance guarantees. It is also demonstrated, both theoretically and empirically, that large-margin partitions can improve tree search performance.

A Fast Nearest-Neighbor Algorithm Based on a Principal Axis Search Tree

- Mathematics, Computer Science · IEEE Trans. Pattern Anal. Mach. Intell.
- 2001

A new fast nearest-neighbor algorithm is described. It uses principal component analysis to build a search tree, which is then searched depth-first with a new elimination criterion.