• Corpus ID: 16760529

Comparison-Based Nearest Neighbor Search

@article{Haghiri2017ComparisonBasedNN,
  title={Comparison-Based Nearest Neighbor Search},
  author={Siavash Haghiri and Debarghya Ghoshdastidar and Ulrike von Luxburg},
  journal={ArXiv},
  year={2017},
  volume={abs/1704.01460}
}
We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points $i$ and $j$ is smaller than the distance between the points $i$ and $k$. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm that recursively… 
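The oracle model can be made concrete with a small sketch. It simulates the oracle from hidden Euclidean coordinates, which is an assumption for illustration only, and then finds a nearest neighbor by a brute-force scan that uses nothing but comparisons, at a cost of one oracle query per candidate. This is a baseline, not the paper's recursive algorithm; `make_triplet_oracle` and `comparison_nn` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence

Oracle = Callable[[int, int, int], bool]

def make_triplet_oracle(points: List[Sequence[float]]) -> Oracle:
    """Simulated oracle: answers 'is d(i, j) < d(i, k)?' for point indices.

    Hidden Euclidean coordinates are an assumption used only to generate
    answers; the search routine below never sees a distance value.
    """
    def oracle(i: int, j: int, k: int) -> bool:
        return math.dist(points[i], points[j]) < math.dist(points[i], points[k])
    return oracle

def comparison_nn(query: int, candidates: List[int], oracle: Oracle) -> int:
    """Brute-force nearest neighbor using only triplet queries.

    Keeps a running best and asks one question per remaining candidate,
    so it costs len(candidates) - 1 oracle calls.
    """
    best = candidates[0]
    for c in candidates[1:]:
        if oracle(query, c, best):  # is c strictly closer to query than best?
            best = c
    return best

points = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.5), (-3.0, 2.0)]
oracle = make_triplet_oracle(points)
print(comparison_nn(0, [1, 2, 3], oracle))  # → 2
```

The scan needs $n-1$ queries for every search; a data structure built in advance is meant to reduce that per-query cost.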
Comparison-Based Random Forests
TLDR
A novel random forest algorithm for classification and regression that relies only on triplet comparisons; it is efficient for both tasks and even competitive with other methods that have direct access to the metric representation of the data.
Scalable and Efficient Comparison-based Search without Features
TLDR
A new Bayesian comparison-based search algorithm with noisy answers is proposed; it has low computational complexity yet is efficient in the number of queries, and it comes with theoretical guarantees: the form of the optimal query is derived, and almost sure convergence to the target object is proved.
K-Nearest Neighbor Approximation Via the Friend-of-a-Friend Principle
Suppose $V$ is an $n$-element set where for each $x \in V$, the elements of $V \setminus \{x\}$ are ranked by their similarity to $x$. The $K$-nearest neighbor graph $G:=(V, E)$ is a directed graph
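Under the usual definition (assumed here, since the sentence is cut off), the $K$-nearest neighbor graph has an arc from each $x$ to the $K$ elements that $x$ ranks as most similar. A minimal sketch of that exact graph follows; it reads every ranking in full and so does not illustrate the paper's approximation scheme. The function name and the dictionary input format are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

def knn_graph(rankings: Dict[Hashable, List[Hashable]], k: int) -> Set[Tuple[Hashable, Hashable]]:
    """Exact directed K-NN graph read off from per-element rankings.

    rankings[x] lists every element of V other than x, most similar first
    (an assumed input format). Arc (x, y) is present iff y is among the k
    elements ranked highest by x, so every vertex has out-degree k.
    """
    if not 0 <= k < len(rankings):
        raise ValueError("k must satisfy 0 <= k <= n - 1")
    return {(x, y) for x, ranked in rankings.items() for y in ranked[:k]}

rankings = {
    "a": ["b", "c", "d"],
    "b": ["a", "d", "c"],
    "c": ["d", "a", "b"],
    "d": ["c", "b", "a"],
}
print(sorted(knn_graph(rankings, 1)))
# → [('a', 'b'), ('b', 'a'), ('c', 'd'), ('d', 'c')]
```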
Machine learning in a setting of ordinal distance information
In a typical machine learning scenario we are given numerical dissimilarity values between objects (or feature representations of objects, from which such dissimilarity values can readily be
Lens Depth Function and k-Relative Neighborhood Graph: Versatile Tools for Ordinal Data Analysis
TLDR
This paper proposes algorithms for medoid estimation, outlier identification, classification, and clustering from ordinal data alone, based on estimating the lens depth function and the $k$-relative neighborhood graph of a data set.
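Lens depth can be estimated from triplet answers alone. Under the standard definition (assumed here; the paper's exact estimator may differ), a point $x$ lies in the lens of a pair $(x_i, x_j)$ when $d(x_i, x) \le d(x_i, x_j)$ and $d(x_j, x) \le d(x_i, x_j)$, and each of these conditions is a single triplet query. Taking the deepest point as the medoid estimate is one natural reading of "medoid estimation based on lens depth"; the function names and the 1-D toy data are illustrative.

```python
from itertools import combinations
from typing import Callable, List

# closer(a, b, c) answers the triplet query "is d(a, b) < d(a, c)?"
Closer = Callable[[int, int, int], bool]

def lens_depth(x: int, sample: List[int], closer: Closer) -> float:
    """Empirical lens depth of x: the fraction of pairs {i, j} (x excluded)
    whose lens contains x, i.e. d(i, x) <= d(i, j) and d(j, x) <= d(j, i).
    Each condition is one triplet query anchored at i or at j.
    """
    others = [s for s in sample if s != x]
    pairs = list(combinations(others, 2))
    if not pairs:
        return 0.0
    inside = sum(
        1 for i, j in pairs
        if not closer(i, j, x) and not closer(j, i, x)  # d(i,x) <= d(i,j) and d(j,x) <= d(j,i)
    )
    return inside / len(pairs)

def estimate_medoid(sample: List[int], closer: Closer) -> int:
    """Return the sample point of maximal lens depth (ties: first in sample)."""
    return max(sample, key=lambda x: lens_depth(x, sample, closer))

values = [0.0, 1.0, 2.0, 3.0, 10.0]  # hidden 1-D positions, used only to simulate answers

def closer(a: int, b: int, c: int) -> bool:
    return abs(values[a] - values[b]) < abs(values[a] - values[c])

print(estimate_medoid(list(range(5)), closer))  # → 2
```

The far-away point at 10.0 gets depth 0, which is also why low lens depth is a natural outlier score.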
Comparison Based Learning from Weak Oracles
TLDR
This paper introduces a new weak oracle model, where a non-malicious user responds to a pairwise comparison query only when she is quite sure about the answer, and proposes two algorithms which provably locate the target object in a number of comparisons close to the entropy of the target distribution.
Foundations of Comparison-Based Hierarchical Clustering
TLDR
This work addresses the classical problem of hierarchical clustering, but in a framework where one does not have access to a representation of the objects or their pairwise similarities, and develops variants of average, single, and complete linkage.
Near-Optimal Comparison Based Clustering
TLDR
This paper shows theoretically that the approach can exactly recover a planted clustering using a near-optimal number of passive comparisons, validates the theoretical findings empirically, and demonstrates the good behaviour of the method on real data.
Classification from Triplet Comparison Data
TLDR
This letter proposes an unbiased estimator for the classification risk under the empirical risk minimization framework, which inherently has the advantage that any surrogate loss function and any model, including neural networks, can be easily applied.
Relative distance comparisons with confidence judgements
TLDR
This work discusses a variant of the distance comparison query where annotators are allowed to explicitly state their degree of confidence for each triplet, and proposes algorithms both for learning the underlying pairwise distances and for computing an embedding of the items from such triplets.

References

Showing 1-10 of 34 references
Disorder inequality: a combinatorial approach to nearest neighbor search
TLDR
A special property of the similarity function on a set S, which leads to efficient combinatorial algorithms for S, is introduced; it is shown that the average disorder of the Reuters corpus is indeed quite small and that Ranwalk efficiently computes the nearest neighbor in most cases.
Randomized Partition Trees for Nearest Neighbor Search
The $k$-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization
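A basic randomized variant of the $k$-d tree can be sketched as follows: instead of splitting along a coordinate axis, each node projects its points onto a random direction and splits at the median. This is one simple variant chosen for illustration (Gaussian random directions, median split, and a "defeatist" query that visits a single leaf without backtracking); it is an assumption, not necessarily a variant analyzed in the paper, and all names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, ...]

@dataclass
class Node:
    points: Optional[List[Vec]] = None  # set only at leaves
    direction: Optional[Vec] = None     # random unit vector (internal nodes)
    threshold: float = 0.0              # median projection (internal nodes)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def _random_unit(dim: int, rng: random.Random) -> Vec:
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def build(points: List[Vec], leaf_size: int, rng: random.Random) -> Node:
    """Recursively split at the median of a projection onto a random direction."""
    if len(points) <= leaf_size:
        return Node(points=points)
    u = _random_unit(len(points[0]), rng)
    proj = sorted(_dot(u, p) for p in points)
    t = proj[len(proj) // 2]
    left = [p for p in points if _dot(u, p) < t]
    right = [p for p in points if _dot(u, p) >= t]
    if not left:  # all projections tied at the median: stop splitting
        return Node(points=points)
    return Node(direction=u, threshold=t,
                left=build(left, leaf_size, rng), right=build(right, leaf_size, rng))

def defeatist_query(root: Node, q: Vec) -> Vec:
    """Descend to a single leaf (no backtracking), then scan it: approximate NN."""
    node = root
    while node.points is None:
        node = node.left if _dot(node.direction, q) < node.threshold else node.right
    return min(node.points, key=lambda p: math.dist(p, q))

rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(200)]
root = build(data, leaf_size=8, rng=rng)
print(defeatist_query(root, data[17]) == data[17])  # → True
```

Visiting only one leaf keeps queries cheap, but it can miss the true nearest neighbor when that point falls on the other side of a split close to the query.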
Randomized Algorithms for Comparison-based Search
TLDR
A lower bound of $\Omega(D \log(n/D) + D^2)$ on the average number of questions in the search phase of any randomized algorithm is presented, which demonstrates the fundamental role of $D$ for worst-case behavior.
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design
TLDR
The technical contributions of the paper consist of handling "false positives" in data structures, an algorithmic technique called up-aside-down-filter, and the first known work-around for Navarro's impossibility result on generalizing Delaunay graphs.
Cover trees for nearest neighbor
TLDR
A tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points) that shows speedups over the brute force search varying between one and several orders of magnitude on natural machine learning datasets.
An Investigation of Practical Approximate Nearest Neighbor Algorithms
TLDR
This paper asks whether earlier spatial data structures for exact nearest neighbor search, such as metric trees, can be altered to provide approximate answers to proximity queries, and if so, how and why; it introduces a new kind of metric tree that allows overlap.
Finding nearest neighbors in growth-restricted metrics
TLDR
This paper develops an efficient dynamic data structure for nearest neighbor queries in growth-constrained metrics that satisfy the property that for any point q and number r the ratio between numbers of points in balls of radius 2r and r is bounded by a constant.
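On a finite sample, the growth condition can be probed by measuring $|B(q, 2r)| / |B(q, r)|$ at the data points for a chosen set of radii. This is a hypothetical diagnostic, not part of the paper: it assumes the Euclidean metric and closed balls, and because it checks only some centers and radii, it yields a lower bound on the true growth constant rather than the constant itself.

```python
import math
from typing import List, Sequence

def ball_size(points: List[Sequence[float]], q: Sequence[float], r: float) -> int:
    """Number of points within distance r of q (closed ball)."""
    return sum(1 for p in points if math.dist(p, q) <= r)

def empirical_growth_ratio(points: List[Sequence[float]], radii: List[float]) -> float:
    """Largest |B(q, 2r)| / |B(q, r)| over data centers q and the given radii r."""
    worst = 1.0
    for q in points:
        for r in radii:
            inner = ball_size(points, q, r)  # >= 1 because q lies in its own ball
            worst = max(worst, ball_size(points, q, 2 * r) / inner)
    return worst

grid = [(float(i),) for i in range(10)]
print(empirical_growth_ratio(grid, [1.0]))  # → 1.6666666666666667
```

On this 1-D grid an interior center has 3 points within radius 1 and 5 within radius 2, which gives the ratio 5/3.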
Rank-Based Similarity Search: Reducing the Dimensional Dependence
  • M. E. Houle, Michael Nett
  • Mathematics, Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2015
TLDR
A data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed.
Which Space Partitioning Tree to Use for Search?
TLDR
Theoretical results are presented which imply that trees with better vector quantization performance have better search performance guarantees, and it is demonstrated, both theoretically and empirically, that large margin partitions can improve tree search performance.
A Fast Nearest-Neighbor Algorithm Based on a Principal Axis Search Tree
  • J. Mcnames
  • Mathematics, Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2001
TLDR
A new fast nearest-neighbor algorithm is described that uses principal component analysis to build an efficient search tree; the search itself uses a depth-first traversal and a new elimination criterion.