Approximate Furthest Neighbor in High Dimensions

  • R. Pagh, Francesco Silvestri, Johan Sivertsen, Matthew Skala
Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries. We present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. We build on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk's…
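The projection idea described in the abstract can be sketched as follows. This is a hedged illustration of the general approach (random directions, per-direction extreme points as candidates, a final exact distance check), not the paper's exact data structure or query algorithm; the parameter `num_projections` and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_afn_index(points, num_projections=16):
    """Project all points onto random unit directions and store the
    projection values (a sketch of the random-projection idea)."""
    d = points.shape[1]
    dirs = rng.normal(size=(num_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = points @ dirs.T  # shape (n, num_projections)
    return points, dirs, proj

def afn_query(index, q):
    """For each direction, take the point whose projection is farthest
    from the query's projection, then return the candidate that is
    truly farthest from q in Euclidean distance."""
    points, dirs, proj = index
    qp = q @ dirs.T                               # query's projections
    cand = np.argmax(np.abs(proj - qp), axis=0)   # one candidate per direction
    return max(set(cand), key=lambda i: np.linalg.norm(points[i] - q))
```

Because a point far from the query has an extreme projection along at least some random directions, the small candidate set usually contains a good approximate furthest neighbor, and only the candidates require exact distance computations.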
Approximate furthest neighbor with application to annulus query
Fast approximate furthest neighbors with data-dependent hashing
This work presents a novel hashing strategy for approximate furthest neighbor search that selects projection bases using the data distribution, and presents a variant of the algorithm that gives an absolute approximation guarantee; to the authors' knowledge, this is the first approximate furthest neighbor hashing approach to give such a guarantee.
Fast Approximate Furthest Neighbors with Data-Dependent Candidate Selection
A novel strategy for approximate furthest neighbor search that selects a candidate set using the data distribution is presented, which leads to an algorithm that outperforms existing approximate furthest neighbor strategies and gives an absolute approximation guarantee.
Reverse Query-Aware Locality-Sensitive Hashing for High-Dimensional Furthest Neighbor Search
A novel concept of a Reverse Locality-Sensitive Hashing (RLSH) family is introduced, which is directly designed for c-AFN search, and two novel hashing schemes, RQALSH and RQALSH*, are proposed for high-dimensional c-AFN search over external memory.
Fast Distance Metrics in Low-dimensional Space for Neighbor Search Problems
This work proposes a three-step procedure for enhancing the accuracy of popular dimension reduction techniques that project data onto a low-dimensional subspace, and demonstrates significant enhancements in average accuracy for Euclidean distance and Mahalanobis distance, as well as improvements in evaluating $k$-nearest neighbors and $k$-furthest neighbors using the enhanced Euclidean distance formula.
In-Range Farthest Point Queries and Related Problem in High Dimensions
This paper develops a bi-criteria approximation scheme for the In-Range Farthest Point (IFP) query and shows that the IFP result can be applied to develop a query scheme with similar time and space complexities that achieves a (1 + ε)-approximation for the minimum enclosing ball (MEB).
On the Complexity of Inner Product Similarity Join
A systematic study of inner product similarity join, showing new lower and upper bounds for (A)LSH-based algorithms and showing that asymmetry can be avoided by relaxing the LSH definition to only consider the collision probability of distinct elements.
Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space
  • A. Hassanat
  • Computer Science
    ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
  • 2018
Four simple greedy algorithms for approximating the diameter of a multidimensional dataset are implemented, based on minimum/maximum L2 norms, hill-climbing search, Tabu search, and beam search.
XM-tree: data driven computational model by using metric extended nodes with non-overlapping in high-dimensional metric spaces
This paper proposes a new indexing technique called the XM-tree, which partitions the space using spheres, and proposes a parallel version of the structure on a set of real machines, to eliminate some objects without needing to compute their distances to a query object.


Locality-sensitive hashing scheme based on p-stable distributions
A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the lp norm, based on p-stable distributions, which improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p < 1.
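The hash family in this scheme is simple to state: project onto a random vector drawn from a p-stable distribution, shift, and quantize into buckets of width w. A minimal sketch for p = 2 (where the Gaussian is the 2-stable distribution) might look like this; the bucket width `w` and the function name are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_pstable_hash(dim, w=4.0):
    """One hash function from the p-stable LSH family for p = 2:
    h(x) = floor((a . x + b) / w), where a is drawn from a Gaussian
    (the 2-stable distribution) and b is uniform in [0, w).
    The bucket width w is a tunable parameter."""
    a = rng.normal(size=dim)
    b = rng.uniform(0.0, w)
    return lambda x: int(np.floor((np.dot(a, x) + b) / w))
```

Points that are close in Euclidean distance land in the same bucket with high probability, while distant points tend to separate; in practice several such functions are concatenated and repeated across multiple tables to amplify the gap.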
Increasing Diversity Through Furthest Neighbor-Based Recommendation
The experiments show that the proposed furthest neighbor method provides more diverse recommendations with a tolerable loss in precision in comparison to traditional nearest neighbor methods.
User-centric evaluation of a K-furthest neighbor collaborative filtering recommender algorithm
This paper presents an inverted neighborhood model, k-Furthest Neighbors, to identify less ordinary neighborhoods for the purpose of creating more diverse recommendations and shows that even though the proposed furthest neighbor model is outperformed in the traditional evaluation setting, the perceived usefulness of the algorithm shows no significant difference in the results of the user study.
Reductions among high dimensional proximity problems
Improved running times for a wide range of approximate high-dimensional proximity problems are presented via reductions to nearest neighbour queries, obtaining subquadratic running time for each.
Measuring the Dimensionality of General Metric Spaces
This paper introduces a new definition of intrinsic dimensionality that is simple and efficient to estimate, and which is shown analytically and experimentally to capture the essential features of metric spaces that determine the behavior of search algorithms.
Aspects of Metric Spaces in Computation
Three major kinds of questions about metric spaces are considered here: the intrinsic dimensionality of a distribution, the maximum number of distance permutations, and the difficulty of reverse similarity search.
On variants of the Johnson–Lindenstrauss lemma
  • J. Matousek
  • Mathematics, Computer Science
    Random Struct. Algorithms
  • 2008
A simple and self-contained proof of a version of the Johnson–Lindenstrauss lemma is given that subsumes a basic version by Indyk and Motwani and a version more suitable for efficient computations due to Achlioptas.
Approximate minimum enclosing balls in high dimensions using core-sets
This work develops (1 + ε)-approximation algorithms that perform well in practice, especially in very high dimensions, in addition to having provable guarantees, and proves the existence of core-sets of size O(1/ε), improving the previous bound of O(1/ε²).
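The core-set idea behind such (1 + ε)-approximations admits a very short iterative sketch in the style of Badoiu and Clarkson: start at an arbitrary point and repeatedly step toward the current farthest point. This is a hedged illustration of that general technique, not the paper's exact algorithm; the function name and iteration count are our own.

```python
import numpy as np

def approx_meb_center(points, eps=0.1):
    """Iteratively approximate the minimum enclosing ball center:
    start at an arbitrary point and, at step i, move 1/(i+1) of the
    way toward the current farthest point. After on the order of
    1/eps^2 steps, the farthest distance is within a (1 + eps)
    factor of the optimal radius."""
    c = points[0].astype(float).copy()
    iters = int(np.ceil(1.0 / eps**2))
    for i in range(1, iters + 1):
        far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (far - c) / (i + 1)
    return c
```

The points visited by the iteration form a small core-set: the minimum enclosing ball of just those points already approximates the ball of the full dataset, which is what makes the approach attractive in very high dimensions.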
Metric Spaces Library
We describe a library to support similarity searching in metric spaces. It contains various metric space and index implementations, as well as some tools to evaluate their performance for similarity searching.
Better algorithms for high-dimensional proximity problems via asymmetric embeddings
This paper gives the first known O(1)-approximate nearest neighbor algorithm with fast query time and almost-polynomial space for a product of Euclidean norms, a common generalization of both l2 and l∞ spaces.