Diverse near neighbor problem

Authors: Sofiane Abbar, Sihem Amer-Yahia, Piotr Indyk, Sepideh Mahabadi, Kasturi R. Varadarajan
Published in: SoCG '13

Motivated by the recent research on diversity-aware search, we investigate the k-diverse near neighbor reporting problem. The problem is defined as follows: given a query point q, report the maximum diversity set S of k points in the ball of radius r around q. The diversity of a set S is measured by the minimum distance between any pair of points in S (the higher, the better). We present two approximation algorithms for the case where the points live in a d-dimensional Hamming space. Our…
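
To make the problem definition above concrete, here is a minimal baseline sketch in Python. It is not one of the paper's two algorithms: it scans all points linearly to find the ball of radius r around q, then applies the standard greedy farthest-point heuristic, which in a metric space is known to return a set whose minimum pairwise distance is at least half the optimum. The function names (`hamming`, `diversity`, `greedy_diverse_near_neighbors`) and the representation of points as tuples of 0/1 values are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence

Point = Sequence[int]  # assumed representation: a 0/1 vector in Hamming space


def hamming(a: Point, b: Point) -> int:
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def diversity(points: List[Point]) -> int:
    """Minimum pairwise Hamming distance; needs at least two points."""
    return min(hamming(a, b) for a, b in combinations(points, 2))


def greedy_diverse_near_neighbors(
    data: List[Point], q: Point, r: int, k: int
) -> List[Point]:
    """Baseline sketch: linear scan for the ball, then greedy max-min selection."""
    # Keep only the points inside the ball of radius r around q.
    ball = [p for p in data if hamming(p, q) <= r]
    if len(ball) <= k:
        return ball

    chosen = [ball[0]]  # arbitrary starting point
    # min_dist[i] is the distance from ball[i] to its closest chosen point.
    min_dist = [hamming(p, chosen[0]) for p in ball]
    while len(chosen) < k:
        # Pick the point farthest from everything chosen so far.
        i = max(range(len(ball)), key=min_dist.__getitem__)
        chosen.append(ball[i])
        min_dist = [min(m, hamming(p, ball[i])) for m, p in zip(min_dist, ball)]
    return chosen
```

This baseline takes time linear in the number of points per query, so it only illustrates the objective being optimized. It does not reflect the sub-linear query times that the approximation algorithms for this problem aim for.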

Approximate nearest neighbor and its many variants

This thesis investigates two variants of the approximate nearest neighbor problem and presents two approximation algorithms for the case where the points live in a d-dimensional Hamming space; the algorithms guarantee query times that are sub-linear in n and only polynomial in the diversity parameter k.

Approximate Nearest Neighbor And Its Many Variants by Sepideh Mahabadi

This thesis investigates two variants of the approximate nearest neighbor problem: the k-diverse near neighbor reporting problem and the approximate line near neighbor (LNN) problem, and presents two approximation algorithms for the case where the points live in a d-dimensional Hamming space.

Greedy $k$-Center from Noisy Distance Samples

Active algorithms are proposed, based on ideas such as UCB and Thompson sampling developed in the closely related Multi-Armed Bandit problem, which adaptively decide which queries to send to the oracle and are able to solve the canonical $k$-center problem within an approximation ratio of two with high probability.

Distance-Sensitive Hashing

This paper begins the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them, and extends existing LSH lower bounds, showing that they also hold in the asymmetric setting.

Diverse nearest neighbors queries using linear skylines

Two approaches are proposed that leverage linear skyline queries to find the k diverse nearest neighbors within a radius r of a given query point, or (k, r)-DNNs for short.

In-Range Farthest Point Queries and Related Problem in High Dimensions

The IFP result can be applied to develop a query scheme with similar time and space complexities that achieves a (1 + ϵ)-approximation for MEB; these are the first theoretical results on such high-dimensional range-aggregate query problems.

Towards Spatially- and Category-Wise k-Diverse Nearest Neighbors Queries

Two approaches are proposed that leverage linear skyline queries to find spatially and category-wise diverse k-NNs with respect to a given query point, and that return all optimal solutions for any linear combination of the weights a user could give to the two competing criteria.

The Moving K Diversified Nearest Neighbor Query

This work proposes an algorithm to maintain incrementally the k diversified nearest neighbors of the query object to reduce the costs of continuous query processing and proposes two approximate algorithms to obtain even higher query efficiency with precision bounds.

Sublinear algorithms for massive data problems

This thesis presents algorithms and proves lower bounds for fundamental computational problems in the models that address massive data sets, and introduces theoretical problems and concepts that model computational issues arising in databases, computer vision and other areas.

Improved Approximation and Scalability for Fair Max-Min Diversification

Algorithms suitable for processing massive data sets are presented, including single-pass data stream algorithms and composable coresets for distributed processing.



Providing Diversity in K-Nearest Neighbor Query Results

This paper proposes a user-tunable definition of diversity, and presents an algorithm, called MOTLEY, for producing a diverse result set as per this definition, and shows that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database.

A near-linear algorithm for projective clustering integer points

The main result is a randomized algorithm that, for any ϵ > 0, runs in time O(mn polylog(mn)) and outputs a solution that with high probability is within a factor of (1 + ϵ) of the optimal solution.

Top-k bounded diversification

This paper introduces Space Partitioning and Probing (SPP), an algorithm that minimizes the number of accessed objects while finding exactly the same result as MMR, the most popular diversification algorithm.

Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality

Two algorithms are presented for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in $\mathbb{R}^d$, achieving query times that are sub-linear in n and polynomial in d.

Efficient diversity-aware search

This work proposes DIVGEN, an efficient algorithm for diversity-aware search that achieves significant performance improvements via novel data access primitives, and devises the first low-overhead data access prioritization scheme with theoretical quality guarantees and good performance in practice.

Search result diversity for informational queries

This paper presents a search-diversification algorithm particularly suitable for informational queries by explicitly modeling that the user may need more than one page to satisfy their need, and enables the algorithm to make a well-informed tradeoff between a user's desire for multiple relevant documents.

Facility Dispersion Problems: Heuristics and Special Cases (Extended Abstract)

It is proved that obtaining a performance guarantee of less than 2 is NP-hard, and polynomial time algorithms for obtaining optimal solutions under both MAX-MIN and MAX-AVG criteria are provided.

Learning Approximate Sequential Patterns for Classification

The pattern discovery approach identified approximately conserved sequences of morphology variations that were predictive of future death in a test population and improved the running time of the search algorithm by an order of magnitude without any noticeable effect on accuracy.

Geometric Approximation via Coresets

The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P and has been successfully applied to various optimization and extent measure problems.