Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

@article{Aumller2020FairNN,
  title={Fair Near Neighbor Search: Independent Range Sampling in High Dimensions},
  author={Martin Aum{\"u}ller and R. Pagh and Francesco Silvestri},
  journal={Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems},
  year={2020}
}
Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r>0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of… 
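The fair r-NN requirement can be pinned down with a brute-force reference sampler. This is a minimal sketch that assumes Euclidean distance and assumes the equal-opportunity notion stated for the related "Fair near neighbor search via sampling" entry: every point within distance r of q is returned with the same probability. It is a linear-scan baseline, not the LSH-based data structure developed in the paper, and the function name `fair_r_nn_bruteforce` is hypothetical.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def fair_r_nn_bruteforce(
    S: Sequence[Point], q: Point, r: float, rng: random.Random
) -> Optional[Point]:
    """Return a point of S within distance r of q, chosen uniformly at random.

    Hypothetical baseline: a linear scan costing O(n * d) per query.
    Returns None if the r-neighborhood of q is empty.
    """
    # Collect the whole r-neighborhood of q under Euclidean distance.
    neighborhood = [p for p in S if math.dist(p, q) <= r]
    if not neighborhood:
        return None
    # A uniform choice gives every neighbor the same probability 1/|neighborhood|.
    return rng.choice(neighborhood)


if __name__ == "__main__":
    rng = random.Random(0)
    S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    counts = {}
    for _ in range(9000):
        p = fair_r_nn_bruteforce(S, (0.0, 0.0), 1.0, rng)
        counts[p] = counts.get(p, 0) + 1
    # The three points at distance <= 1 should appear with similar frequencies.
    print(counts)
```

An LSH-based structure aims to match this output distribution while avoiding the full scan over S.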
Sampling a Near Neighbor in High Dimensions — Who is the Fairest of Them All?
TLDR
This work shows that LSH-based algorithms can be made fair without a significant loss in efficiency, and develops a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality-sensitive filters.
Fair near neighbor search via sampling
TLDR
This paper studies the r-NN problem in light of individual fairness and equal opportunity: all points within distance r of the query should have the same probability of being returned.
Sub-Linear Privacy-Preserving Near-Neighbor Search
TLDR
This paper provides the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time, can handle honest-but-curious parties, and provides an information-theoretic bound on the privacy guarantees.
Approximation Algorithms for Socially Fair Clustering
TLDR
This work introduces a strengthened LP relaxation, shows that it has an integrality gap of Θ(log ℓ / log log ℓ) for a fixed p, and presents a bicriteria approximation algorithm that generalizes the bicriteria approximation of Abbasi et al. (2021).
Algorithmic Techniques for Independent Query Sampling
TLDR
This work distills the existing solutions into several generic techniques that, when put together, can be used to solve a great variety of IQS problems with attractive performance guarantees.
Near Neighbor: Who is the Fairest of Them All?
TLDR
This work shows that LSH-based algorithms can be made fair without a significant loss in efficiency, and presents an algorithm that reports a point in the r-neighborhood of a query $q$ with almost uniform probability.
Querying in the Age of Graph Databases and Knowledge Graphs
TLDR
This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.
Improved Approximation Algorithms for Individually Fair Clustering
TLDR
This work extends the framework of Charikar et al. (2002) and Swamy (2016), devises a 16-approximation algorithm for facility location with ℓp-norm cost under a matroid constraint, which may be of independent interest, and proposes a reduction from individually fair clustering to a group fairness requirement proposed by Kleindessner et al. (2019).

References

Showing 1-10 of 42 references
Distance-Sensitive Hashing
TLDR
This paper begins the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them, and extends existing LSH lower bounds, showing that they also hold in the asymmetric setting.
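To make "collision probability as a given function of distance" concrete, here is a minimal sketch using random-hyperplane (sign) hashes on the unit circle, a choice made here for illustration rather than taken from the paper. For an angle θ between x and y, the symmetric pair h(x), h(y) collides with probability 1 - θ/π, which decreases with distance (the LSH behavior). The asymmetric pair h(x), g(y) with the hypothetical choice g(y) = h(-y) collides with probability θ/π, which increases with distance. The helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sign_hash(a: Sequence[float], x: Sequence[float]) -> int:
    """Random-hyperplane hash: which side of the hyperplane a.x = 0 the point lies on."""
    return 1 if sum(ai * xi for ai, xi in zip(a, x)) >= 0 else -1


def collision_rates(
    x: Sequence[float], y: Sequence[float], trials: int, rng: random.Random
) -> Tuple[float, float]:
    """Estimate collision rates of a symmetric and an asymmetric hash pair."""
    neg_y: List[float] = [-v for v in y]
    sym = asym = 0
    for _ in range(trials):
        # A fresh Gaussian direction per trial is a fresh hash function from the family.
        a = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        hx = sign_hash(a, x)
        # Symmetric (LSH): data and query use the same h; expected rate 1 - theta/pi.
        sym += hx == sign_hash(a, y)
        # Asymmetric: the query side uses g(y) = h(-y); expected rate theta/pi.
        asym += hx == sign_hash(a, neg_y)
    return sym / trials, asym / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for deg in (30, 90, 150):
        theta = math.radians(deg)
        x, y = [1.0, 0.0], [math.cos(theta), math.sin(theta)]
        sym, asym = collision_rates(x, y, 20000, rng)
        # Columns: angle, empirical symmetric rate, 1 - theta/pi, empirical asymmetric rate.
        print(deg, round(sym, 3), round(1 - theta / math.pi, 3), round(asym, 3))
```

Pairing different functions on the data and query sides is what "asymmetric" refers to; here the asymmetry alone flips the collision curve from decreasing to increasing in θ.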
Parameter-free Locality Sensitive Hashing for Spherical Range Reporting
TLDR
This work presents a parameter-free way of using multi-probing for LSH families that support it, and shows that for many such families this approach achieves expected query time close to $O(n^\rho+t)$, which is the best one can hope to achieve using LSH.
Sub-Linear Privacy-Preserving Near-Neighbor Search
TLDR
This paper provides the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time, can handle honest-but-curious parties, and provides an information-theoretic bound on the privacy guarantees.
Independent Range Sampling, Revisited Again
TLDR
This work revisits the range sampling problem and shows that it is possible to build efficient data structures for range sampling queries if the query time is allowed to hold in expectation, or obtain efficient worst-case query bounds by allowing the sampling probability to be approximately proportional to the weight.
Independent Range Sampling, Revisited
TLDR
This paper obtains an optimal data structure for the one-dimensional weighted range sampling problem, thereby extending the alias method to allow range queries, and obtains data structures with optimal space-query tradeoffs for 3D halfspace, 3D dominance, and 2D three-sided queries.
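For readers unfamiliar with the alias method mentioned above, here is a minimal sketch of the classic static version (Vose's variant): O(n) preprocessing, then each weighted sample costs O(1). It has no range-query support, so it shows only the building block being extended, not the paper's range structure. The names `build_alias` and `alias_sample` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_alias(weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build Vose's alias tables so that index i is drawn with probability w_i / sum(w)."""
    n = len(weights)
    total = float(sum(weights))
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    # Scale weights so that their average is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Column s keeps its own mass and is topped up to 1 by mass from l.
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Remaining columns hold (up to rounding) exactly mass 1.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob: Sequence[float], alias: Sequence[int], rng: random.Random) -> int:
    """Draw one index in O(1): pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    prob, alias = build_alias([1.0, 2.0, 3.0, 4.0])
    rng = random.Random(2)
    draws = [alias_sample(prob, alias, rng) for _ in range(10000)]
    print([draws.count(i) / len(draws) for i in range(4)])
```

Each of the n columns carries total mass 1/n split between at most two indices, which is why a single uniform column choice plus one biased coin suffices.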
Sub-Linear Privacy-Preserving Near-Neighbor Search with Untrusted Server on Large-Scale Datasets
TLDR
This paper provides the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time, can handle honest-but-curious parties, and provides an information-theoretic bound on the privacy guarantees.
A Framework for Similarity Search with Space-Time Tradeoffs using Locality-Sensitive Filtering
TLDR
This work presents a framework for similarity search based on Locality-Sensitive Filtering (LSF), and shows a lower bound for the space-time tradeoff on the unit sphere that matches Laarhoven's and the authors' own upper bound in the case of random data.
Independent range sampling
TLDR
This paper describes a new structure of O(n) space that answers a query in O(log n + t) expected time and supports an update in O(log n) time, which is nearly optimal, and shows that the multiplicative term log_{M/B}(n/B) is necessary.
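The O(log n + t) query bound above can be illustrated in its simplest setting. This is a minimal sketch of a static (no updates) one-dimensional independent range sampler over a sorted array: two binary searches locate the points in [x, y], then t indices are drawn independently and uniformly with replacement. It is not the paper's dynamic structure, and the class name `StaticRangeSampler` is hypothetical.

```python
import bisect
import random
from typing import List, Sequence


class StaticRangeSampler:
    """Hypothetical static 1D range sampler: O(n) space, O(log n + t) per query."""

    def __init__(self, values: Sequence[float]) -> None:
        # One-time O(n log n) sort; the structure is just the sorted array.
        self.keys = sorted(values)

    def sample(self, x: float, y: float, t: int, rng: random.Random) -> List[float]:
        """Return t independent uniform samples (with replacement) from keys in [x, y]."""
        # Two binary searches find the contiguous block keys[lo:hi] inside [x, y].
        lo = bisect.bisect_left(self.keys, x)
        hi = bisect.bisect_right(self.keys, y)
        if lo >= hi:
            return []
        # Each draw is O(1) and independent of all other draws and queries.
        return [self.keys[rng.randrange(lo, hi)] for _ in range(t)]


if __name__ == "__main__":
    rs = StaticRangeSampler([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.3])
    print(rs.sample(1.2, 5.0, 5, random.Random(3)))
```

Supporting insertions and deletions within O(log n) time is what requires the more elaborate structure described in the paper.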
Hashing-Based-Estimators for Kernel Density in High Dimensions
TLDR
This work introduces a class of unbiased estimators for kernel density implemented through locality-sensitive hashing, and gives general theorems bounding the variance of such estimators.
Approximate nearest neighbors: towards removing the curse of dimensionality
TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.
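As a concrete picture of LSH-based near neighbor search, here is a minimal sketch of the classic bit-sampling family for Hamming space: each of L tables keys a point by k randomly chosen coordinates, and a query scans its L buckets for a point within distance r. The class name and the parameters k and L are hypothetical choices, not the paper's exact construction.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

BitVec = Tuple[int, ...]


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of coordinates where p and q differ."""
    return sum(a != b for a, b in zip(p, q))


class BitSamplingLSH:
    """Hypothetical bit-sampling LSH index with L tables of k sampled coordinates each."""

    def __init__(self, d: int, k: int, L: int, rng: random.Random) -> None:
        # Each table hashes a point by k distinct coordinates chosen at random.
        self.coords: List[List[int]] = [rng.sample(range(d), k) for _ in range(L)]
        self.tables: List[Dict[BitVec, List[int]]] = [{} for _ in range(L)]
        self.points: List[BitVec] = []

    def _key(self, coords: Sequence[int], p: Sequence[int]) -> BitVec:
        return tuple(p[i] for i in coords)

    def insert(self, p: Sequence[int]) -> None:
        idx = len(self.points)
        self.points.append(tuple(p))
        for coords, table in zip(self.coords, self.tables):
            table.setdefault(self._key(coords, p), []).append(idx)

    def query(self, q: Sequence[int], r: int) -> Optional[BitVec]:
        """Return the first colliding point within distance r of q, or None."""
        for coords, table in zip(self.coords, self.tables):
            for idx in table.get(self._key(coords, q), []):
                p = self.points[idx]
                if hamming(p, q) <= r:
                    return p
        return None
```

A point at distance δ from q collides with it in one table with probability (1 - δ/d)^k (approximately, since coordinates are sampled without replacement), so close points are likely to be found in some table. Note that this query returns whichever neighbor it meets first, so neighbors are not returned with equal probability; that gap is what the fair variants address.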