# Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

@article{Aumller2020FairNN,
title={Fair Near Neighbor Search: Independent Range Sampling in High Dimensions},
author={Martin Aum{\"u}ller and R. Pagh and Francesco Silvestri},
journal={Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems},
year={2020}
}
• Published 5 June 2019
• Computer Science
• Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r>0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of…
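The r-NN problem stated in the abstract can be illustrated with a brute-force baseline. The sketch below is not the paper's data structure: it scans all of S on every query, assumes Euclidean distance, and uses hypothetical function names chosen for illustration. It contrasts returning an arbitrary near neighbor with returning one chosen uniformly at random, which is the fairness notion spelled out in the "Fair near neighbor search via sampling" summary in the citation list.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def near_neighbors(S: Sequence[Point], q: Point, r: float) -> List[Point]:
    """Return every point of S within distance r of q (linear scan).

    Assumption: Euclidean distance; the abstract only says "distance".
    """
    return [p for p in S if math.dist(p, q) <= r]


def any_near_neighbor(S: Sequence[Point], q: Point, r: float) -> Optional[Point]:
    """Classic r-NN answer: the first near neighbor found, or None.

    Which point is returned depends on the order of S, so some near
    neighbors may never be reported.
    """
    for p in S:
        if math.dist(p, q) <= r:
            return p
    return None


def uniform_near_neighbor(
    S: Sequence[Point], q: Point, r: float, rng: random.Random
) -> Optional[Point]:
    """Fair variant: every near neighbor is returned with equal probability.

    This brute-force version costs time linear in |S| per query; it only
    illustrates the output distribution, not an efficient data structure.
    """
    ball = near_neighbors(S, q, r)
    return rng.choice(ball) if ball else None


if __name__ == "__main__":
    S = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    q = (0.0, 0.0)
    rng = random.Random(0)
    print(any_near_neighbor(S, q, 1.5))
    print(uniform_near_neighbor(S, q, 1.5, rng))
```

With the arbitrary variant, repeated queries always report the same point; with the uniform variant, each point in the ball of radius r around q is reported equally often in expectation.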
9 Citations
Sampling a Near Neighbor in High Dimensions — Who is the Fairest of Them All?
• Computer Science
ACM Transactions on Database Systems
• 2022
This work shows that LSH based algorithms can be made fair, without a significant loss in efficiency, and develops a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters.
Fair near neighbor search via sampling
• Computer Science
SIGMOD Rec.
• 2021
This paper studies the r-NN problem in the light of individual fairness and equal opportunity: all points within distance r of the query should have the same probability of being returned.
Sub-Linear Privacy-Preserving Near-Neighbor Search
• Computer Science
IACR Cryptol. ePrint Arch.
• 2019
This paper provides the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time, can handle honest-but-curious parties, and provides an information-theoretic bound for the privacy guarantees.
Approximation Algorithms for Socially Fair Clustering
• Computer Science
COLT
• 2021
This work introduces a strengthened LP relaxation and shows that it has an integrality gap of $\Theta(\log \ell / \log\log \ell)$ for a fixed p, and presents a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).
Algorithmic Techniques for Independent Query Sampling
This work distills the existing solutions into several generic techniques that, when put together, can be utilized to solve a great variety of IQS problems with attractive performance guarantees.
Near Neighbor: Who is the Fairest of Them All?
• Computer Science
NeurIPS
• 2019
This work shows that LSH based algorithms can be made fair, without a significant loss in efficiency, and shows an algorithm that reports a point in the r-neighborhood of a query $q$ with almost uniform probability.
Querying in the Age of Graph Databases and Knowledge Graphs
• Computer Science
SIGMOD Conference
• 2021
This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.
Improved Approximation Algorithms for Individually Fair Clustering
• Computer Science, Mathematics
AISTATS
• 2022
This work extends the framework of Charikar et al. (2002) and Swamy (2016) and devises a 16-approximation algorithm for facility location with $\ell_p$-norm cost under a matroid constraint, which might be of independent interest, and proposes a reduction from individually fair clustering to a group fairness requirement proposed by Kleindessner et al. (2019).

## References

Distance-Sensitive Hashing
• Computer Science, Mathematics
PODS
• 2018
This paper begins the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them, and extends existing LSH lower bounds, showing that they also hold in the asymmetric setting.
Parameter-free Locality Sensitive Hashing for Spherical Range Reporting
• Computer Science
SODA
• 2017
This paper presents a parameter-free way of using multi-probing, for LSH families that support it, and shows that for many such families this approach achieves expected query time close to $O(n^\rho+t)$, which is the best one can hope to achieve using LSH.
Sub-Linear Privacy-Preserving Near-Neighbor Search
• Computer Science
IACR Cryptol. ePrint Arch.
• 2019
This paper provides the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time, can handle honest-but-curious parties, and provides an information-theoretic bound for the privacy guarantees.
Independent Range Sampling, Revisited Again
• Computer Science, Mathematics
SoCG
• 2019
This work revisits the range sampling problem and shows that it is possible to build efficient data structures for range sampling queries if the query time bound is allowed to hold in expectation, or to obtain efficient worst-case query bounds by allowing the sampling probability to be approximately proportional to the weight.
Independent Range Sampling, Revisited
• Computer Science, Mathematics
ESA
• 2017
This paper obtains an optimal data structure for the one-dimensional weighted range sampling problem, thereby extending the alias method to allow range queries, and obtains data structures with optimal space-query tradeoffs for 3D halfspace, 3D dominance, and 2D three-sided queries.
Sub-Linear Privacy-Preserving Near-Neighbor Search with Untrusted Server on Large-Scale Datasets
• Computer Science
• 2016
This paper provides the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time, can handle honest-but-curious parties, and provides an information-theoretic bound for the privacy guarantees.
A Framework for Similarity Search with Space-Time Tradeoffs using Locality-Sensitive Filtering
A framework for similarity search based on Locality-Sensitive Filtering (LSF) is presented, along with a lower bound for the space-time tradeoff on the unit sphere that matches Laarhoven's and the authors' own upper bound in the case of random data.
Independent range sampling
• Computer Science
PODS
• 2014
A new structure of $O(n)$ space is described that answers a query in $O(\log n + t)$ expected time and supports an update in $O(\log n)$ time, which is nearly optimal, and the multiplicative term $\log_{M/B}(n/B)$ is necessary.
Hashing-Based-Estimators for Kernel Density in High Dimensions
• Computer Science, Mathematics
2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
• 2017
This work introduces a class of unbiased estimators for kernel density implemented through locality-sensitive hashing, and gives general theorems bounding the variance of such estimators.
Approximate nearest neighbors: towards removing the curse of dimensionality
• Computer Science
STOC '98
• 1998
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.