New Algorithms for Subset Query, Partial Match, Orthogonal Range Searching, and Related Problems

Moses Charikar, Piotr Indyk, Rina Panigrahy
We consider the subset query problem, defined as follows: given a set 𝒫 of N subsets of a universe U, |U| = m, build a data structure which, for any query set Q ⊆ U, detects whether there is any P ∈ 𝒫 such that Q ⊆ P. This is essentially equivalent to the partial match problem and is a fundamental problem in many areas. In this paper we present the first (to our knowledge) algorithms that achieve non-trivial space and query time bounds for m = ω(log N). In particular, we present two algorithms with… 
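
To make the problem statement concrete, the following is a minimal brute-force baseline, not one of the paper's algorithms. It stores each set as an integer bitmask and scans all N sets per query, so it uses O(Nm) bits of space and O(N) mask comparisons per query. The names `to_mask` and `NaiveSubsetQuery` are hypothetical and chosen only for illustration, and the universe is assumed to be {0, ..., m-1}.

```python
from typing import Iterable, List


def to_mask(subset: Iterable[int]) -> int:
    """Encode a subset of the universe {0, ..., m-1} as an integer bitmask."""
    mask = 0
    for element in subset:
        mask |= 1 << element
    return mask


class NaiveSubsetQuery:
    """Brute-force baseline (illustrative only): linear scan over all stored sets."""

    def __init__(self, family: Iterable[Iterable[int]]) -> None:
        # One bitmask per stored set P in the family.
        self.masks: List[int] = [to_mask(p) for p in family]

    def query(self, q: Iterable[int]) -> bool:
        """Return True if some stored set P satisfies Q ⊆ P."""
        q_mask = to_mask(q)
        # Q ⊆ P exactly when every bit set in Q is also set in P.
        return any(q_mask & p_mask == q_mask for p_mask in self.masks)


if __name__ == "__main__":
    ds = NaiveSubsetQuery([{0, 1, 2}, {2, 3}])
    print(ds.query({0, 2}))  # {0, 2} is contained in {0, 1, 2}
    print(ds.query({1, 3}))  # no stored set contains both 1 and 3
```

Any data structure for the problem has to beat this trade-off in either space or query time to be considered non-trivial.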

Cell-probe lower bounds for the partial match problem

Given a database of n points in {0,1}^d, the partial match problem is: in response to a query x in {0,1,*}^d, find a database point y such that for every i, whenever x_i ≠ *, we have x_i = y_i. In this
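
The definition above can be sketched as a linear scan, together with the standard encoding that turns a partial match instance into a subset query instance over a universe of 2d elements. This is a sketch under stated assumptions, not code from any of the listed papers: points and queries are represented as strings over `0`, `1`, and `*`, and all function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def matches(query: str, point: str) -> bool:
    """True if the point agrees with the query on every non-wildcard coordinate."""
    return len(query) == len(point) and all(
        x == "*" or x == y for x, y in zip(query, point)
    )


def partial_match(database: List[str], query: str) -> Optional[str]:
    """Brute-force partial match: return the first matching point, or None."""
    for point in database:
        if matches(query, point):
            return point
    return None


def point_as_set(point: str) -> Set[Tuple[int, str]]:
    """Encode a point y as the set {(i, y_i)} over the universe [d] x {0, 1}."""
    return {(i, bit) for i, bit in enumerate(point)}


def query_as_set(query: str) -> Set[Tuple[int, str]]:
    """Encode a query x as the set {(i, x_i) : x_i != '*'}."""
    return {(i, sym) for i, sym in enumerate(query) if sym != "*"}


if __name__ == "__main__":
    db = ["0101", "1100", "0011"]
    print(partial_match(db, "1*0*"))
    # The query matches a point exactly when its encoded set is a subset
    # of the point's encoded set.
    print(query_as_set("1*0*") <= point_as_set("1100"))
```

Under this encoding, x matches y if and only if `query_as_set(x)` is a subset of `point_as_set(y)`, which is one direction of the equivalence between the two problems.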

2 Searching Heterogeneous Data

The basic idea is to precompute certain queries and store their results; user queries are then answered by retrieving the “closest” stored query and removing all false positives from its answers.
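
One possible reading of this idea is sketched below. The summary does not say how "closest" is defined, so the sketch assumes conjunctive keyword queries and takes the closest stored query to be the largest stored query contained in the user query; its cached answers are then a superset of the true answers and only need filtering. The class name `PrecomputedQueryIndex` and all of these choices are assumptions made for illustration, not the authors' method.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


class PrecomputedQueryIndex:
    """Illustrative sketch: answer conjunctive queries from cached results."""

    def __init__(self, records: List[Set[str]], stored_queries: List[Set[str]]) -> None:
        self.records: List[FrozenSet[str]] = [frozenset(r) for r in records]
        # Precompute, for each stored query, the indices of records containing all its terms.
        self.cache: Dict[FrozenSet[str], List[int]] = {}
        for s in stored_queries:
            key = frozenset(s)
            self.cache[key] = [i for i, r in enumerate(self.records) if key <= r]

    def query(self, terms: Iterable[str]) -> List[int]:
        """Return indices of records that contain every term of the query."""
        q = frozenset(terms)
        # Assumed notion of "closest": the largest stored query that is a subset of q.
        usable = [s for s in self.cache if s <= q]
        if usable:
            candidates: Iterable[int] = self.cache[max(usable, key=len)]
        else:
            # No stored query applies, so fall back to scanning every record.
            candidates = range(len(self.records))
        # Remove false positives: keep only candidates that satisfy the full query.
        return [i for i in candidates if q <= self.records[i]]


if __name__ == "__main__":
    index = PrecomputedQueryIndex(
        records=[{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c", "d"}],
        stored_queries=[{"a"}, {"a", "b"}],
    )
    print(index.query({"a", "c"}))
```

The filtering step touches only the cached candidate list, so the benefit depends on how selective the chosen stored query is.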


  • R. Tamassia, B. Cantrill
  • Computer Science
    2008 49th Annual IEEE Symposium on Foundations of Computer Science
  • 2008
We show that a large fraction of the data-structure lower bounds known today in fact follow by reduction from the communication complexity of lopsided (asymmetric) set disjointness! This includes

Lower bound techniques for data structures

We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences: (1) the first Ω(lg n) lower bound for any dynamic problem, improving on a bound

Subsets and Supermajorities: Optimal Hashing-based Set Similarity Search

A new generalized Set Similarity Search problem, which assumes the size of the database and query sets are known in advance, is formulated and optimally solved, and the lower bounds follow from new hypercontractive arguments.

Unifying the Landscape of Cell-Probe Lower Bounds

  • M. Patrascu
  • Computer Science, Mathematics
    SIAM J. Comput.
  • 2011
We show that a large fraction of the data-structure lower bounds known today in fact follow by reduction from the communication complexity of lopsided (asymmetric) set disjointness. This includes

Treedy: A Heuristic for Counting and Sampling Subsets

This work presents a tree-based greedy heuristic, Treedy, that for a given positive tolerance d answers such counting and sampling queries to within a guaranteed relative error d and total variation distance d, respectively.

Completeness for First-order Properties on Sparse Structures with Algorithmic Applications

This work shows completeness of the Sparse Orthogonal Vectors problem for the class of first-order properties under fine-grained reductions, the first such completeness result for a standard complexity class.


This chapter considers the following problem: given a set P of points in a high-dimensional space, construct a data structure that given any query point q finds the point in P closest to q, which is of significant importance to several areas of computer science.

A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match

This work investigates a geometric approach to proving cell probe lower bounds for data structure problems, and shows that any (randomized) data structure for the problem that answers c-approximate nearest neighbor search queries using t probes must use space at least $n^{1+\Omega(1/ct)}$.



Approximate range searching

It is shown that if one is willing to allow approximate ranges, then it is possible to do much better than current state-of-the-art results, and empirical evidence is given showing that allowing small relative errors can significantly improve query execution times.

A data structure for orthogonal range queries

  • G. S. Lueker
  • Computer Science
    19th Annual Symposium on Foundations of Computer Science (sfcs 1978)
  • 1978
It is shown that a decision tree of height O(dn log n) can be constructed to process n operations in d dimensions, suggesting that the standard decision tree model will not provide a useful method for investigating the complexity of orthogonal range queries.

Partial-Match Retrieval Algorithms

A new class of combinatorial designs (called associative block designs) provides better hash functions with a greatly reduced worst-case number of lists examined, yet with optimal average behavior maintained.

Efficient search for approximate nearest neighbor in high dimensional spaces

Significantly improving and extending recent results of Kleinberg, data structures whose size is polynomial in the size of the database and search algorithms that run in time nearly linear or nearly quadratic in the dimension are constructed.

High-dimensional computational geometry

  • P. Indyk
  • Computer Science, Mathematics
  • 2000
This thesis shows that it is in fact possible to obtain efficient algorithms for the nearest neighbor problem and a wide range of metrics, including Euclidean, Manhattan or maximum norms and Hausdorff metrics; some of the results hold even for general metrics.

Approximate nearest neighbors: towards removing the curse of dimensionality

Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.

Lower bounds for high dimensional nearest neighbor search and related problems

This work investigates the exact nearest neighbors search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen to study data structure problems and derives non-trivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search.

Packet classification using tuple space search

The Pruned Tuple Space search is the only scheme known to us that allows fast updates and fast search times, and an optimal algorithm is described, called Rectangle Search, for two-dimensional filters.

Space/time trade-offs in hash coding with allowable errors

Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
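
The trade-off described here is the idea behind what is now commonly called a Bloom filter. The following is a minimal sketch of that structure, not a reconstruction of the paper's exact methods: the class name, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 digests as hash functions are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        """Set every bit position associated with the item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means present or a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for word in ["subset", "query", "partial", "match"]:
        bf.add(word)
    print(bf.might_contain("query"))  # always True for an inserted item
```

Making `num_bits` smaller saves space at the cost of a higher false-positive rate, while inserted items are never reported as absent.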

Advances in Discrete and Computational Geometry

Geometric range searching and its relatives by P. K. Agarwal and J. Erickson; Deformed products and maximal shadows of polytopes by N. Amenta and G. M. Ziegler; Flag complexes, labelled rooted trees,