Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems

@article{Ngo2018WorstCaseOJ,
  title={Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems},
  author={Hung Quoc Ngo},
  journal={Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems},
  year={2018}
}
  • H. Ngo
  • Published 27 March 2018
  • Computer Science
  • Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Worst-case optimal join algorithms are the class of join algorithms whose runtime matches the worst-case output size of a given join query. While the first provably worst-case optimal join algorithm was discovered relatively recently, the techniques and results surrounding these algorithms grow out of decades of research from a wide range of areas, intimately connecting graph theory, algorithms, information theory, constraint satisfaction, database theory, and geometric inequalities. These ideas…
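To make "runtime matches the worst-case output size" concrete, here is a minimal Python sketch for the triangle query Q(a, b, c) = R(a, b) ⋈ S(b, c) ⋈ T(a, c). Its worst-case output size is the AGM bound sqrt(|R| |S| |T|). The sketch binds one variable at a time and intersects the candidate values offered by every relation that contains that variable, scanning the smaller side of each intersection. For the triangle query this strategy is known to stay within O(N^{3/2}) expected time when every relation has at most N tuples, which matches the bound. The function names (`intersect`, `triangle_join`, `agm_triangle_bound`) and the toy edge set are hypothetical and chosen for illustration. This is not the paper's own presentation.

```python
from collections import defaultdict
from math import sqrt

def intersect(xs, ys):
    """Intersect two hashable collections by scanning the smaller one.

    Each membership probe on the larger side is O(1) expected, so the
    expected cost is O(min(|xs|, |ys|)).
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs
    return [x for x in xs if x in ys]

def triangle_join(R, S, T):
    """Variable-at-a-time evaluation of Q(a, b, c) = R(a, b), S(b, c), T(a, c)."""
    # Two-level hash indexes: first attribute -> set of second attributes.
    R_idx, S_idx, T_idx = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R:
        R_idx[a].add(b)
    for b, c in S:
        S_idx[b].add(c)
    for a, c in T:
        T_idx[a].add(c)

    out = []
    # Bind a: it must be a first attribute in both R and T.
    for a in intersect(R_idx, T_idx):
        # Bind b: it must extend a in R and be a first attribute in S.
        for b in intersect(R_idx[a], S_idx):
            # Bind c: it must extend b in S and extend a in T.
            for c in intersect(S_idx[b], T_idx[a]):
                out.append((a, b, c))
    return out

def agm_triangle_bound(R, S, T):
    """AGM bound sqrt(|R| |S| |T|), from the fractional edge cover (1/2, 1/2, 1/2)."""
    return sqrt(len(R) * len(S) * len(T))

# Hypothetical toy input: all edges i -> j with 1 <= i < j <= 4.
E = {(i, j) for i in range(1, 5) for j in range(i + 1, 5)}
tris = triangle_join(E, E, E)
print(sorted(tris), len(tris) <= agm_triangle_bound(E, E, E))
# prints [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)] True
```

A pairwise plan such as (R ⋈ S) ⋈ T can build an intermediate result of size Θ(N^2) on skewed inputs. The variable-at-a-time loop never materializes such a pairwise intermediate.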


Index-Structures for Worst-Case Optimal Join Algorithms
TLDR
This work develops two new variants of Leapfrog Triejoin, introduces and evaluates two index structures, and discusses the strengths and limitations of the join algorithms and their index structures.
Optimal Join Algorithms Meet Top-k
TLDR
It is argued that the two areas of optimal join algorithms and ranked enumeration can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries.
Domain Ordering and Box Cover Problems for Beyond Worst-Case Join Processing
TLDR
This thesis defines several optimization problems over the space of domain orderings, where the objective is to minimize the size of either the minimum box certificate or the minimum box cover for the given input query, and provides approximation algorithms for several of these problems.
A Worst-Case Optimal Join Algorithm for SPARQL
TLDR
This paper proposes a novel procedure for evaluating SPARQL queries based on Leapfrog Triejoin, an existing worst-case optimal join algorithm. It implements an adaptation of this algorithm and shows that, with the new join algorithm, Apache Jena often runs orders of magnitude faster than its base version and than two other SPARQL engines, Virtuoso and Blazegraph.
Optimal Joins using Compact Data Structures
TLDR
It is shown that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need for extra storage, and a compositional algorithm is developed to process full join queries under this representation.
Worst-Case Optimal Graph Joins in Almost No Space
TLDR
This work presents an indexing scheme that supports worst-case optimal graph joins in almost no space beyond that needed to store the graph itself. Compared with several state-of-the-art approaches, it offers the best overall query times while using only a small fraction of their space.
RapidMatch: A Holistic Approach to Subgraph Query Processing
TLDR
This paper proves that the complexity of result enumeration in state-of-the-art exploration-based methods matches that of the worst-case optimal join and proposes RapidMatch, a holistic subgraph query processing framework integrating the two approaches.
Optimal Joins using Compressed Quadtrees
TLDR
It is shown that worst-case optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need of any significant extra storage, and a compositional algorithm to process full join queries is developed.
Degree Sequence Bound For Join Cardinality Estimation
TLDR
This work proves a novel bound, called the Degree Sequence Bound, that takes into account the full degree sequences and the maximum tuple multiplicity on Berge-acyclic queries, and describes how to compute this bound in practice using a functional approximation of the true degree sequences.
Counting Triangles under Updates in Worst-Case Optimal Time
TLDR
An approach is introduced that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size.
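Two of the citing works above build on Leapfrog Triejoin. For a single join variable, its core step intersects the sorted key lists of the participating relations. It repeatedly seeks the iterator with the smallest key forward to the largest key seen so far, and it emits a key once all iterators agree on it. The Python sketch below covers only that one-variable step. The full algorithm nests it over one trie per relation and descends one level per join variable. Sorted Python lists searched with `bisect_left` stand in for the index structures. That choice is an assumption of this sketch and is not the data structure evaluated in those papers. The function name `leapfrog_intersect` is hypothetical.

```python
from bisect import bisect_left

def leapfrog_intersect(lists):
    """Intersect k sorted, duplicate-free lists in the leapfrog style.

    Each list plays the role of a sorted iterator whose seek(x) jumps to the
    first element >= x; here seek is a binary search with bisect_left.
    """
    if not lists or any(len(xs) == 0 for xs in lists):
        return []
    k = len(lists)
    pos = [0] * k
    # Visit iterators in order of their first key; the last one holds the max.
    order = sorted(range(k), key=lambda i: lists[i][0])
    max_key = lists[order[-1]][0]
    p = 0
    result = []
    while True:
        i = order[p]
        key = lists[i][pos[i]]
        if key == max_key:
            # Every iterator now sits on max_key: emit it and step past it.
            result.append(key)
            pos[i] += 1
        else:
            # Leapfrog: seek the lagging iterator to the largest key seen so far.
            pos[i] = bisect_left(lists[i], max_key, pos[i])
        if pos[i] == len(lists[i]):
            return result  # one iterator is exhausted, so no more matches
        max_key = lists[i][pos[i]]
        p = (p + 1) % k

print(leapfrog_intersect([[0, 1, 3, 4, 5, 6, 7, 8, 9, 11],
                          [0, 2, 6, 7, 8, 9],
                          [2, 4, 5, 8, 10]]))  # prints [8]
```

Because each seek skips over whole runs of non-matching keys, the work grows with the number of seeks rather than with the total length of the lists. This property is the one the worst-case optimality analysis of Leapfrog Triejoin relies on.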

References

First 10 of 78 references:
Skew strikes back: new developments in the theory of join algorithms
TLDR
A survey of recent work on join algorithms with provable worst-case optimal runtime guarantees, providing a simpler and unified description of these algorithms that is useful for theory-minded readers, algorithm designers, and systems implementors.
Towards a Worst-Case I/O-Optimal Algorithm for Acyclic Joins
TLDR
This paper proves that the "triangle query" algorithm is I/O-optimal for certain classes of acyclic joins without deriving its bound explicitly.
Join Processing for Graph Patterns: An Old Dog with New Tricks
TLDR
It is found that classical relational databases like Postgres and MonetDB, or newer graph databases/stores like Virtuoso and Neo4j, may be orders of magnitude slower than a fully featured RDBMS, LogicBlox, that uses these new ideas.
Worst-Case Optimal Algorithms for Parallel Query Processing
TLDR
This paper studies the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers, and shows a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms.
Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm
TLDR
It is established that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR.
Size Bounds and Query Plans for Relational Joins
TLDR
This work studies relational joins from a theoretical perspective and shows that there exist queries for which the join-project plan suggested by the fractional edge cover approach may be substantially better than any join plan that does not use intermediate projections.
A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries
TLDR
A multi-round algorithm is described that computes any query with load $m/p^{1/\rho^*}$ per server, in the case when all input relations are binary, which is proved to be the optimal load for all queries over binary input relations.
Distributed Evaluation of Subgraph Queries Using Worst-case Optimal and Low-Memory Dataflows
TLDR
This work presents the first approach that performs worst-case optimal computation and communication, maintains a total memory footprint linear in the number of input edges, and scales down per-worker computation, communication, and memory requirements linearly as the number of workers increases, even on adversarially skewed inputs.
Joins via Geometric Resolutions: Worst-case and Beyond
TLDR
An algorithm is designed that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. The same framework and algorithm are then used to show a series of what are colloquially known as beyond worst-case results.
Beyond worst-case analysis for joins with minesweeper
TLDR
A new algorithm, Minesweeper, is described that satisfies stronger runtime guarantees than previous join algorithms (colloquially, "beyond worst-case" guarantees) for data in indexed search trees, and a dichotomy theorem is developed for the certificate-based notion of complexity.
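The fractional edge cover approach from "Size Bounds and Query Plans for Relational Joins" bounds the output of a full join by the product of |R_e|^{x_e} over the relations. Here x is any assignment of non-negative weights in which every variable is covered by total weight at least 1. The minimum total weight of such a cover is the ρ* that appears in the parallel load $m/p^{1/\rho^*}$ above. For the triangle query ρ* = 3/2, attained by giving every relation weight 1/2. The Python sketch below checks a proposed cover and evaluates the bound. The query encoding (a dict from relation name to attribute set) and the function names are hypothetical choices for illustration.

```python
from math import prod

def is_fractional_edge_cover(query, weights):
    """True if all weights are non-negative and every variable is covered
    by total weight >= 1 (up to a small floating-point tolerance)."""
    if any(w < 0 for w in weights.values()):
        return False
    variables = set().union(*query.values())
    return all(
        sum(weights[rel] for rel, attrs in query.items() if v in attrs) >= 1 - 1e-9
        for v in variables
    )

def agm_bound(query, sizes, weights):
    """Output-size bound prod_e |R_e|^{x_e} for a full join, given a
    fractional edge cover x of the query's variables."""
    if not is_fractional_edge_cover(query, weights):
        raise ValueError("weights do not form a fractional edge cover")
    return prod(sizes[rel] ** weights[rel] for rel in query)

# Triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c); hypothetical sizes.
triangle = {"R": {"a", "b"}, "S": {"b", "c"}, "T": {"a", "c"}}
sizes = {"R": 100, "S": 100, "T": 100}

half = agm_bound(triangle, sizes, {"R": 0.5, "S": 0.5, "T": 0.5})
integral = agm_bound(triangle, sizes, {"R": 1, "S": 1, "T": 0})
print(half, integral)
```

The fractional cover gives a bound of about 1000 tuples, while the best integral cover (any two of the three relations) only gives 10000. The exponent 3/2 versus 2 is the gap between worst-case optimal joins and plans that join two relations at a time.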