Threshold Queries in Theory and in the Wild

Authors: Angela Bonifati, Stefania Dumbrava, G. Fletcher, Jan Hidders, Matthias F. J. Hofer, Wim Martens, Filip Murlak, Joshua Shinavier, Slawomir Staworko, Dominik Tomaszuk
Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. In this paper, we present a deep theoretical analysis of threshold query evaluation and show that thresholds can be used to significantly improve the asymptotic bounds of state-of-the-art query… 
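
To make the idea of "counting answers up to a specified threshold value" concrete, here is a minimal Python sketch. It is not the evaluation algorithm from the paper: it only illustrates that a threshold lets evaluation stop early instead of materializing every answer. The relation layout and the helper names `join_pairs`, `count_up_to`, and `at_least` are assumptions made for this illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]


def join_pairs(r: Iterable[Pair], s: Iterable[Pair]) -> Iterator[Tuple[int, int, int]]:
    """Lazily yield (a, b, c) for every R(a, b) and S(b, c) (hypothetical helper)."""
    # Hash index on the join attribute of S.
    index: Dict[int, List[int]] = {}
    for b, c in s:
        index.setdefault(b, []).append(c)
    # Answers are produced one at a time, so a consumer can stop early.
    for a, b in r:
        for c in index.get(b, ()):
            yield (a, b, c)


def count_up_to(answers: Iterable[object], k: int) -> int:
    """Count answers, but never consume more than k of them."""
    return sum(1 for _ in islice(answers, k))


def at_least(answers: Iterable[object], k: int) -> bool:
    """Threshold test: are there at least k answers?"""
    return count_up_to(answers, k) >= k


# The full join below has 1,000,000 answers, but the threshold test
# inspects only the first 10 of them before stopping.
R = [(i, 0) for i in range(1000)]
S = [(0, j) for j in range(1000)]
print(at_least(join_pairs(R, S), 10))
```

Lazy generation is used here only because it is the simplest way to show early termination. The asymptotic improvements claimed in the abstract come from the paper's own analysis, which this sketch does not reproduce.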


Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries

This work develops “any-k” algorithms which, without knowing the number of desired answers, push the ranking into joins and avoid materializing the join output earlier than necessary, and unifies into the same framework several solutions from different areas that had been studied in isolation.

LSQ 2.0: A linked dataset of SPARQL query logs

The LSQ dataset is presented, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints, and the model and vocabulary that it uses to represent these queries in RDF is discussed.

Representing Paths in Graph Database Pattern Matching

It is shown that, from a computational complexity point of view, PMRs seem especially well-suited for representing results of regular path queries and extensions thereof involving counting, random sampling, unions, and joins.

The Complexity of Regular Trail and Simple Path Queries on Undirected Graphs

Using techniques from structural graph theory, ranging from the graph minor theorem to group-labeled graphs, it is established that trail evaluation for simple chain regular expressions is tractable, whereas simple path evaluation is tractable for a large subclass.

Consistent Subgraph Matching over Large Graphs

It is shown that the satisfiability, implication, and validation problems of CGDs are coNP-complete, coNP-complete, and NP-complete, respectively, and that the CSM problem (under any kind of repair) is NP-complete.

Towards Theory for Real-World Data

This tutorial aims to provide an overview of several practical studies that have been conducted in the areas of tree-structured and graph-structured data, with a focus on cases with strong interaction between analysis of the data and fundamental research.



Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

A framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation is created, and classic algorithms that find the k-shortest paths in a weighted graph are extended.

When is approximate counting for conjunctive queries tractable?

The first FPRAS and polynomial time sampler for the set of trees of size n accepted by a tree automaton is demonstrated, which improves the prior quasi-polynomial time randomized approximation scheme (QPRAS) and sampling algorithm of Gore, Jerrum, Kannan, Sweedyk, and Mahaney ’97.

Structural Tractability of Counting of Solutions to Conjunctive Queries

A parameter called the quantified star size of a query ϕ, which measures how the free variables are spread in ϕ, is introduced, and it is shown that for classes of queries whose associated hypergraphs admit good decompositions, bounded quantified star size exactly characterizes the subclasses of hypergraphs for which counting the number of solutions is tractable.

Covers of Query Results

This work introduces succinct lossless representations of query results called covers, subsets of the query results that correspond to minimal edge covers in the hypergraphs of these results that express a host of computational problems such as aggregate-join queries, in-database optimization, matrix chain multiplication, and inference in probabilistic graphical models.

Tractable Counting of the Answers to Conjunctive Queries

When is the evaluation of conjunctive queries tractable?

It is shown that, in some sense, the evaluation of all conjunctive queries whose underlying graph is in C is tractable if, and only if, C has bounded tree-width.

Size Bounds and Query Plans for Relational Joins

This work studies relational joins from a theoretical perspective and shows that there exist queries for which the join-project plan suggested by the fractional edge cover approach may be substantially better than any join plan that does not use intermediate projections.

Efficient top-k aggregation of ranked inputs

A new algorithm is proposed, designed to minimize the number of object accesses, the computational cost, and the memory requirements of top-k search with monotone aggregate functions, and is shown to be orders of magnitude faster.

A survey of top-k query processing techniques in relational database systems

This survey describes and classifies top-k processing techniques in relational databases, including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions, and shows the implications of each dimension on the design of the underlying techniques.

Including Group-By in Query Optimization

It is shown that the extent of improvement in the quality of plans is significant with only a modest increase in optimization cost, and the technique also applies to optimization of Select Distinct queries by pushing down duplicate elimination in a cost-based fashion.