Parallel-Correctness and Transferability for Conjunctive Queries

@article{Ameloot2015ParallelCorrectnessAT,
  title={Parallel-Correctness and Transferability for Conjunctive Queries},
  author={Tom J. Ameloot and Gaetano Geck and Bas Ketsman and Frank Neven and Thomas Schwentick},
  journal={Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems},
  year={2015}
}
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy… 
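As a rough illustration of the idea in the abstract, the sketch below checks on one small instance whether evaluating a join locally at every server and taking the union of the local results gives the same answer as evaluating it centrally. The query, the two policies, and all function names are hypothetical and chosen only for this example. It tests a single instance, so it can refute parallel-correctness for a policy but cannot establish it in general.

```python
from itertools import product

def join(facts):
    """Evaluate H(x, z) :- R(x, y), S(y, z) over a set of facts."""
    r = [f for f in facts if f[0] == "R"]
    s = [f for f in facts if f[0] == "S"]
    return {(x, z) for (_, x, y), (_, y2, z) in product(r, s) if y == y2}

def distribute(facts, policy, nodes):
    """Reshuffle: each node receives the facts the policy assigns to it."""
    return {n: {f for f in facts if n in policy(f)} for n in nodes}

def parallel_correct_on(facts, policy, nodes, query=join):
    """Compare the union of communication-free local results with the central result."""
    local = distribute(facts, policy, nodes)
    distributed = set().union(*(query(local[n]) for n in nodes))
    return distributed == query(facts)

NODES = range(2)

def hash_on_join_key(fact):
    """Hypothetical policy: send R(x, y) and S(y, z) to node y mod 2."""
    rel, a, b = fact
    y = b if rel == "R" else a
    return {y % 2}

def hash_on_first_column(fact):
    """Hypothetical policy: send every fact to node (first column) mod 2."""
    return {fact[1] % 2}

facts = {("R", 1, 2), ("S", 2, 3)}
print(parallel_correct_on(facts, hash_on_join_key, NODES))
print(parallel_correct_on(facts, hash_on_first_column, NODES))
```

Under the first policy the two joining facts meet at the same node, so the local results cover the central answer. Under the second policy they land on different nodes and the output tuple is lost.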
A datalog-based computational model for coordination-free, data-parallel systems
TLDR
The case is made that the current form of CALM does not hold in general for data-parallel systems, and it is shown how, using novel techniques, the CALM principle can still be satisfied, although only for the subclass of programs called connected monotonic queries.
Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)
TLDR
This report identifies some of the most important research directions where the PDM community has the potential to make significant contributions from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.
Worst-Case Optimal Join at a Time
TLDR
The technical contribution is an effective procedure that achieves optimality with multiway join-at-a-time query plans by employing succinct representations of the intermediate results and a new join operator called Joen that can work on such representations.
A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries
TLDR
A multi-round algorithm is described that computes any query with load m/p^(1/rho*) per server, in the case when all input relations are binary, which is proved to be the optimal load for all queries over binary input relations.
Efficient and private approximations of distributed databases calculations
TLDR
This paper provides a sampling method targeted at separate, non-collaborating, vertically partitioned datasets and provides an analysis of the bound on error as a function of the sample size.
Parallel-Correctness and Transferability for Conjunctive Queries
TLDR
This work introduces a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy, and investigates the complexity of transferability for certain families of distribution policies, including the Hypercube distribution policies.
Data partitioning for single-round multi-join evaluation in massively parallel systems
TLDR
A correctness condition, called parallel-correctness, is introduced for the evaluation of queries w.r.t. a distribution policy, a semantical characterization for when conjunctive queries (and extensions thereof) are parallel-correct is provided, and matching complexity bounds for the associated decision problem are given.

References

Weaker Forms of Monotonicity for Declarative Networking
TLDR
Equating increasingly larger classes of coordination-free computations with increasingly weaker forms of monotonicity, and presenting explicit Datalog variants that capture each of these classes, can be interpreted as a more fine-grained answer to the CALM-conjecture.
Skew in parallel query processing
TLDR
A tight connection is established between the fractional edge packing of the query and the amount of communication in two cases, where the data is skewed and the heavy hitters and their frequencies are known.
Communication steps for parallel query processing
TLDR
The problem of computing a relational query q on a large input database of size n, using a large number p of servers is considered, and it is shown that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent ε.
Shark: SQL and rich analytics at scale
TLDR
Shark is a new data analysis system that marries query processing with complex analytics on large clusters and extends such an engine in several ways, including column-oriented in-memory storage and dynamic mid-query replanning, to effectively execute SQL.
Parallel skyline queries
TLDR
This paper designs and analyzes parallel algorithms for skyline queries using the MP model and a variation of the model in (Afrati and Ullman, EDBT 2010), the GMP model, which demands weaker load balancing constraints, and presents a 1-step algorithm in the GMP model for any number of dimensions.
Win-move is coordination-free (sometimes)
TLDR
The surprising result is shown that the query given by the well-founded semantics of the unstratifiable win-move program is coordination-free in some of the models the authors consider, and that the original transducer network model and variants form a strict hierarchy of classes of coordination-free queries.
Parallel evaluation of conjunctive queries
TLDR
This paper analyzes the complexity of conjunctive queries, and proposes a very simple model of parallel computation that captures these architectures, in which the complexity parameter is the number of parallel steps requiring synchronization of all servers.
Optimizing joins in a map-reduce environment
TLDR
The problem of optimizing the shares, given a fixed number of Reduce processes, is studied, and an algorithm for detecting and fixing problems where an attribute is "mistakenly" included in the map-key is given.
Foundations of Databases
TLDR
This book discusses Languages, Computability, and Complexity, and the Relational Model, which aims to clarify the role of Semantic Data Models in the development of Query Language Design.