Tuning a parallel database algorithm on a shared‐memory multiprocessor

@article{Graefe1992TuningAP,
  title={Tuning a parallel database algorithm on a shared‐memory multiprocessor},
  author={Goetz Graefe and Shreekant S. Thakkar},
  journal={Software: Practice and Experience},
  year={1992},
  volume={22}
}
Database query processing can benefit significantly from parallelism. Parallel database algorithms combine substantial CPU and I/O activity, memory requirements, and massive data exchange between processes, all of which must be considered to obtain optimal performance. Since parallel external sorting is a very typical example, we have focused on sorting to tune Volcano, a new query processing system. The purpose of the Volcano project is to provide efficient, extensible tools for query and… 
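The abstract uses parallel external sorting as its tuning workload. To show what "external" means here, the following is a minimal sketch of the classic two-phase approach: cut the input into sorted runs no larger than an assumed memory budget, then merge the runs. Python lists stand in for temporary run files, and `memory_limit` is a hypothetical parameter. This is not Volcano's implementation, which the abstract does not show.

```python
import heapq
from typing import Iterable, Iterator, List


def generate_runs(records: Iterable[int], memory_limit: int) -> List[List[int]]:
    """Phase 1: cut the input into sorted runs of at most memory_limit records.

    Each run stands in for a temporary file that a real system writes to disk.
    """
    runs: List[List[int]] = []
    buffer: List[int] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == memory_limit:
            runs.append(sorted(buffer))
            buffer = []
    if buffer:  # flush the last, possibly shorter, run
        runs.append(sorted(buffer))
    return runs


def external_sort(records: Iterable[int], memory_limit: int) -> Iterator[int]:
    """Phase 2: a single k-way merge of all runs, driven by a heap."""
    runs = generate_runs(records, memory_limit)
    return heapq.merge(*runs)


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(list(external_sort(data, memory_limit=3)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With many runs and a limited merge fan-in, a real sort would merge in several passes; the single merge here assumes all runs can be merged at once.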
Query evaluation techniques for large databases
TLDR
This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Volcano - An Extensible and Parallel Query Evaluation System
  • G. Graefe
  • IEEE Trans. Knowl. Data Eng.
  • 1994
TLDR
Volcano is the first implemented query execution engine that effectively combines extensibility and parallelism, and is extensible with new operators, algorithms, data types, and type-specific methods.
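Volcano's extensibility rests on every operator exposing the same demand-driven iterator interface (open, next, close), so a new operator can be plugged into any plan. The following minimal sketch shows that protocol with two hypothetical operators, `Scan` and `Filter`; the class names and the use of `None` as the end-of-stream marker are assumptions for illustration, not Volcano's actual C interface.

```python
from typing import Callable, Iterable, List, Optional


class Operator:
    """Every operator exposes open(), next(), and close()."""

    def open(self) -> None:
        raise NotImplementedError

    def next(self) -> Optional[tuple]:  # assumed convention: None means end of stream
        raise NotImplementedError

    def close(self) -> None:
        raise NotImplementedError


class Scan(Operator):
    """Leaf operator producing rows from an in-memory list."""

    def __init__(self, rows: Iterable[tuple]) -> None:
        self.rows = list(rows)
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[tuple]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Filter(Operator):
    """Passes through only the child rows that satisfy the predicate."""

    def __init__(self, child: Operator, predicate: Callable[[tuple], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[tuple]:
        # Pull from the child until a row qualifies or the input is exhausted.
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


def run(plan: Operator) -> List[tuple]:
    """Drive a plan from the top: open it, pull rows until exhausted, close it."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


plan = Filter(Scan([(1, "a"), (2, "b"), (3, "c")]), lambda r: r[0] % 2 == 1)
print(run(plan))  # [(1, 'a'), (3, 'c')]
```

Because `Filter` only calls its child's `open`, `next`, and `close`, it works unchanged over any other operator, which is the property the TLDR's extensibility claim depends on.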
Encapsulation of Parallelism and Architecture-Independence in Extensible Database Query Execution
TLDR
The authors justify their decision to support hierarchical architectures and argue that the exchange operator offers a significant advantage for development and maintenance of database query processing software.
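The exchange operator's job is to route rows from a set of producers to a set of consumers so that the other operators need no parallelism code of their own. The following deliberately simplified sketch shows only the routing part, with threads and queues standing in for Volcano's processes and shared-memory communication. The function `exchange`, its end-of-stream sentinel protocol, and the consumers that simply collect rows into lists (instead of exposing `next()`) are all assumptions for illustration.

```python
import queue
import threading
from typing import Callable, List

_END = object()  # sentinel: one producer has finished


def exchange(
    inputs: List[List[int]],
    num_consumers: int,
    partition: Callable[[int], int],
) -> List[List[int]]:
    """Route rows from producer threads to consumer queues by partition(row)."""
    queues = [queue.Queue() for _ in range(num_consumers)]
    results: List[List[int]] = [[] for _ in range(num_consumers)]

    def producer(rows: List[int]) -> None:
        for row in rows:
            queues[partition(row) % num_consumers].put(row)
        # Tell every consumer that this producer is done.
        for q in queues:
            q.put(_END)

    def consumer(i: int) -> None:
        finished = 0
        # A consumer stops only after every producer has sent its sentinel.
        while finished < len(inputs):
            item = queues[i].get()
            if item is _END:
                finished += 1
            else:
                results[i].append(item)

    threads = [threading.Thread(target=producer, args=(rows,)) for rows in inputs]
    threads += [threading.Thread(target=consumer, args=(i,)) for i in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


parts = exchange([[1, 2, 3, 4], [5, 6, 7, 8]], num_consumers=2, partition=lambda r: r)
print([sorted(p) for p in parts])  # [[2, 4, 6, 8], [1, 3, 5, 7]]
```

Arrival order within a consumer depends on thread scheduling, which is why the example sorts each partition before printing.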
Alphasort: A cache-sensitive parallel external sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
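One way AlphaSort improves cache behavior is by sorting compact (key-prefix, record-pointer) entries instead of moving whole records. The following minimal sketch illustrates that general idea. The record layout, the 4-byte prefix length, and resolving prefix ties group by group with the full key are assumptions for illustration; this is not AlphaSort's code.

```python
from itertools import groupby
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # hypothetical layout: (sort key, payload)


def key_prefix_sort(records: List[Record], prefix_len: int = 4) -> List[int]:
    """Return record indices in key order by sorting small (prefix, index) entries."""
    # Sorting this compact array moves small entries, not whole records.
    entries = sorted((key[:prefix_len], i) for i, (key, _) in enumerate(records))
    order: List[int] = []
    # Entries with equal prefixes are adjacent; only those groups need the
    # full key, fetched by following the index back into the record array.
    for _, group in groupby(entries, key=lambda e: e[0]):
        idxs = [i for _, i in group]
        if len(idxs) > 1:
            idxs.sort(key=lambda i: records[i][0])
        order.extend(idxs)
    return order


recs = [(b"delta", b"4"), (b"alpha", b"1"), (b"alphabet", b"2"), (b"charlie", b"3")]
print([recs[i][1] for i in key_prefix_sort(recs)])  # [b'1', b'2', b'3', b'4']
```

Truncating byte strings preserves lexicographic order whenever the prefixes differ, so only equal-prefix groups ever need the full key.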
Sort vs. Hash Revisited
TLDR
This article compares the concepts behind sort- and hash-based query-processing algorithms and concludes that many dualities exist between the two types of algorithms and there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system.
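The duality is easy to see on a grouping operation: sorting brings equal keys together by order, while hashing brings them together by bucket. The following minimal in-memory sketch uses hypothetical function names; the differences the article discusses, such as behavior when inputs exceed memory, are not modeled.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def count_by_sort(keys: List[str]) -> List[Tuple[str, int]]:
    # Sort-based: sorting makes equal keys adjacent; one pass counts each group.
    return [(k, sum(1 for _ in g)) for k, g in groupby(sorted(keys))]


def count_by_hash(keys: List[str]) -> List[Tuple[str, int]]:
    # Hash-based: a hash table holds one counter per distinct key.
    table: Dict[str, int] = {}
    for k in keys:
        table[k] = table.get(k, 0) + 1
    return list(table.items())


keys = ["b", "a", "c", "a", "b", "a"]
print(count_by_sort(keys))          # [('a', 3), ('b', 2), ('c', 1)]
print(sorted(count_by_hash(keys)))  # [('a', 3), ('b', 2), ('c', 1)]
```

Both produce the same groups; the sort-based version also returns them in key order, while the hash-based output follows first-appearance order here, so it has to be sorted before comparison.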
Domain-Partitioned Parallel Sort-Merge Join
TLDR
It is concluded that parallel sort-merge join is inferior to hash-based join algorithms unless the joining relations are already sorted.
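To make the algorithm the TLDR evaluates concrete, the following is a minimal sketch of a domain-partitioned sort-merge join, assuming "domain partitioning" means splitting the join-key domain into ranges and applying the same ranges to both inputs. Each partition pair is joined independently; here they run one after another, whereas a parallel system would give each pair to a different processor. All function names and the `bounds` parameter are hypothetical.

```python
import bisect
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def range_partition(rows: List[Row], bounds: List[int]) -> List[List[Row]]:
    # Partition i holds keys k with bounds[i-1] < k <= bounds[i];
    # keys above the last bound go to the final partition.
    parts: List[List[Row]] = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        parts[bisect.bisect_left(bounds, row[0])].append(row)
    return parts


def sort_merge_join(r: List[Row], s: List[Row]) -> List[Tuple[int, str, str]]:
    r, s = sorted(r), sorted(s)
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            for _, a in r[i:i_end]:
                for _, b in s[j:j_end]:
                    out.append((key, a, b))
            i, j = i_end, j_end
    return out


def domain_partitioned_join(
    r: List[Row], s: List[Row], bounds: List[int]
) -> List[Tuple[int, str, str]]:
    # Identical bounds on both inputs guarantee matching keys share a partition.
    pairs = zip(range_partition(r, bounds), range_partition(s, bounds))
    return [t for rp, sp in pairs for t in sort_merge_join(rp, sp)]


R = [(1, "r1"), (5, "r5"), (7, "r7"), (5, "r5b")]
S = [(5, "s5"), (7, "s7"), (9, "s9")]
print(domain_partitioned_join(R, S, bounds=[4]))
# [(5, 'r5', 's5'), (5, 'r5b', 's5'), (7, 'r7', 's7')]
```

Each partition still pays for a full sort of its inputs, which is consistent with the TLDR's conclusion that the approach pays off mainly when the relations are already sorted.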
CHAPTER 1-INTRODUCTION
TLDR
There has been a continuing increase in the amount of data handled by database management systems (DBMSs) in recent years, with a growing need for DBMSs to exhibit more sophisticated functionality such as the support of object-oriented, deductive, and multimedia-based applications.
AlphaSort: a RISC machine sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and proposes two new benchmarks: MinuteSort (how much can you sort in a minute) and DollarSort (how to sort for a dollar).
Sort versus Hash Revisited
TLDR
This article compares the concepts behind sort- and hash-based query-processing algorithms and concludes that there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system.
Adaptive Parallel Query Execution in DBS3
TLDR
This paper describes DBS3, a shared-memory database system implemented on a 72-node KSR1 multiprocessor, which addresses the start-up time of parallel operations, interference among processors, and poor load balancing caused by skewed data distribution.

References

SHOWING 1-10 OF 52 REFERENCES
Prototyping Bubba, A Highly Parallel Database System
TLDR
The current Bubba prototype runs on a commercial 40-node multicomputer and includes a parallelizing compiler, distributed transaction management, object management, and a customized version of Unix.
Join processing in database systems with large main memories
TLDR
A new algorithm is presented that is a hybrid of two hash-based algorithms and dominates the other algorithms presented, including sort-merge; even in a virtual-memory environment, the hybrid algorithm dominates all the others.
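The following minimal sketch shows the core of a hybrid hash join, assuming the two algorithms it combines are a GRACE-style partitioning join and a simple in-memory hash join, which is how hybrid hash join is usually described. Partition 0 of the build input is kept in memory and probed immediately; the other partitions are "spilled" (Python lists stand in for temporary files) and joined afterwards. The function name and equal-sized hash partitions are assumptions; a real implementation sizes partition 0 to fill the available memory and handles partition overflow.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def hybrid_hash_join(
    build: List[Row], probe: List[Row], num_partitions: int
) -> List[Tuple[int, str, str]]:
    def part(key: int) -> int:
        return hash(key) % num_partitions

    # Build phase: partition 0 goes straight into an in-memory hash table;
    # the rest is written to spill partitions.
    table0: Dict[int, List[str]] = defaultdict(list)
    build_spill: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, val in build:
        if part(key) == 0:
            table0[key].append(val)
        else:
            build_spill[part(key)].append((key, val))

    out: List[Tuple[int, str, str]] = []
    probe_spill: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, val in probe:
        if part(key) == 0:
            # Probe rows for partition 0 are joined at once and never spilled.
            out.extend((key, b, val) for b in table0.get(key, []))
        else:
            probe_spill[part(key)].append((key, val))

    # Second phase: join each spilled partition pair with its own hash table.
    for p in range(1, num_partitions):
        table: Dict[int, List[str]] = defaultdict(list)
        for key, val in build_spill[p]:
            table[key].append(val)
        for key, val in probe_spill[p]:
            out.extend((key, b, val) for b in table.get(key, []))
    return out


R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
print(sorted(hybrid_hash_join(R, S, num_partitions=2)))
# [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
```

The saving over a pure partitioning join is that partition 0 of both inputs is never written out and read back.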
Design, analysis, and implementation of parallel external sorting algorithms
TLDR
A modified merge-sort is proposed as a method for eliminating duplicate records in a large file, and a combinatorial model is developed to provide an accurate estimate of the cost of the duplicate-elimination operation (in both the serial and the parallel cases).
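The TLDR does not spell out the modification. A common form of the idea, assumed here, is to drop duplicates as early as possible: within each run when it is generated, and across runs while they are merged, so every later pass handles less data. The function names and the `memory_limit` parameter are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List


def dedup_sorted(items: Iterable[int]) -> Iterator[int]:
    # In a sorted stream duplicates are adjacent, so compare with the last output.
    last = object()  # sentinel that never equals a record
    for x in items:
        if x != last:
            yield x
            last = x


def sort_distinct(records: Iterable[int], memory_limit: int) -> List[int]:
    runs: List[List[int]] = []
    buffer: List[int] = []
    for r in records:
        buffer.append(r)
        if len(buffer) == memory_limit:
            # Removing duplicates while writing a run shrinks every later pass.
            runs.append(list(dedup_sorted(sorted(buffer))))
            buffer = []
    if buffer:
        runs.append(list(dedup_sorted(sorted(buffer))))
    # Duplicates that landed in different runs meet during the merge.
    return list(dedup_sorted(heapq.merge(*runs)))


print(sort_distinct([3, 1, 3, 2, 1, 1, 2, 4], memory_limit=3))  # [1, 2, 3, 4]
```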
Sampling Issues in Parallel Database Systems
TLDR
This paper proves that for query size estimation, stratified random sampling guarantees perfect load balancing without reducing the accuracy of the estimate, and that for a given number of I/O operations, page level sampling always produces a more accurate estimate than tuple level sampling.
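Stratified sampling fits a partitioned parallel database naturally: each node's partition is one stratum, every node samples its own data in proportion to its size, and the per-stratum estimates are summed. The following minimal sketch estimates a predicate's result size this way. The function name, proportional allocation, and treating each partition as a stratum are assumptions, and the paper's page-level versus tuple-level distinction is not modeled (tuples are sampled individually).

```python
import random
from typing import Callable, List


def stratified_count_estimate(
    partitions: List[List[int]],
    predicate: Callable[[int], bool],
    fraction: float,
    seed: int = 0,
) -> float:
    """Estimate how many rows satisfy predicate, sampling each partition separately."""
    rng = random.Random(seed)
    estimate = 0.0
    for part in partitions:  # one stratum per partition (e.g. per node)
        if not part:
            continue
        # Proportional allocation: each stratum samples the same fraction of its rows.
        n = min(len(part), max(1, round(fraction * len(part))))
        sample = rng.sample(part, n)
        hits = sum(1 for x in sample if predicate(x))
        # Scale the stratum's sample hit rate up to the stratum's size.
        estimate += len(part) * hits / n
    return estimate


parts = [list(range(1, 11)), list(range(11, 21))]
print(stratified_count_estimate(parts, lambda x: x % 2 == 0, fraction=1.0))  # 10.0
```

With `fraction=1.0` every row is sampled, so the estimate is exact; smaller fractions trade accuracy for less I/O.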
A Low Communication Sort Algorithm for a Parallel Database Machine
TLDR
This work proposes a novel algorithm that exhibits complete parallelism during the sort, merge, and return-to-host phases, and decreases the amount of inter-processor communication compared to existing parallel sort algorithms.
Encapsulation of parallelism in the Volcano query processing system
TLDR
The paper describes the reasons for not choosing the bracket model, the novel operator model, and the details of Volcano's exchange operator, which parallelizes all other operators and makes implementation of parallel database algorithms significantly easier and more robust.
The Gamma Database Machine Project
TLDR
The design of the Gamma database machine and the techniques employed in its implementation are described, and a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is presented.
A taxonomy of parallel sorting
TLDR
This paper analyzes the evolution of research on parallel sorting, from the earliest sorting networks to the shared memory algorithms and the VLSI sorters, and proposes a taxonomy of parallel sorting that includes a broad range of array and file sorting algorithms.
A Study of Sort Algorithms for Multiprocessor Database Machines
TLDR
This paper analyzes several parallel external sorting algorithms and proposes a new one, the modified block bitonic sort, which is the fastest of the algorithms studied over a wide range of values of interest.
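The TLDR does not describe the modification, so the following minimal sketch covers only plain block bitonic sort, which the name suggests it builds on. Each simulated processor first sorts its own block; then the bitonic network runs over blocks, with each compare-exchange replaced by a merge-split in which one partner keeps the smaller half and the other keeps the larger half. The function names are hypothetical, and the sketch assumes a power-of-two number of equal-sized blocks.

```python
from typing import List, Tuple


def merge_split(a: List[int], b: List[int]) -> Tuple[List[int], List[int]]:
    """Combine two blocks and split them into a lower and an upper half."""
    merged = sorted(a + b)  # stands in for a linear merge of two sorted blocks
    m = len(a)
    return merged[:m], merged[m:]


def block_bitonic_sort(blocks: List[List[int]]) -> List[List[int]]:
    p = len(blocks)
    assert p & (p - 1) == 0, "number of blocks must be a power of two"
    assert len({len(b) for b in blocks}) <= 1, "blocks must have equal sizes"
    blocks = [sorted(b) for b in blocks]  # step 1: each processor sorts locally
    k = 2
    while k <= p:  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:  # partner distance within the current merge stage
            for i in range(p):
                partner = i ^ j
                if partner > i:
                    low, high = merge_split(blocks[i], blocks[partner])
                    if (i & k) == 0:  # ascending region: lower index keeps low half
                        blocks[i], blocks[partner] = low, high
                    else:  # descending region: lower index keeps high half
                        blocks[i], blocks[partner] = high, low
            j //= 2
        k *= 2
    return blocks


print(block_bitonic_sort([[9, 1], [8, 2], [7, 3], [6, 4]]))
# [[1, 2], [3, 4], [6, 7], [8, 9]]
```

All partner pairs within one inner step are independent, so on a multiprocessor each step would run as one round of parallel merge-splits.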
Sort versus Hash Revisited
TLDR
This article compares the concepts behind sort- and hash-based query-processing algorithms and concludes that there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system.