Parallel Sorting Methods for Large Data Volumes on a Hypercube Database Computer

@inproceedings{Baugst1989ParallelSM,
  title={Parallel Sorting Methods for Large Data Volumes on a Hypercube Database Computer},
  author={Bj{\o}rn Arild W. Baugst{\o} and Jarle Fredrik Greipsland},
  booktitle={IWDM},
  year={1989}
}
Sorting is one of the basic operations in any database system. In this paper we present two external sorting algorithms for hypercube database computers. The methods are based on partitioning the data according to partition values obtained by sampling the data. One of the algorithms, which is implemented on the HC16 database computer designed at the Norwegian Institute of Technology, is described in detail, together with a performance evaluation and some test results.
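The abstract's core idea is to choose partition values from a sample of the data and route each key to the node that owns its range. The following is a minimal in-memory sketch of that idea, assuming one Python list per hypercube node. It does not model disk I/O, external merging, or hypercube communication, and the names `choose_splitters` and `partition_sort` and the default sample size are hypothetical, not taken from the HC16 implementation.

```python
import bisect
import random


def choose_splitters(keys, num_nodes, sample_size, seed=0):
    """Pick num_nodes - 1 partition values from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample serve as partition values.
    step = len(sample) / num_nodes
    return [sample[int(i * step)] for i in range(1, num_nodes)]


def partition_sort(keys, num_nodes, sample_size=64):
    """Range-partition keys across nodes by sampled splitters, then sort each node locally."""
    splitters = choose_splitters(keys, num_nodes, sample_size)
    nodes = [[] for _ in range(num_nodes)]
    for k in keys:
        # bisect_right sends a key equal to a splitter to the higher-numbered node.
        nodes[bisect.bisect_right(splitters, k)].append(k)
    # Each node sorts only its own partition (an external sort would merge runs here).
    for part in nodes:
        part.sort()
    return nodes
```

Because node i receives only keys in the range between splitters i-1 and i, reading the nodes in order yields the fully sorted data. The sample only determines how evenly the load is spread across nodes, not whether the result is correct.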
Multiprocessor algorithms for relational-database operators on hypercube systems
TLDR
This tutorial focuses on hypercube-interconnected architectures as a computational engine for relational-database processing; experimental results obtained from a portable hypercube-based database system are presented to characterize the performance potential of various uniscan and multiscan operations.
Parallel Relational Database Algorithms
TLDR
The paper describes two classes of algorithms for performing relational database operations in parallel on a distributed-memory parallel computer with a disk for each processor, and shows how a bucket algorithm can be used to sort a relation.
Parallel Sorting of Large Data Volumes on Distributed Memory Multiprocessors
TLDR
This algorithm is suited to large data volumes (external sorting) and does not suffer from processing skew in the presence of data skew; the optimal degree of CPU parallelism is derived when I/O limitations are taken into account.
On the design, implementation, and evaluation of a portable parallel database system
  • O. Frieder, P. Jackson
  • Computer Science
    Proceedings. PARBASE-90: International Conference on Databases, Parallel Architectures, and Their Applications
  • 1990
TLDR
A portable parallel database system that exploits both parallel algorithms and data parallelism to expedite database processing is described, and it is shown that, for joins with a comparable number of tuples in each of the two joining relations, a bucket-based approach is preferable.
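The summary does not say how the bucket-based join is organized. The sketch below assumes a common scheme: both relations are hash-partitioned on the join key into the same number of buckets, and each bucket pair is joined independently with an in-memory hash table. In a parallel system each bucket pair could then be assigned to its own node. The function `bucket_join` and its parameters are hypothetical.

```python
from collections import defaultdict


def bucket_join(r, s, key_r, key_s, num_buckets):
    """Hash-partition both relations into buckets, then join matching bucket pairs."""
    r_buckets = [[] for _ in range(num_buckets)]
    s_buckets = [[] for _ in range(num_buckets)]
    for t in r:
        r_buckets[hash(key_r(t)) % num_buckets].append(t)
    for u in s:
        s_buckets[hash(key_s(u)) % num_buckets].append(u)
    result = []
    # Tuples with equal join keys always land in the same bucket number,
    # so each bucket pair can be joined without looking at any other bucket.
    for rb, sb in zip(r_buckets, s_buckets):
        table = defaultdict(list)
        for t in rb:
            table[key_r(t)].append(t)
        for u in sb:
            for t in table.get(key_s(u), []):
                result.append((t, u))
    return result
```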
External Sorting for Databases in Distributed Heterogeneous Systems
TLDR
This paper describes a new, load-balanced external parallel sorting method that is more robust to data skew and to variable process speeds, and compares its run time with that of an analogous conventional method under data skew and load imbalance.
Experimentation with hypercube database engines
TLDR
Using Intel's iPSC/2 hypercube, the authors measured how packet size, the method of clustering messages, and internode traffic affect the total sustained communication bandwidth, and analyzed duplicate removal algorithms.
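The summary names duplicate removal algorithms without describing them. One standard approach, assumed here rather than taken from the paper, is to hash-partition tuples so that all copies of a tuple meet on the same node, after which each node removes duplicates locally. The function `remove_duplicates` is hypothetical.

```python
def remove_duplicates(tuples, num_nodes):
    """Hash-partition tuples so duplicates meet on one node, then deduplicate locally."""
    nodes = [[] for _ in range(num_nodes)]
    for t in tuples:
        # Identical tuples have identical hashes, so they are sent to the same node.
        nodes[hash(t) % num_nodes].append(t)
    result = []
    for part in nodes:
        seen = set()
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result
```

The only communication needed is the single redistribution step. On a real hypercube, that step is where packet size and message clustering would affect performance.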
Parallel sorting on a shared-nothing architecture using probabilistic splitting
  • D. DeWitt, J. Naughton, D. Schneider
  • Computer Science
    [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems
  • 1991
TLDR
The authors consider the problem of external sorting in a shared-nothing multiprocessor with two techniques for determining ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles.
Relational Algebra Operations
TLDR
A set of relational algebra operations is described and slightly enhanced for improved efficiency, and the results of the DeWitt join test runs are given.

References

Parallel Partition Sort for Database Machines
TLDR
A new parallel sorting method called parallel partition sort is discussed. It is based on top-down partitioning of the data, transfers only a small amount of data, and does not place large demands on the CPU.
Algebra Operations on a Parallel Computer - Performance Evaluation
TLDR
The design of a parallel database computer containing 8 single-board computers that communicate over a system of shared RAM, allowing fast communication without interference, is reported, together with test results.
Data Structures and Algorithms
TLDR
The basis of this book is the material contained in the first six chapters of the earlier work, The Design and Analysis of Computer Algorithms, with added material on algorithms for external storage and memory management.
Join on a Cube: Analysis, Simulation, and Implementation
TLDR
This paper discusses one part of the work, viz., the study of the join operation, where novel data redistribution operations are employed to improve the performance of the various database operations including join.
Multiprocessor Hash-Based Join Algorithms
TLDR
It is demonstrated that bit vector filtering provides dramatic improvement in the performance of all algorithms, including the sort-merge join algorithm, and is shown to provide linear increases in throughput with corresponding increases in processor and disk resources.
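Bit vector filtering works by building a bit vector from the join keys of one relation and using it to discard tuples of the other relation that cannot possibly match, before they are joined or shipped between processors. The following sketch assumes a single hash function and a Python integer as the bit vector; it is an illustration of the idea, not the code evaluated in the paper, and `build_bit_vector` and `filtered_join` are hypothetical names.

```python
def build_bit_vector(keys, num_bits):
    """Set bit (hash(key) mod num_bits) for every key."""
    bits = 0
    for k in keys:
        bits |= 1 << (hash(k) % num_bits)
    return bits


def filtered_join(r, s, key_r, key_s, num_bits=1024):
    """Hash join r with s, first dropping s-tuples whose bit is clear in r's bit vector.

    Returns (joined pairs, number of s-tuples that survived the filter).
    """
    bits = build_bit_vector((key_r(t) for t in r), num_bits)
    # A clear bit proves no r-tuple has this key; a set bit may be a false positive.
    candidates = [u for u in s if (bits >> (hash(key_s(u)) % num_bits)) & 1]
    table = {}
    for t in r:
        table.setdefault(key_r(t), []).append(t)
    joined = [(t, u) for u in candidates for t in table.get(key_s(u), [])]
    return joined, len(candidates)
```

False positives only cost extra work in the final join; they never change the result, because every surviving candidate is still checked against the hash table.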
Sorting and Searching
The first revision of this third volume is a survey of classical computer techniques for sorting and searching. It extends the treatment of data structures in Volume 1 to consider both large and
Hashing Methods and Relational Algebra Operations
TLDR
The relational algebra operations described in this paper are being implemented in TECHRA (TECHBC), a database system designed especially to meet the needs of technical applications such as CAD systems, utility maps, and oil field exploration.
A Neighbor Connected Processor Network for Performing Relational Algebra Operations
TLDR
The capacity of the communication network has been analyzed under the workload of relational algebra operations, and each of 2 or 3 cells has been found to give the highest processing capacity per cell in the network.
Binsorting on hypercubes with d-port communication
TLDR
Three sorting algorithms are given for hypercubes with d-port communication. They use binsort at the global level to reduce communication costs and to reduce the variance among the lengths of the subsequences left in the nodes after the complete exchange of bins.
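The summary mentions the complete exchange of bins, in which every node sends a distinct bin to every other node. One standard way to perform this on a hypercube is dimension-ordered routing. The sketch below simulates that routing and crosses only one dimension per step (single-port). It is not one of the paper's d-port algorithms, which can use all d links at once. The function `hypercube_exchange` and its data layout are hypothetical.

```python
def hypercube_exchange(bins_per_node, dim):
    """Route bins on a 2**dim-node hypercube, crossing one dimension per step.

    bins_per_node[src][dst] is the list of items node src must deliver to node dst.
    Returns received, where received[dst] holds all items delivered to node dst.
    """
    p = 1 << dim
    # Each node holds (destination, item) pairs that are still in transit.
    held = [[(dst, x) for dst in range(p) for x in bins_per_node[src][dst]]
            for src in range(p)]
    for k in range(dim):
        bit = 1 << k
        incoming = [[] for _ in range(p)]
        for node in range(p):
            stay = []
            for dst, x in held[node]:
                # Cross dimension k only if the destination differs from this node in bit k.
                if (dst ^ node) & bit:
                    incoming[node ^ bit].append((dst, x))
                else:
                    stay.append((dst, x))
            held[node] = stay
        for node in range(p):
            held[node].extend(incoming[node])
    # After all dim steps, every item sits on the node whose address equals its destination.
    return [[x for _, x in held[node]] for node in range(p)]
```

In a binsort, each node would first place its keys into p bins by key range. After the exchange, node i then holds every key in the i-th range and only needs to sort locally.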