Parallel sorting on a shared-nothing architecture using probabilistic splitting

@article{DeWitt1991ParallelSO,
  title={Parallel sorting on a shared-nothing architecture using probabilistic splitting},
  author={David J. DeWitt and Jeffrey F. Naughton and Donovan A. Schneider},
  journal={[1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems},
  year={1991},
  pages={280--291}
}
  • D. DeWitt, J. Naughton, D. Schneider
  • Published 1 December 1991
  • Computer Science
  • [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems
The authors consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms the authors consider is to determine the range of sort keys to be handled by each processor. They consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles. They present analytic results showing that… 
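The idea behind probabilistic splitting can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names, the default sample size, and the use of in-memory lists in place of processors and disks are all assumptions made for illustration. A random sample of the keys is sorted, its quantiles are taken as estimated range boundaries, and each key is routed to the range that contains it.

```python
import bisect
import random


def choose_splitters(keys, num_procs, sample_size, rng):
    """Estimate num_procs - 1 splitting keys from a random sample (hypothetical helper)."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Use the sample's i/num_procs quantiles as estimated range boundaries.
    return [sample[(i * len(sample)) // num_procs] for i in range(1, num_procs)]


def partition(keys, splitters):
    """Route each key to the bucket (stand-in for a processor) whose range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)
    return buckets


def probabilistic_split_sort(keys, num_procs=4, sample_size=200, seed=0):
    """Split keys by sampled quantiles, then sort each range independently."""
    rng = random.Random(seed)
    splitters = choose_splitters(keys, num_procs, sample_size, rng)
    # Each bucket would be sorted locally by its own processor.
    sorted_buckets = [sorted(bucket) for bucket in partition(keys, splitters)]
    return splitters, sorted_buckets
```

Concatenating the buckets in order gives the fully sorted output. How evenly the keys are spread across buckets depends on how well the sample's quantiles approximate those of the full input, which is why the sample size matters.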

Parallel Sorting of Large Data Volumes on Distributed Memory Multiprocessors
TLDR
This algorithm is suited to large data volumes (external sorting) and does not suffer from processing skew in the presence of data skew; the optimal degree of CPU parallelism is derived when I/O limitations are taken into account.
PPS-a parallel partition sort algorithm for multiprocessor database systems
TLDR
Experimental results demonstrate that the new algorithm performs better than existing parallel range partition sorting algorithms in a shared-nothing database environment across a wide range of skew.
Overlapping Computations, Communications and I/O in parallel Sorting
TLDR
A new parallel sorting algorithm that maximizes the overlap between the disk, network, and CPU subsystems of a processing node is presented and shown to be of similar complexity to known efficient sorting algorithms.
A synthesis of parallel out-of-core sorting programs on heterogeneous clusters
  • C. Cérin, Hazem Fkaier, M. Jemni
  • Computer Science
    CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.
  • 2003
TLDR
Three techniques of parallel external sorting in the context of heterogeneous clusters are explored and it is shown how they can be deployed for clusters with processor performances related by a multiplicative factor.
Parallel Sorting by Approximate Splitting for Multi-core Processors
TLDR
An improved partition method, Parallel Sorting by Approximate Splitting, is presented; it is based on an extended pivot-selection algorithm that is more flexible and efficient than other algorithms such as PSRS.
Adaptive data partition for sorting using probability distribution
  • Xipeng Shen, C. Ding
  • Computer Science
    International Conference on Parallel Processing, 2004. ICPP 2004.
  • 2004
TLDR
A new partition method for sorting based on probability distribution is presented, an idea first studied by Janus and Lamagna in the early 1980s on a mainframe computer, along with an efficient implementation on modern, cache-based machines.
The parameterized Round-Robin partitioned algorithm for parallel external sort
  • H. Young, A. Swami
  • Computer Science
    Proceedings of 9th International Parallel Processing Symposium
  • 1995
TLDR
A new parameterized parallel sort algorithm, called Round-Robin Partitioned (or RRP), for the message-passing (shared-nothing) architecture is presented and shown to be superior to the other algorithms for almost all configurations.
External Sorting for Databases in Distributed Heterogeneous Systems
TLDR
This paper describes a new, load-balanced external parallel sorting method which is more robust to data skew and to variable speed of processes, and compares the run time of the new method with an analogous conventional method in the case of data skew and load imbalances.

References

SHOWING 1-10 OF 29 REFERENCES
A Low Communication Sort Algorithm for a Parallel Database Machine
TLDR
This work proposes a novel algorithm that exhibits complete parallelism during the sort, merge, and return-to-host phases, and decreases the amount of inter-processor communication compared to existing parallel sort algorithms.
Parallel Partition Sort for Database Machines
TLDR
A new parallel sorting method, called a parallel partition sort, is discussed; it is based on top-down partitioning of data, transfers only a small amount of data, and does not place large demands on the CPU.
Parallel algorithms for the execution of relational database operations
TLDR
This paper presents and analyzes algorithms for parallel processing of relational database operations in a general multiprocessor framework, and introduces an analysis methodology which incorporates I/O, CPU, and message costs and which can be adjusted to fit different multiprocessor architectures.
Sorting Large Files on a Backend Multiprocessor
TLDR
The results show that, using current, off-the-shelf technology coupled with a streamlined distributed operating system, three- and five-microprocessor configurations provide a very cost-effective sort of large files.
Percentile Finding Algorithm for Multiple Sorted Runs
TLDR
An efficient exact method is given which can find any percentile of an arbitrary number of sorted runs and can improve the speedup for parallel sorting on multiple processors; the work targets a parallel computer architecture of shared-memory MIMD parallel processors.
A comparison of sorting algorithms for the connection machine CM-2
TLDR
A fast sorting algorithm for the Connection Machine Supercomputer model CM-2 is developed and it is shown that any O(lg n)-depth family of sorting networks can be used to sort n numbers in O(lg n) time in the bounded-degree fixed interconnection network domain.
Sampling Issues in Parallel Database Systems
TLDR
This paper proves that for query size estimation, stratified random sampling guarantees perfect load balancing without reducing the accuracy of the estimate, and that for a given number of I/O operations, page level sampling always produces a more accurate estimate than tuple level sampling.
Parallel sorting and data partitioning by sampling
TLDR
The analysis is developed for parallel sorting in a local network environment, with distributed data sets in secondary storage devices, and a data partitioning method by sampling is proposed.
Parallel Sorting Methods for Large Data Volumes on a Hypercube Database Computer
TLDR
Two external sorting algorithms for hypercube database computers are presented based on partitioning of data according to partition values obtained through sampling of the data.
An Adaptive Method for Unknown Distributions in Distributive Partitioned Sorting
TLDR
An adaptation of DPS, which estimates the cumulative distribution function of the input data from a randomly selected sample, was developed and tested, and runs only 2-4 percent slower than DPS in the uniform case, but outperforms DPS by 12-13 percent on exponentially distributed data for sufficiently large files.