Partitioned Parallel Radix Sort

@article{Lee2002PartitionedPR,
  title={Partitioned Parallel Radix Sort},
  author={Shindong Lee and Minsoo Jeon and Dongseung Kim and Andrew Sohn},
  journal={J. Parallel Distributed Comput.},
  year={2002},
  volume={62},
  pages={656-668}
}
Load balanced parallel radix sort solved the load imbalance problem present in parallel radix sort. By redistributing the keys in each round of radix, each processor has exactly the same number of keys, thereby reducing the overall sorting time. Load balanced radix sort is currently known as the fastest internal sorting method for distributed-memory multiprocessors. However, as the computation time is balanced, the communication time emerges as the bottleneck of the overall sorting performance… Expand
Partitioned Parallel Radix Sort
TLDR
A new parallel radix sorter is presented that solves the communication problem of balanced radix sort, called partitioned parallel radIX sort, and reduces the communication time by eliminating the redistribution steps. Expand
PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort
TLDR
A novel parallel in-place radix sort algorithm, PARADIS, which addresses both problems: a "speculative permutation" solves the first problem by assigning multiple non-continuous array stripes to each processor, and the resulting shared-nothing scheme achieves full parallelization. Expand
Load-balanced parallel merge sort on distributed memory parallel computers
  • Minsoo Jeon, Dongseung Kim
  • Computer Science
  • Proceedings 16th International Parallel and Distributed Processing Symposium
  • 2002
Sort can be speeded up on parallel computers by dividing and computing data individually in parallel. Merge sort can be parallelized, however, the conventional algorithm implemented on distributedExpand
A full parallel radix sorting algorithm for multicore processors
TLDR
This paper introduces PARL, a parallel left radix sorting algorithm for use on ordinary shared memory multi core machines, that has just one simple statement in its sequential part and is user friendly in that the user calling this PARL algorithm does not have to deal with creating threads themselves; to sort their data, they just create a sorting object and make a call to a thread safe method in that object. Expand
Parallel Merge Sort with Load Balancing
TLDR
The proposed load-balanced merge sort utilizes all processors throughout the computation and evenly distributes data to all processors in each stage, thereby every processor is forced to work in all phases. Expand
Parallelizing Merge Sort onto Distributed Memory Parallel Computers
Merge sort is useful in sorting a great number of data progressively, especially when they can be partitioned and easily collected to a few processors. Merge sort can be parallelized, however,Expand
High-speed parallel external sorting of data with arbitrary distribution
TLDR
Two distribution-insensitive scalable parallel external sorting algorithms that use sampling technique and histogram counts to achieve even distribution of keys, which eventually contribute to achieve good performance are developed. Expand
Theoretically-Efficient and Practical Parallel In-Place Radix Sorting
TLDR
The performance of Regions Sort is compared to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions and shown to be faster than optimized out- of-place radix sorting and comparison sorting algorithms. Expand
Modified Pure Radix Sort for Large Heterogeneous Data Set
TLDR
This research paper presents a new modified pure radix sort algorithm for large heterogeneous data set, which tries to optimize all related problems of radixsort through this algorithm. Expand
Communication-efficient bitonic sort on a distributed memory parallel computer
TLDR
An efficient way of data communication in bitonic sort to minimize the interprocessor communication and comparison time is presented and it is believed the scheme is a good way to shorten the communication time in similar applications. Expand
...
1
2
3
4
5
...

References

SHOWING 1-10 OF 26 REFERENCES
Partitioned Parallel Radix Sort
TLDR
A new parallel radix sorter is presented that solves the communication problem of balanced radix sort, called partitioned parallel radIX sort, and reduces the communication time by eliminating the redistribution steps. Expand
Load balanced parallel radix sort
TLDR
Experimental results indicate that balanced radix sort can sort OSG integers in 20 seconds and 128M doubles in 15 seconds on a 64-processor SPZWN while yielding over 40-fold speedup. Expand
Communication-efficient bitonic sort on a distributed memory parallel computer
TLDR
An efficient way of data communication in bitonic sort to minimize the interprocessor communication and comparison time is presented and it is believed the scheme is a good way to shorten the communication time in similar applications. Expand
Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract)
TLDR
A novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and performance is invariant over the set of input distributions unlike previous efficient algorithms. Expand
Fast Parallel Sorting Under LogP: Experience with the CM-5
TLDR
The LogP model is shown to be a valuable guide in the development of parallel algorithms and a good predictor of implementation performance; the model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. Expand
Minimizing Communication in the Bitonic Sort
TLDR
A recirculating bitonic sorting network is presented, which is composed of one level of N/2 comparators plus an /spl Omega/-network of (log N-1) switch levels, which reduces the cost complexity to O(N log N) compared with the O (N log/sup 2/ N) of the original bitonic sorted network, while preserving the same time complexity. Expand
Parallel sorting and data partitioning by sampling
TLDR
The analysis is developed for parallel sorting in a local network environment, with distributed data sets in secondary storage devices, and a data partitioning method by sampling is proposed. Expand
Identifying the capability of overlapping computation with communication
  • A. Sohn, Jui-Yuan Ku, +5 authors Y. Yamaguchi
  • Computer Science
  • Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique
  • 1996
TLDR
Experimental results indicate that both multiprocessors would yield up to 30% to 40% overlap of communication time when the message size is approximately 1K integers. Expand
Sorting networks and their applications
To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, severalExpand
Tight Bounds on the Complexity of Parallel Sorting
  • F. Leighton
  • Mathematics, Computer Science
  • IEEE Transactions on Computers
  • 1985
TLDR
Tight upper and lower bounds are proved on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network. Expand
...
1
2
3
...