# Practical Massively Parallel Sorting

@article{Axtmann2015PracticalMP, title={Practical Massively Parallel Sorting}, author={Michael Axtmann and Timo Bingmann and Peter Sanders and Christian Schulz}, journal={Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures}, year={2015} }

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and overcome this gap both in theory and practice. The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort. In particular, our sample sort variant turns out to be very scalable both in theory and practice where it…
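
The abstract builds on sample sort. The following is a minimal sketch of ordinary single-level sample sort, not the paper's multi-level algorithm. It simulates p processing elements (PEs) sequentially, with each PE's data held in a plain Python list. The function name `sample_sort` and the `oversampling` parameter are illustrative assumptions. On a real machine, step 3 is an all-to-all exchange. The multi-level generalization mentioned in the abstract would apply this kind of partitioning over several levels, for example within groups of PEs, instead of in a single exchange.

```python
import bisect
import random


def sample_sort(pe_data, oversampling=8, seed=0):
    """Simulate single-level sample sort on p PEs (hypothetical helper).

    pe_data is a list of p lists of keys. Returns p sorted lists whose
    concatenation is the globally sorted sequence.
    """
    rng = random.Random(seed)
    p = len(pe_data)
    # 1. Every PE contributes a small random sample of its local keys.
    sample = []
    for local in pe_data:
        sample.extend(rng.sample(local, min(oversampling, len(local))))
    sample.sort()
    # 2. Choose p - 1 evenly spaced splitters from the sorted sample.
    splitters = [sample[(i * len(sample)) // p] for i in range(1, p)] if sample else []
    # 3. Each key goes to bucket j with splitters[j-1] <= key < splitters[j];
    #    bucket j is "sent" to PE j (an all-to-all exchange on a real machine).
    received = [[] for _ in range(p)]
    for local in pe_data:
        for x in local:
            received[bisect.bisect_right(splitters, x)].append(x)
    # 4. Each PE sorts the keys it received.
    return [sorted(bucket) for bucket in received]
```

With the oversampled splitters, bucket sizes stay close to n/p with high probability. Each key is moved exactly once, and that property is what makes sample sort communication-efficient.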

## 27 Citations

Robust Massively Parallel Sorting

- Computer Science
- ALENEX
- 2017

This work investigates distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements and designs a new variant of quicksort with fast high-quality pivot selection.

Engineering In-place (Shared-memory) Sorting Algorithms

- Computer Science
- ACM Trans. Parallel Comput.
- 2022

In many of the remaining cases, the new In-place Parallel Super Scalar Radix Sort (IPS2Ra) turns out to be the best algorithm, confirming the claims made about the robust performance of the algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications.

Engineering a Distributed Histogram Sort

- Computer Science
- 2019 IEEE International Conference on Cluster Computing (CLUSTER)
- 2019

This work adopts ideas of the well-known quickselect and sample sort algorithms to minimize data movement and demonstrates that this implementation can keep up with recently proposed distribution sort algorithms in large-scale experiments, without any assumptions on the input keys.

In-place Parallel Super Scalar Samplesort (IPS⁴o)

- Computer Science
- 2017

We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability.…
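
One technique behind avoiding branch mispredictions in samplesort is to classify keys with an implicit binary search tree of splitters. IPS⁴o inherits this idea from Super Scalar Samplesort. The Python below is a minimal sketch of that index arithmetic under assumed names (`build_tree`, `classify`), with the number of buckets k assumed to be a power of two. Python itself still branches. The point is that the child index is computed from the comparison result, which a C or C++ compiler can turn into branch-free code.

```python
def build_tree(splitters):
    """Lay out k-1 sorted splitters as an implicit binary search tree.

    tree[1] is the root, node i has children 2i and 2i+1, tree[0] is unused.
    Assumes k = len(splitters) + 1 is a power of two.
    """
    k = len(splitters) + 1
    assert k & (k - 1) == 0, "number of buckets must be a power of two"
    tree = [None] * k

    def fill(node, lo, hi):
        # splitters[lo:hi] belong to the subtree rooted at `node`.
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[node] = splitters[mid]
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid + 1, hi)

    fill(1, 0, len(splitters))
    return tree


def classify(x, tree):
    """Return bucket j such that splitter[j-1] < x <= splitter[j]."""
    k = len(tree)
    i = 1
    for _ in range(k.bit_length() - 1):  # log2(k) levels, fixed trip count
        # The comparison yields 0 or 1 and selects the child arithmetically,
        # so no data-dependent branch is needed in a compiled language.
        i = 2 * i + (x > tree[i])
    return i - k
```

For example, with splitters `[10, 20, 30]` the keys 5, 15, 25 and 35 land in buckets 0, 1, 2 and 3. Keys equal to a splitter go to the left bucket.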

Communication-Efficient String Sorting

- Computer Science
- 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2020

These algorithms inspect only the characters needed to determine the sorting order, and they reduce communication volume by communicating only those characters and by sending repeated prefixes only once.
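
A common way to send repeated prefixes only once is LCP compression of an already sorted run. Each string is transmitted as the length of its longest common prefix with its predecessor plus the remaining suffix. The sketch below assumes this encoding and uses hypothetical names (`lcp_compress`, `lcp_decompress`). It does not model the other idea in the summary, which is inspecting only distinguishing prefixes.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n, limit = 0, min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def lcp_compress(sorted_strings):
    """Encode each string as (shared prefix length with predecessor, suffix)."""
    out, prev = [], ""
    for s in sorted_strings:
        h = lcp(prev, s)
        out.append((h, s[h:]))
        prev = s
    return out


def lcp_decompress(pairs):
    """Rebuild the strings by reusing the predecessor's prefix."""
    out, prev = [], ""
    for h, suffix in pairs:
        s = prev[:h] + suffix
        out.append(s)
        prev = s
    return out
```

For the sorted run `["alpha", "alphabet", "alphanumeric", "beta", "betamax"]`, the encoder emits 22 suffix characters instead of 36.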

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting

- Computer Science
- SPAA
- 2019

The performance of Regions Sort is compared to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions, and it is shown to be faster than optimized out-of-place radix sorting and comparison sorting algorithms.

Massively Parallel 'Schizophrenic' Quicksort

- Computer Science
- 2017

A communication library based on MPI is presented that supports communicator creation in constant time and without communication, together with the first efficient implementation of Schizophrenic Quicksort, a recursive sorting algorithm for distributed-memory systems based on Quicksort.

Fully Flexible Parallel Merge Sort for Multicore Architectures

- Computer Science
- Complex.
- 2018

A fully flexible sorting method for parallel processing, based on a modified merge sort, is presented. It can be implemented for different numbers of processors, and the results show that sorting becomes faster and more efficient with each added processor.

Distributed String Sorting Algorithms

- Computer Science
- 2019

This thesis presents two new distributed string sorting algorithms and introduces a new string generator that produces data sets in which the ratio of the distinguishing prefix length to the total string length is an input parameter.

Efficient Parallel Random Sampling—Vectorized, Cache-Efficient, and Online

- Computer Science
- ACM Trans. Math. Softw.
- 2018

A simple divide-and-conquer scheme is proposed that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p+log p) on p processors, i.e., scales to massively parallel machines even for moderate values of n.
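
One way to realize such a divide-and-conquer scheme is sketched below. This is an assumption based on the summary, not the paper's code. The PEs are split recursively into halves, and a hypergeometric deviate decides how many of the k samples fall into the left half. Every split is seeded from the subproblem, so each PE could recompute the O(log p) splits on its root-to-leaf path without communication. This sequential sketch evaluates the whole tree instead. The names `hypergeometric` and `parallel_sample` and the seeding scheme are hypothetical. The hypergeometric deviate is drawn with a slow O(k) urn simulation because Python's standard library has no hypergeometric generator.

```python
import random


def hypergeometric(rng, draws, n_left, n_total):
    """How many of `draws` picks without replacement hit the first n_left of
    n_total items (simple O(draws) urn simulation, assumed for illustration)."""
    left, total, hits = n_left, n_total, 0
    for _ in range(draws):
        if rng.randrange(total) < left:
            hits += 1
            left -= 1
        total -= 1
    return hits


def parallel_sample(k, sizes, seed=42):
    """Pick k distinct global indices from sum(sizes) items spread over PEs.

    Returns (counts, samples): counts[i] is PE i's share of the sample.
    """
    assert sizes, "need at least one PE"
    prefix = [0]
    for s in sizes:
        prefix.append(prefix[-1] + s)
    assert 0 <= k <= prefix[-1]
    counts = [0] * len(sizes)

    def split(lo, hi, k_here):
        # PEs lo..hi-1 jointly own k_here of the sample elements.
        if hi - lo == 1:
            counts[lo] = k_here
            return
        mid = (lo + hi) // 2
        # Deterministic seed per subproblem: any PE recomputes the same split.
        rng = random.Random((seed * 1_000_003 + lo) * 1_000_003 + hi)
        k_left = hypergeometric(rng, k_here, prefix[mid] - prefix[lo],
                                prefix[hi] - prefix[lo])
        split(lo, mid, k_left)
        split(mid, hi, k_here - k_left)

    split(0, len(sizes), k)

    samples = []
    for i, c in enumerate(counts):
        # Each PE then samples its share from its own items independently.
        local = random.Random(seed * 7_919 + i)
        samples.extend(prefix[i] + j for j in local.sample(range(sizes[i]), c))
    return counts, samples
```

The split draws never give a subtree more samples than it has items, so every local `sample` call is valid.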

## References

Showing 10 of 57 references.

A comparison of sorting algorithms for the connection machine CM-2

- Computer Science
- SPAA '91
- 1991

A fast sorting algorithm for the Connection Machine CM-2 supercomputer is developed, and it is shown that any O(lg n)-depth family of sorting networks can be used to sort n numbers in O(lg n) time in the bounded-degree fixed-interconnection network domain.
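
The entry above concerns sorting networks. For illustration, here is a minimal Python sketch of Batcher's bitonic sorting network, the classic construction from "Sorting networks and their applications", which appears among the references below. It is not the CM-2 algorithm from this entry, and it assumes that the input length is a power of two. All comparators within one pass of the inner `j` loop touch disjoint pairs, so they could run in parallel. The network therefore has depth log n (log n + 1)/2, which is O(log² n) rather than O(lg n).

```python
def bitonic_sort(a):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    a = list(a)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # one parallel round of compare-exchanges
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Direction alternates between blocks of size k.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The sequence of compare-exchange pairs does not depend on the data, which is the defining property of a sorting network and the reason it maps well onto fixed interconnection networks.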

Direct Bulk-Synchronous Parallel Algorithms

- Computer Science
- SWAT
- 1992

It is shown that optimality to within a multiplicative factor close to one can be achieved for the problems of Gauss-Jordan elimination and sorting, by transportable algorithms that can be applied for a wide range of values of the parameters p, g, and L.
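
As background for the parameters p, g and L, here is the standard BSP cost accounting as it is commonly stated. It is not taken from the cited paper. Suppose that in a superstep every processor performs at most w local operations and sends or receives at most h words. With g as the communication cost per word and L as the cost of a barrier synchronization, an algorithm with S supersteps costs

$$
T_{\text{superstep}} = w + g\,h + L, \qquad T_{\text{total}} = \sum_{s=1}^{S} \bigl( w_s + g\,h_s + L \bigr).
$$

Roughly speaking, optimality "to within a multiplicative factor close to one" compares $T_{\text{total}}$ with the sequential running time divided by p.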

Parallel sorting by over partitioning

- Computer Science
- SPAA '94
- 1994

Implementations on the KSR1 and Hector shared memory multiprocessors show that PSOP achieves nearly linear speedup and outperforms alternative approaches.

Practical Massively Parallel Sorting - Basic Algorithmic Ideas

- Computer Science
- ArXiv
- 2014

This work outlines ideas how to combine a number of basic algorithmic techniques which overcome bottlenecks to obtain sorting algorithms that scale to the largest available machines.

Parallel Sorting by Overpartitioning

- Computer Science
- 1994

The approach of parallel sorting by Overpartitioning (PSOP) limits the communication cost by moving each element between the processors at most once, and ensures good load balancing (even…
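
The core of overpartitioning is to create k times more buckets than processors, so that uneven buckets can be balanced by assigning several buckets to each processor. Each element still moves only once. The sketch below is generic. It uses sampled splitters and a largest-bucket-first assignment to the least-loaded worker, which is an assumed scheduling rule and not necessarily PSOP's exact one. The names `overpartition` and `sort_overpartitioned` are hypothetical. Because the output offset of every bucket comes from a prefix sum over bucket sizes, a worker's buckets do not need to be contiguous.

```python
import bisect
import heapq
import random


def overpartition(keys, p, k, seed=0):
    """Split keys into p*k buckets and assign them to p workers."""
    rng = random.Random(seed)
    m = p * k  # k times more buckets than workers
    sample = sorted(rng.sample(keys, min(len(keys), 4 * m)))
    splitters = [sample[(i * len(sample)) // m] for i in range(1, m)] if sample else []
    buckets = [[] for _ in range(m)]
    for x in keys:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # Largest bucket first, always to the currently least-loaded worker.
    heap = [(0, w) for w in range(p)]
    assignment = [[] for _ in range(p)]
    for b in sorted(range(m), key=lambda b: len(buckets[b]), reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(b)
        heapq.heappush(heap, (load + len(buckets[b]), w))
    return buckets, assignment


def sort_overpartitioned(keys, p, k, seed=0):
    """Each worker sorts its buckets into precomputed output ranges."""
    buckets, assignment = overpartition(keys, p, k, seed)
    offsets = [0]
    for b in buckets:
        offsets.append(offsets[-1] + len(b))
    out = [None] * len(keys)
    for owned in assignment:  # on a real machine, one worker per entry
        for b in owned:
            out[offsets[b]:offsets[b + 1]] = sorted(buckets[b])
    return out
```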

Communication efficient algorithms for fundamental big data problems

- Computer Science
- 2013 IEEE International Conference on Big Data
- 2013

This work discusses linear programming in low dimensions, and gives examples for several fundamental algorithmic problems where nontrivial algorithms with sublinear communication volume are possible.

Communication-Efficient Parallel Sorting

- Computer Science
- SIAM J. Comput.
- 1999

The bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for it is shown that just computing the "or" of n bits distributed evenly to the first O(n/h) of an arbitrary number of processors in a BSP computer requires $\Omega(\log n/\log (h+1))$ communication rounds.

Sorting networks and their applications

- Computer Science
- AFIPS '68 (Spring)
- 1968

To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several…

Truly Efficient Parallel Algorithms: 1-optimal Multisearch for an Extension of the BSP Model

- Computer Science
- Theor. Comput. Sci.
- 1998

Communication Efficient Algorithms for Top-k Selection Problems

- Computer Science
- 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2016

We present scalable parallel algorithms with sublinear per-processor communication volume and low latency for several fundamental problems related to finding the most relevant elements in a set, for…