# Scalable distributed-memory external sorting

@article{Rahn2010ScalableDE, title={Scalable distributed-memory external sorting}, author={Mirko Rahn and Peter Sanders and John Victor Singler}, journal={2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)}, year={2010}, pages={685-688} }

We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to naive implementations of multiway merging and all other approaches known to us, the algorithm works with just two passes over the data even for the largest conceivable inputs. A second algorithm reduces communication overhead and uses more conventional…
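
As a rough illustration of the two-pass multiway merging idea mentioned in the abstract, the sketch below forms sorted runs in a first pass and merges all of them at once in a second pass. It is a single-machine, in-memory toy, not the distributed algorithm of the paper; the names `form_runs`, `merge_runs`, `two_pass_sort`, and the `memory_limit` parameter are hypothetical and chosen only for this example.

```python
import heapq
from typing import Iterable, Iterator, List


def form_runs(data: Iterable[int], memory_limit: int) -> List[List[int]]:
    """Pass 1: cut the input into chunks of at most memory_limit items
    and sort each chunk on its own (a stand-in for run formation)."""
    runs: List[List[int]] = []
    chunk: List[int] = []
    for item in data:
        chunk.append(item)
        if len(chunk) == memory_limit:
            runs.append(sorted(chunk))
            chunk = []
    if chunk:  # last, possibly shorter, run
        runs.append(sorted(chunk))
    return runs


def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Pass 2: merge all sorted runs in a single multiway merge."""
    return heapq.merge(*runs)


def two_pass_sort(data: Iterable[int], memory_limit: int) -> List[int]:
    """Sort by run formation followed by one multiway merge."""
    return list(merge_runs(form_runs(data, memory_limit)))
```

In a real external sort the runs would live on disk and be read block by block; here they are plain lists so the example stays self-contained.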

## 24 Citations

Communication-Efficient String Sorting

- Computer Science
- 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2020

These algorithms inspect only characters that are needed to determine the sorting order and communication volume is reduced by also communicating only those characters and by communicating repetitions of the same prefixes only once.

Engineering Algorithms for Large Data Sets

- Computer Science
- SOFSEM
- 2013

This paper outlines the general challenges of algorithm engineering and gives examples from my work like sorting, full text indexing, graph algorithms, and database engines.

Algorithm Engineering for Scalable Parallel External Sorting

- Computer Science
- 2011 IEEE International Parallel & Distributed Processing Symposium
- 2011

The talk describes algorithm engineering (AE) as a methodology for algorithmic research where design, analysis, implementation and experimental evaluation of algorithms form a feedback cycle driving…

Algorithm libraries for multi-core processors

- Computer Science
- 2010

By providing parallelized versions of established algorithm libraries, the Multi-Core STL provides basic algorithms for internal memory and the parallelized STXXL enables multi-core acceleration for algorithms on large data sets stored on disk.

TritonSort: A Balanced Large-Scale Sorting System

- Computer Science
- NSDI
- 2011

We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in…

Energy-efficient sorting using solid state disks

- Computer Science
- International Conference on Green Computing
- 2010

Using a low-power processor, solid state disks, and efficient algorithms, this work beats the current records in the JouleSort benchmark for 10 GB to 1 TB of data by factors of up to 5.1.

TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System

- Computer Science
- TOCS
- 2013

This article describes the hardware and software architecture necessary to operate TritonSort, a highly efficient, scalable sorting system that is designed to process large datasets and is able to sort data at approximately 80% of the disks' aggregate sequential write speed.

Cache efficient functional algorithms

- Computer Science
- Commun. ACM
- 2015

A cost model for analyzing the memory efficiency of algorithms expressed in a simple functional language is presented and provable bounds imply that purely functional programs based on lists and trees with no special attention to any details of memory layout can be asymptotically as efficient as the carefully designed imperative I/O efficient algorithms.

Parallel Data Sort Using Networked FPGAs

- Computer Science
- 2010 International Conference on Reconfigurable Computing and FPGAs
- 2010

This paper shows an example of a data sorting application that uses parallel servers to pre-sort data and then uses FPGAs within the switch to merge sort data as it passes through the network thereby reducing computation requirements at the client node.

Out-of-core distribution sort in the FG programming environment

- Computer Science
- 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
- 2010

Experimental results show that by using multiple pipelines, an out-of-core, distribution-based sorting program outperforms an out-of-core sorting program based on columnsort approximately 75%–85% of the time, despite the advantages that the columnsort-based program holds.

## References

DEMSort - Distributed External Memory Sort

- Computer Science
- 2009

We present the results of our DEMSort program in various categories of the SortBenchmark. DEMSort is a sophisticated and highly tuned implementation of a mergesort-based algorithm. It makes use of…

Asynchronous parallel disk sorting

- Computer Science
- SPAA '03
- 2003

We develop an algorithm for parallel disk sorting, whose I/O cost approaches the lower bound and that guarantees almost perfect overlap between I/O and computation. Previous algorithms have either…

Optimal parallel sorting in multi-level storage

- Computer Science
- SODA '94
- 1994

It is found that Sharesort achieves optimal time bounds for parallel sorting in multi-level storage, under a variety of models that have been defined in the literature.

High-performance sorting on networks of workstations

- Computer Science
- SIGMOD '97
- 1997

We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale…

Deterministic distribution sort in shared and distributed memory multiprocessors

- Computer Science
- SPAA '93
- 1993

An elegant deterministic load balancing strategy for distribution sort is presented that is applicable to a wide variety of parallel disk and parallel memory hierarchies with both single and parallel processors, and it is shown how to sort deterministically in parallel memory hierarchies.

Bulk Synchronous Parallel Algorithms for the External Memory Model

- Computer Science
- Theory of Computing Systems
- 2002

A simple, deterministic simulation technique is presented which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms that meet well-known I/O complexity lower bounds for various problems, including sorting.

Slabpose Columnsort: A New Oblivious Algorithm for Out-of-Core Sorting on Distributed-Memory Clusters

- Computer Science
- Algorithmica
- 2006

Slabpose columnsort is presented, a new oblivious algorithm that is the first out-of-core multiprocessor sorting algorithm that makes no assumptions about the keys and produces output that is perfectly load balanced and in the striped order assumed by the Parallel Disk Model.

Merging Multiple Lists on Hierarchical-Memory Multiprocessors

- Computer Science
- J. Parallel Distributed Comput.
- 1991

Performance and scalability of parallel database systems

- Computer Science
- 1994

This work proposes an architecture which extends the features of the shared-nothing architecture, widely adopted for current parallel database applications, and proposes a new characterization of data skew which captures distinct types of imbalance and presents two data partitioning strategies to deal with this problem in a parallel system.

Algorithms for parallel memory, II: Hierarchical multilevel memories

- Computer Science
- Algorithmica
- 2005

The optimal sorting algorithm is randomized and is based upon the probabilistic partitioning technique developed in the companion paper for optimal disk sorting in a two-level memory with parallel block transfer.