Asynchronous parallel disk sorting

@inproceedings{Dementiev2003AsynchronousPD,
  title={Asynchronous parallel disk sorting},
  author={Roman Dementiev and Peter Sanders},
  booktitle={SPAA '03},
  year={2003}
}
We develop an algorithm for parallel disk sorting, whose I/O cost approaches the lower bound and that guarantees almost perfect overlap between I/O and computation. Previous algorithms have either suboptimal I/O volume or cannot guarantee that I/O and computations can always be overlapped. We give an efficient implementation that can (at least) compete with the best practical implementations but gives additional performance guarantees. For the experiments we have configured a state of the art… 
Scalable distributed-memory external sorting
TLDR
An algorithm whose I/O requirement is close to a lower bound is outlined. In contrast to naive implementations of multiway merging and all other approaches known to us, the algorithm works with just two passes over the data even for the largest conceivable inputs.
Efficient out-of-core sorting algorithms for the Parallel Disks Model
Guidesort: Simpler Optimal Deterministic Sorting for the Parallel Disk Model
A new algorithm, Guidesort, for sorting in the uniprocessor variant of the parallel disk model (PDM) of Vitter and Shriver is presented. The algorithm is deterministic and executes a number of
Efficient PDM sorting algorithms
TLDR
This paper presents efficient algorithms for sorting on the Parallel Disks Model (PDM), implements these algorithms, and evaluates their performance.
A Simple Optimal Randomized Algorithm for Sorting on the PDM
TLDR
This paper presents a simple randomized algorithm that sorts in optimal time with high probability and has all the desirable features for practical implementation.
Duality Between Prefetching and Queued Writing with Parallel Disks
TLDR
A useful and natural duality is defined between writing to parallel disks and the seemingly more difficult problem of prefetching, and this duality gives the first parallel disk sorting algorithms that are provably optimal up to lower order terms.
Algorithms and Data Structures for External Memory
  • J. Vitter
  • Computer Science
    Found. Trends Theor. Comput. Sci.
  • 2006
TLDR
The state of the art in the design and analysis of algorithms and data structures for external memory (or EM for short), where the goal is to exploit locality and parallelism in order to reduce the I/O costs, is surveyed.
External memory algorithms and data structures: dealing with massive data
TLDR
The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs, is surveyed.
A Scalable Parallel Sorting Algorithm Using Exact Splitting
TLDR
This paper presents the first parallel sorting algorithm to combine all of the aforementioned properties, while laying the foundations to overcome scalability problems for sorting data on the next generation of massively parallel systems.
Optimal and Practical Algorithms for Sorting on the PDM
TLDR
A randomized mergesort algorithm based on a simple idea that sorts using an asymptotically optimal number of I/O operations with high probability and has all of the desirable features for practical implementation is presented.

References

A Framework for Simple Sorting Algorithms on Parallel Disk Systems
TLDR
A simple parallel sorting algorithm is presented and it is proved that it can get a sparse enumeration sort on the hypercube that is simpler than that of the classical algorithm of Nassimi and Sahni.
Deterministic distribution sort in shared and distributed memory multiprocessors
TLDR
An elegant deterministic load balancing strategy for distribution sort is presented that is applicable to a wide variety of parallel disks and parallel memory hierarchies with both single and parallel processors, and it is shown how to sort deterministically in parallel memory hierarchies.
Distribution sort with randomized cycle
TLDR
This paper proposes a simple variant of distribution sort called randomized cycling distribution sort (RCD) and proves that it has optimal expected I/O complexity and uses a novel reduction to a model with significantly fewer probabilistic interdependencies.
Fast Concurrent Access to Parallel Disks
TLDR
This work rehabilitates Aggarwal and Vitter's "single-disk multi-head" model, which allows access to D arbitrary blocks in each I/O step, and shows that a shared buffer of O(D) blocks suffices to support efficient writing.
Columnsort lives! an efficient out-of-core sorting program
TLDR
The design and implementation of a parallel out-of-core sorting algorithm based on Leighton's columnsort algorithm are presented, and it is demonstrated that the implementation's sorting efficiency is competitive with that of NOW-Sort, a sorting algorithm developed to sort large amounts of data quickly on a cluster of workstations.
Near-Optimal Parallel Prefetching and Caching
TLDR
The authors consider algorithms for integrated prefetching and caching in a model with a fixed-size cache and any number of backing storage devices (disks) and produce a new algorithm, reverse aggressive, with near-optimal performance in the presence of multiple disks.
Optimal prefetching and caching for parallel I/O systems
TLDR
This work shows that in the off-line case, where a priori knowledge of all the requests is available, SUPERVISOR performs the minimum number of I/Os to service the given I/O requests, making it the first parallel I/O scheduling algorithm that is provably offline optimal.
The input/output complexity of sorting and related problems
TLDR
Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Nsort: a Parallel Sorting Program for NUMA and SMP Machines
TLDR
Ordinal™ Nsort™ is a high-performance sort program for SGI IRIX, Sun Solaris and HP-UX servers that can use tens of processors and hundreds of disks to quickly sort and merge data.
Algorithms for parallel memory, I: Two-level memories
We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix