The input/output complexity of sorting and related problems

@article{Aggarwal1988TheIC,
  title={The input/output complexity of sorting and related problems},
  author={Alok Aggarwal and Jeffrey Scott Vitter},
  journal={Commun. ACM},
  year={1988},
  volume={31},
  pages={1116--1127}
}
We provide tight upper and lower bounds, up to a constant factor, for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition. The bounds hold both in the worst case and in the average case, and in several situations the constant factors match. Secondary storage is modeled as a magnetic disk capable of transferring P… 
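The abstract breaks off before it states the bounds. For reference, the sorting bound from this paper is usually quoted as Θ((N/(PB)) · log(1 + N/B) / log(1 + M/B)) I/Os, and the permuting bound as the minimum of N/P and that same quantity, where N is the number of records, M the internal memory size, B the block size, and P the number of blocks moved per I/O. The sketch below simply evaluates these expressions with constant factors dropped; the function names are hypothetical, and the results are order-of-magnitude estimates rather than exact I/O counts.

```python
import math


def sort_ios(N, M, B, P=1):
    """Sorting bound with constants dropped (illustrative sketch):
    (N / (P*B)) * log(1 + N/B) / log(1 + M/B)."""
    return (N / (P * B)) * math.log(1 + N / B) / math.log(1 + M / B)


def permute_ios(N, M, B, P=1):
    """Permuting bound with constants dropped: the smaller of N/P
    (move records individually) and the sorting bound (sort by target index)."""
    return min(N / P, sort_ios(N, M, B, P))


if __name__ == "__main__":
    # N = 2^30 records, M = 2^20 records of memory, B = 2^10 records per block.
    print(sort_ios(2**30, 2**20, 2**10))
    print(permute_ios(2**30, 2**20, 2**10))
```

For N = 2^30, M = 2^20, B = 2^10 and P = 1, `sort_ios` comes out at roughly 2^21, about two passes over the 2^20 blocks of data. The N/P term of the permuting bound only wins when blocks are tiny (for example B = 1), in which case moving records one at a time is cheaper than sorting them.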


Algorithms for parallel memory, I: Two-level memories
We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix…
Lower bounds for external memory integer sorting via network coding
TLDR
A tight conditional lower bound on the complexity of external memory sorting of integers is presented, based on a famous conjecture in network coding by Li and Li, who conjectured that network coding cannot help anything beyond the standard multicommodity flow rate in undirected graphs.
Sequence sorting in secondary storage
TLDR
The results show, somewhat counterintuitively, that the I/O complexity of string sorting depends upon the length of the strings relative to the block size.
Large-scale sorting in parallel memories (extended abstract)
TLDR
An elegant, easy-to-implement, optimal, deterministic algorithm for external sorting with P disk drives is presented, which answers the open problem posed by Vitter and Shriver.
A Framework for Simple Sorting Algorithms on Parallel Disk Systems
TLDR
A simple parallel sorting algorithm is presented, and it is shown to yield a sparse enumeration sort on the hypercube that is simpler than the classical algorithm of Nassimi and Sahni.
Optimal and Practical Algorithms for Sorting on the PDM
TLDR
A randomized mergesort algorithm based on a simple idea is presented; with high probability it sorts using an asymptotically optimal number of I/O operations, and it has all of the desirable features for practical implementation.
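The randomized algorithm itself is not reproduced here. As a point of reference, the sketch below simulates the classical single-disk multiway mergesort that external mergesort variants build on, counting block I/Os in a two-level memory with memory size M and block size B (both in records). In-memory lists stand in for disk-resident runs; the function name and the I/O accounting are illustrative assumptions, not the paper's algorithm.

```python
import heapq
import math


def external_mergesort(data, M, B):
    """Simulate multiway mergesort in a two-level memory (hypothetical helper).

    data: input records (an in-memory list standing in for disk).
    M: internal memory capacity in records; B: block size in records.
    Returns (sorted_records, block_ios), counting one I/O per block read
    or written. Assumes M holds at least a few blocks.
    """
    ios = 0

    # Pass 0 (run formation): load M records, sort them in memory, write a run.
    runs = []
    for start in range(0, len(data), M):
        run = sorted(data[start:start + M])
        ios += 2 * math.ceil(len(run) / B)  # read the chunk, write the run
        runs.append(run)

    # Merge up to (M/B - 1) runs at a time: one input buffer per run,
    # plus one output buffer.
    fan_in = max(2, M // B - 1)
    while len(runs) > 1:
        next_runs = []
        for g in range(0, len(runs), fan_in):
            group = runs[g:g + fan_in]
            if len(group) == 1:
                # A lone leftover run is carried forward without rewriting it.
                next_runs.append(group[0])
                continue
            ios += sum(math.ceil(len(r) / B) for r in group)  # read inputs
            merged = list(heapq.merge(*group))
            ios += math.ceil(len(merged) / B)  # write the merged run
            next_runs.append(merged)
        runs = next_runs

    return (runs[0] if runs else []), ios


if __name__ == "__main__":
    data = [(i * 7919) % 1000 for i in range(1000)]  # a permutation of 0..999
    out, ios = external_mergesort(data, M=100, B=10)
    print(out[:5], ios)
```

Each merge pass reads and writes the data about once (roughly 2N/B block I/Os), and the number of passes grows logarithmically with base about M/B, which is the shape of the (N/B) log_{M/B}(N/B) sorting bound.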
Algorithms and Data Structures for External Memory
  • J. Vitter
  • Found. Trends Theor. Comput. Sci.
  • 2006
TLDR
The state of the art in the design and analysis of algorithms and data structures for external memory (or EM for short), where the goal is to exploit locality and parallelism in order to reduce the I/O costs is surveyed.
External memory algorithms and data structures: dealing with massive data
TLDR
The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
Efficient bundle sorting
TLDR
An efficient algorithm for bundle sorting in external memory is presented; it requires at most c(N/B) log_{M/B} k disk accesses and is shown to be optimal by a matching lower bound for bundling together identical keys.
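To see what the bundle-sorting bound buys, the sketch below compares c(N/B) log_{M/B} k with the general single-disk sorting bound (N/B) log_{M/B}(N/B). It reads k as the number of distinct keys and sets the unspecified constant c to 1; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def bundle_sort_bound(N, M, B, k, c=1.0):
    """c * (N/B) * log_{M/B}(k); k read as the number of distinct keys,
    c = 1.0 is a placeholder constant (assumption)."""
    return c * (N / B) * math.log(k) / math.log(M / B)


def sort_bound(N, M, B):
    """General sorting bound (N/B) * log_{M/B}(N/B), constants dropped."""
    return (N / B) * math.log(N / B) / math.log(M / B)


if __name__ == "__main__":
    N, M, B, k = 2**30, 2**20, 2**10, 2**10
    print(bundle_sort_bound(N, M, B, k), sort_bound(N, M, B))
```

With N = 2^30, M = 2^20, B = 2^10 and k = 2^10 distinct keys, the bundle bound is N/B = 2^20 block I/Os, about half of the roughly 2^21 the general bound gives; the gap widens as k shrinks relative to N/B.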