High-performance sorting on networks of workstations

@inproceedings{ArpaciDusseau1997HighperformanceSO,
  title={High-performance sorting on networks of workstations},
  author={Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau and David E. Culler and Joseph M. Hellerstein and David A. Patterson},
  booktitle={SIGMOD '97},
  year={1997}
}
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). Our implementations can be applied to a variety of disk, memory, and processor configurations; we highlight salient issues for tuning each component of the system. We evaluate the use of commodity operating systems and hardware for parallel sorting. We find existing OS primitives for memory management and file access adequate. Due to aggregate communication and disk bandwidth…

Sorting on a Cluster Attached to a Storage-Area Network
In November 2004, the SAN Cluster Sort program (SCS) set new records for the Indy versions of the Minute and TeraByte Sorts. SCS ran on a cluster of 40 dual-processor Itanium2 nodes on the show floor
Distribution-Insensitive Parallel External Sorting on PC Clusters
TLDR
This paper presents two distribution-insensitive parallel external sorting algorithms that use sampling and histogram counts to distribute data evenly among processors, which in turn contributes to superb performance.
The architectural costs of streaming I/O: A comparison of workstations, clusters, and SMPs
TLDR
It is found that the architectures studied are not well balanced for streaming I/O applications, and the clustered workstations provide higher absolute performance for streaming I/O workloads.
A simple and efficient parallel disk mergesort
TLDR
The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice.
SPsort: How to Sort a Terabyte Quickly
In December 1998, a 488 node IBM RS/6000 SP sorted a terabyte of data (10 billion 100 byte records) in 17 minutes, 37 seconds. This is more than 2.5 times faster than the previous record for a
A Simple and Efficient Parallel Disk Mergesort
TLDR
The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a file into several other files.
Columnsort lives! an efficient out-of-core sorting program
TLDR
The design and implementation of a parallel out-of-core sorting algorithm based on Leighton's columnsort algorithm are presented, and it is demonstrated that the implementation's sorting efficiency is competitive with that of NOW-Sort, a sorting algorithm developed to sort large amounts of data quickly on a cluster of workstations.
A New Computation Model for Rack-Based Computing
TLDR
This study focuses on communication among processes and processing time costs, both total and elapsed, and shows tradeoffs among them depending on the computational limits the authors invoke on the processes.
A Simple and Efficient Parallel Disk Mergesort
TLDR
An efficient implementation of SRM, based upon novel and elegant data structures, and a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and technique for buffer management are given.
TritonSort: A Balanced Large-Scale Sorting System
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in

References

A case for NOW (networks of workstations)
TLDR
This paper identifies three opportunities for NOWs that will benefit end users: dramatically improving virtual memory and file system performance by using the aggregate DRAM of a NOW as a giant cache for disk; achieving cheap, highly available, and scalable file storage by using redundant arrays of workstation disks, and enterprise-scale network file systems.
Sorting Large Files on a Backend Multiprocessor
TLDR
The results show that, using current off-the-shelf technology coupled with a streamlined distributed operating system, three- and five-microprocessor configurations provide a very cost-effective sort of large files.
AlphaSort: a RISC machine sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and proposes two new benchmarks: Minutesort: how much can you sort in a minute, and DollarSort: how to sort for a dollar.
Fast Parallel Sorting Under LogP: Experience with the CM-5
TLDR
The LogP model is shown to be a valuable guide in the development of parallel algorithms and a good predictor of implementation performance; the model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention.
A practical external sort for shared disk MPPs
TLDR
The implementation of the sample sort algorithm described here meets the requirements of real world constraints and is suitable for shared disk MPP computer systems.
A super scalar sort algorithm for RISC processors
TLDR
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
The interaction of parallel and sequential workloads on a network of workstations
TLDR
This paper examines the plausibility of using a network of workstations (NOW) for a mixture of parallel and sequential jobs, and presents a methodology for deriving an optimal delay time for recruiting idle machines for use by parallel programs.
A comparison of sorting algorithms for the connection machine CM-2
TLDR
A fast sorting algorithm for the Connection Machine Supercomputer model CM-2 is developed and it is shown that any O(lg n)-depth family of sorting networks can be used to sort n numbers in O(lg n) time in the bounded-degree fixed interconnection network domain.
A Case for NOW (Networks Of Workstations)
TLDR
The 100-node NOW prototype aims to demonstrate practical solutions to the challenges of efficient communication hardware and software, global coordination of multiple workstation operating systems, and enterprise-scale network file systems.