AlphaSort: a RISC machine sort

@inproceedings{Nyberg1994AlphaSortAR,
  title={AlphaSort: a RISC machine sort},
  author={Christian Nyberg and Tom Barclay and Zarko Cvetanovic and Jim Gray and David B. Lomet},
  booktitle={SIGMOD '94},
  year={1994}
}
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using Alpha AXP processors, commodity memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-cpu 32-disk Hypercube by 8:1. On another benchmark, AlphaSort sorted more than a gigabyte in a minute. AlphaSort is a cache-sensitive memory-intensive sort algorithm. It uses file… 

AlphaSort: A cache-sensitive parallel external sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
A super scalar sort algorithm for RISC processors
TLDR
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
High-performance sorting on networks of workstations
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale
Nsort: a Parallel Sorting Program for NUMA and SMP Machines
TLDR
Ordinal™ Nsort™ is a high-performance sort program for SGI IRIX, Sun Solaris and HP-UX servers that can use tens of processors and hundreds of disks to quickly sort and merge data.
Multithreaded architectures and the sort benchmark
TLDR
This paper considers how algorithms designed specifically for newer architectures (featuring simultaneous multithreading (SMT), symmetric multiprocessors (SMP), advanced memory units, chip multiprocessors (CMP), etc.) …
Super Scalar Sample Sort
TLDR
The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions, which facilitates optimizations like loop unrolling and software pipelining.
A High Speed Disk-to-Disk Sort on a Windows NT Cluster Running HPVM
We describe the porting, redesign, and tuning of a high performance disk-to-disk parallel sort on a general purpose Myrinet connected PC cluster running Windows NT. This cluster employs the high
Active Disk Architecture for Databases
TLDR
This paper discusses how to map all the basic database operations - select, project, and join - onto an Active Disk system, and demonstrates a factor of 2x performance improvement on a small system using a portion of the TPC-D decision support benchmark.
Performance / Price Sort and PennySort
TLDR
This paper documents this and proposes that the PennySort benchmark be revised to Performance/Price sort: a simple GB/$ sort metric based on a two-pass external sort.
NOW-Sort: A Scalable, Commodity-Workstation Sort
TLDR
In this preliminary study, it is found that NOWs are competitive to the large-scale SMPs that are usually dominant in the sorting arena, and operating system support for managing memory and interacting with the file system is in place.

References

Sorting Large Files on a Backend Multiprocessor
TLDR
The results show that, using current off-the-shelf technology coupled with a streamlined distributed operating system, three- and five-microprocessor configurations provide a very cost-effective sort of large files.
A Low Communication Sort Algorithm for a Parallel Database Machine
TLDR
This work proposes a novel algorithm that exhibits complete parallelism during the sort, merge, and return-to-host phases, and decreases the amount of inter-processor communication compared to existing parallel sort algorithms.
Parallel sorting on a shared-nothing architecture using probabilistic splitting
  • D. DeWitt, J. Naughton, D. Schneider
  • Computer Science
    [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems
  • 1991
TLDR
The authors consider the problem of external sorting in a shared-nothing multiprocessor with two techniques for determining ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles.
Tuning a parallel database algorithm on a shared‐memory multiprocessor
TLDR
Volcano's parallel external sorting algorithm and a sequence of enhancements to improve its performance are presented, and very good absolute performance is obtained, 84 seconds for 100 MB of data, as well as near‐linear speedup with sixteen CPUs and disks.
Parallel Partition Sort for Database Machines
TLDR
A new parallel sorting method, called a parallel partition sort, which transfers only a small amount of data and does not place large demands on the CPU is discussed, based on the top-down partitioning of data.
Design, analysis, and implementation of parallel external sorting algorithms
TLDR
A modified merge-sort is proposed to use as a method for eliminating duplicate records in a large file and a combinatorial model is developed to provide an accurate estimate for the cost of the duplicate elimination operation (both in the serial and the parallel cases).
Improving Quicksort Performance with a Codeword Data Structure
TLDR
It is shown how the ordering of keys is preserved by an adequate choice of the code generator and how this can be applied to the quicksort algorithm.
Characterization of Alpha AXP performance using TP and SPEC workloads
TLDR
A simple model for evaluating the effects of various design tradeoffs based on the data collected by using hardware monitors is proposed and indicates that Alpha AXP takes advantage of lower cycles per instruction and cycle time to achieve a significant performance advantage.
Sorting Large Data Files on POOMA
TLDR
The results show that the benchmark is able to exploit the full capabilities of the computing power, the storage devices and the communication bandwidth, and demonstrate the applicability of the POOMA platform for this application, even where the POOL implementation was, at the time of the experiment, far from optimal.
A measure of transaction processing power
TLDR
These benchmarks measure the performance of diverse transaction processing systems and a standard system cost measure is stated and used to define price/performance metrics.