Alphasort: A cache-sensitive parallel external sort

@article{Nyberg2005AlphasortAC,
  title={Alphasort: A cache-sensitive parallel external sort},
  author={Christian Nyberg and Tom Barclay and Zarka Cvetanovic and Jim Gray and David B. Lomet},
  journal={The VLDB Journal},
  year={1995},
  volume={4},
  pages={603-627}
}
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using commodity processors, memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another benchmark, AlphaSort sorted more than a gigabyte in one minute. AlphaSort is a cache-sensitive, memory-intensive sort algorithm. We argue that modern…
Leyenda: An Adaptive, Hybrid Sorting Algorithm for Large Scale Data with Limited Memory
TLDR
This paper proposes Leyenda, a hybrid, parallel and efficient Radix Most-Significant-Bit (MSB) MergeSort algorithm, with utilization of local thread-level CPU cache and efficient disk/memory I/O, and finds Leyenda to outperform GNU's parallel in-memory quick/merge sort implementations by up to three times.
Cache craftiness for fast multicore key-value storage
TLDR
This work presents Masstree, a fast key-value database designed for SMP machines, whose performance is comparable to that of memcached, a non-persistent hash table server, and higher than that of VoltDB, MongoDB, and Redis.
Sorting Hierarchical Data in External Memory
TLDR
This paper proposes HErMeS, an algorithm that generalizes the most widely-used techniques for sorting flat data in external memory, namely multiway merge-sort and replacement selection, and efficiently takes into consideration the hierarchical nature of the data in order to minimize the number of disk accesses and optimize the usage of available memory.
TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System
TLDR
This article describes the hardware and software architecture necessary to operate TritonSort, a highly efficient, scalable sorting system designed to process large datasets, which is able to sort data at approximately 80% of the disks’ aggregate sequential write speed.
External Memory Sort On CGM 1 Clusters
TLDR
This paper adapts HPVM MinuteSort and borrows the THsort idea to develop an external memory sort algorithm that minimizes I/O and communication costs.
TritonSort: A Balanced Large-Scale Sorting System
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in
CellSort: High Performance Sorting on the Cell Processor
TLDR
This paper describes the design and implementation of CellSort, a high-performance distributed sort algorithm for the Cell processor, built as a distributed bitonic merge with a data-parallel bitonic sorting kernel. Results show that, when properly implemented, distributed out-of-core bitonic sort on Cell can significantly outperform the asymptotically superior quick sort.
Sorting in a Memory Hierarchy with Flash Memory
  • G. Graefe
  • Computer Science
    Datenbank-Spektrum
  • 2011
TLDR
The present paper analyzes the use of flash memory for database query processing, covering algorithms that combine flash memory and traditional disk drives, with external merge sort serving as a prototypical query execution algorithm.
GPUTeraSort: high performance graphics co-processor sorting for large database management
TLDR
Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.
Designing Database Operators for Flash-enabled Memory Hierarchies
TLDR
This paper uses external merge sort as a prototypical query execution algorithm to demonstrate that the most advantageous external sort algorithms combine flash memory and traditional disk, exploiting the fast access latency of flash memory as well as the fast transfer bandwidth and inexpensive capacity of traditional disks.

References

SHOWING 1-10 OF 44 REFERENCES
AlphaSort: a RISC machine sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. The paper also proposes two new benchmarks: MinuteSort (how much can you sort in a minute) and DollarSort (how much can you sort for a dollar).
Sorting Large Files on a Backend Multiprocessor
TLDR
The results show that, using current off-the-shelf technology coupled with a streamlined distributed operating system, three- and five-microprocessor configurations provide a very cost-effective sort of large files.
Tuning a parallel database algorithm on a shared‐memory multiprocessor
TLDR
Volcano's parallel external sorting algorithm and a sequence of enhancements to improve its performance are presented, and very good absolute performance is obtained, 84 seconds for 100 MB of data, as well as near‐linear speedup with sixteen CPUs and disks.
FASTSORT: AN EXTERNAL SORT USING PARALLEL PROCESSING
TLDR
Performance measurements of FastSort are presented on various Tandem Nonstop processors, with particular emphasis on the speedup obtained by using parallelism to sort large files.
A Low Communication Sort Algorithm for a Parallel Database Machine
TLDR
This work proposes a novel algorithm that exhibits complete parallelism during the sort, merge, and return-to-host phases, and decreases the amount of inter-processor communication compared to existing parallel sort algorithms.
Parallel sorting on a shared-nothing architecture using probabilistic splitting
  • D. DeWitt, J. Naughton, D. Schneider
  • Computer Science
    [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems
  • 1991
TLDR
The authors consider the problem of external sorting in a shared-nothing multiprocessor with two techniques for determining ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles.
Design, analysis, and implementation of parallel external sorting algorithms
TLDR
A modified merge-sort is proposed as a method for eliminating duplicate records in a large file, and a combinatorial model is developed to provide an accurate estimate of the cost of the duplicate elimination operation (in both the serial and the parallel cases).
Improving Quicksort Performance with a Codeword Data Structure
TLDR
It is shown how the ordering of keys is preserved by an adequate choice of the code generator and how this can be applied to the quicksort algorithm.
Parallel Partition Sort for Database Machines
TLDR
A new parallel sorting method, called parallel partition sort, is discussed. It is based on top-down partitioning of data, transfers only a small amount of data, and does not place large demands on the CPU.
Characterization of Alpha AXP performance using TP and SPEC workloads
TLDR
A simple model for evaluating the effects of various design tradeoffs based on the data collected by using hardware monitors is proposed and indicates that Alpha AXP takes advantage of lower cycles per instruction and cycle time to achieve a significant performance advantage.