GPUTeraSort: high performance graphics co-processor sorting for large database management

@inproceedings{Govindaraju2006GPUTeraSortHP,
  title={GPUTeraSort: high performance graphics co-processor sorting for large database management},
  author={Naga K. Govindaraju and Jim Gray and Ritesh Kumar and Dinesh Manocha},
  booktitle={Proceedings of the 2006 ACM SIGMOD international conference on Management of data},
  year={2006}
}
  • N. Govindaraju, J. Gray, R. Kumar, D. Manocha
  • Published 27 June 2006
  • Computer Science
  • Proceedings of the 2006 ACM SIGMOD international conference on Management of data
We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces on the same PC -- a fast and dedicated memory interface on the GPU along with the main memory interface for CPU… 
Relational Query Co-Processing on Graphics Processors
TLDR
This paper designs a set of highly optimized data-parallel primitives such as split and sort, and uses these primitives to implement common relational query processing algorithms that utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to reduce memory stalls.
Relational query coprocessing on graphics processors
TLDR
This article designs a set of highly optimized data-parallel primitives such as split and sort, and uses these primitives to implement common relational query processing algorithms on the GPU, and proposes coprocessing techniques that take into account both the computation resources and the GPU-CPU data transfer cost.
GPUMemSort: A High Performance Graphics Co-processors Sorting Algorithm for Large Scale In-Memory Data
TLDR
The experimental results show that the in-core sorting can outperform other comparison-based algorithms and that GPUMemSort is highly effective in sorting large-scale in-memory data.
Relational joins on graphics processors
TLDR
This work designs a set of data-parallel primitives such as split and sort, and uses these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins, and utilizes the high parallelism as well as the high memory bandwidth of the GPU.
Database compression on graphics processors
TLDR
This work implements nine lightweight compression schemes on the GPU and designs a compression planner to find the optimal combination, and demonstrates the feasibility of offloading compression and decompression to the GPU.
A Memory Model for Scientific Algorithms on Graphics Processors
TLDR
A memory model is presented to analyze and improve the performance of scientific algorithms on graphics processing units (GPUs) and incorporates many characteristics of GPU architectures including smaller cache sizes, 2D block representations, and the 3C's model to analyze the cache misses.
Parallel H-Tree Based Data Cubing on Graphics Processors
TLDR
This paper investigates efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, which is an expensive operation well suited for SIMD parallel processors.
Efficient Data Management for GPU Databases
TLDR
It is demonstrated that GPU query acceleration is possible for data sets much larger than the size of GPU memory, and argued that the use of an opcode model of query execution combined with a simple virtual machine provides capabilities that are impossible with the parallel primitives used for most GPU database research.
High performance comparison-based sorting algorithm on many-core GPUs
TLDR
A new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs), which mainly consists of a bitonic sort followed by a merge sort that achieves high performance by efficiently mapping the sorting tasks to GPU architectures.
GPU-Accelerated Large-Scale Distributed Sorting Coping with Device Memory Capacity
TLDR
This work investigates the applicability of GPU devices to splitter-based algorithms and extends HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs, and finds that the performance is mostly bottlenecked by the CPU-GPU host-to-device bandwidth.

References

SHOWING 1-10 OF 53 REFERENCES
Fast computation of database operations using graphics processors
TLDR
New algorithms for performing fast computation of several common database operations on commodity graphics processors, taking into account some of the limitations of the programming model of current GPUs and performing no data rearrangements are presented.
The Graphics Card as a Stream Computer
TLDR
Inspired in part by dataflow architectures and systolic arrays, the development of graphics chips focused on high computation throughput while sacrificing (to a degree) the generality of a CPU; the result is a stream processor that is highly optimized for stream computations.
Fast and approximate stream mining of quantiles and frequencies using graphics processors
TLDR
The results demonstrate that the graphics processors available on a commodity computer system are efficient stream processors and useful co-processors for mining data streams.
Efficient relational database management using graphics processors
TLDR
It is shown that GPUs can be used as co-processors to accelerate many database and data mining queries; the algorithms are implemented on commodity GPUs and their performance is compared with optimized CPU-based algorithms.
A super scalar sort algorithm for RISC processors
TLDR
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
High-performance sorting on networks of workstations
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale
AlphaSort: A cache-sensitive parallel external sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
External memory algorithms and data structures: dealing with massive data
TLDR
The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
Database Architecture Optimized for the New Bottleneck: Memory Access
TLDR
A simple scan test is used to show the severe impact of main-memory access bottleneck, and radix algorithms for partitioned hash-join are introduced, using a detailed analytical model that incorporates memory access cost.
DBMSs on a Modern Processor: Where Does Time Go?
TLDR
This paper examines four commercial DBMSs running on an Intel Xeon and NT 4.0 and introduces a framework for analyzing query execution time, and finds that database developers should not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues.