GPUTeraSort: high performance graphics co-processor sorting for large database management
@article{Govindaraju2006GPUTeraSortHP, title={GPUTeraSort: high performance graphics co-processor sorting for large database management}, author={Naga K. Govindaraju and Jim Gray and Ritesh Kumar and Dinesh Manocha}, journal={Proceedings of the 2006 ACM SIGMOD international conference on Management of data}, year={2006} }
We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces on the same PC -- a fast and dedicated memory interface on the GPU along with the main memory interface for CPU…
487 Citations
Relational Query Co-Processing on Graphics Processors
- Computer Science
- 2009
This paper designs a set of highly optimized data-parallel primitives such as split and sort, uses these primitives to implement common relational query processing algorithms that exploit the high parallelism and high memory bandwidth of the GPU, and applies parallel computation and memory optimizations to reduce memory stalls.
Relational query coprocessing on graphics processors
- Computer Science
- TODS
- 2009
This article designs a set of highly optimized data-parallel primitives such as split and sort, and uses these primitives to implement common relational query processing algorithms on the GPU, and proposes coprocessing techniques that take into account both the computation resources and the GPU-CPU data transfer cost.
GPUMemSort: A High Performance Graphics Co-processors Sorting Algorithm for Large Scale In-Memory Data
- Computer Science
- 2011
The experimental results show that the in-core sorting can outperform other comparison-based algorithms and that GPUMemSort is highly effective in sorting large-scale in-memory data.
Relational joins on graphics processors
- Computer Science
- SIGMOD Conference
- 2008
This work designs a set of data-parallel primitives such as split and sort, and uses these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins, and utilizes the high parallelism as well as the high memory bandwidth of the GPU.
Database compression on graphics processors
- Computer Science
- Proc. VLDB Endow.
- 2010
This work implements nine lightweight compression schemes on the GPU and designs a compression planner to find the optimal combination, and demonstrates the feasibility of offloading compression and decompression to the GPU.
A Memory Model for Scientific Algorithms on Graphics Processors
- Computer Science
- ACM/IEEE SC 2006 Conference (SC'06)
- 2006
A memory model is presented to analyze and improve the performance of scientific algorithms on graphics processing units (GPUs) and incorporates many characteristics of GPU architectures including smaller cache sizes, 2D block representations, and the 3C's model to analyze the cache misses.
Parallel H-Tree Based Data Cubing on Graphics Processors
- Computer Science
- Int. J. Softw. Informatics
- 2012
This paper investigates efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, which is an expensive operation well suited for SIMD parallel processors.
Efficient Data Management for GPU Databases
- Computer Science
- 2012
It is demonstrated that GPU query acceleration is possible for data sets much larger than the size of GPU memory, and argued that the use of an opcode model of query execution combined with a simple virtual machine provides capabilities that are impossible with the parallel primitives used for most GPU database research.
High performance comparison-based sorting algorithm on many-core GPUs
- Computer Science
- 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
- 2010
A new algorithm, GPU-Warpsort, is presented for comparison-based parallel sorting on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort and achieves high performance by efficiently mapping the sorting tasks to GPU architectures.
GPU-Accelerated Large-Scale Distributed Sorting Coping with Device Memory Capacity
- Computer Science
- IEEE Transactions on Big Data
- 2016
This work investigates the applicability of GPU devices to splitter-based sorting algorithms, extends HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs, and finds that performance is mostly bottlenecked by the CPU-GPU host-to-device bandwidth.
References
SHOWING 1-10 OF 53 REFERENCES
Fast computation of database operations using graphics processors
- Computer Science
- SIGMOD '04
- 2004
New algorithms for performing fast computation of several common database operations on commodity graphics processors, taking into account some of the limitations of the programming model of current GPUs and performing no data rearrangements are presented.
The Graphics Card as a Stream Computer
- Computer Science
- 2003
Inspired in part by dataflow architectures and systolic arrays, the development of graphics chips focused on high computation throughput while sacrificing (to a degree) the generality of a CPU; the result is a stream processor that is highly optimized for stream computations.
Fast and approximate stream mining of quantiles and frequencies using graphics processors
- Computer Science
- SIGMOD '05
- 2005
The results demonstrate that the graphics processors available on a commodity computer system are efficient stream processors and useful co-processors for mining data streams.
Efficient relational database management using graphics processors
- Computer Science
- DaMoN '05
- 2005
It is shown that GPUs can be used as co-processors to accelerate many database and data mining queries; the algorithms are implemented on commodity GPUs and their performance is compared with optimized CPU-based algorithms.
A super scalar sort algorithm for RISC processors
- Computer Science
- SIGMOD '96
- 1996
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
High-performance sorting on networks of workstations
- Computer Science
- SIGMOD '97
- 1997
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale…
AlphaSort: A cache-sensitive parallel external sort
- Computer Science
- The VLDB Journal
- 1995
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
External memory algorithms and data structures: dealing with massive data
- Computer Science
- CSUR
- 2001
The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
Database Architecture Optimized for the New Bottleneck: Memory Access
- Computer Science
- VLDB
- 1999
A simple scan test is used to show the severe impact of main-memory access bottleneck, and radix algorithms for partitioned hash-join are introduced, using a detailed analytical model that incorporates memory access cost.
DBMSs on a Modern Processor: Where Does Time Go?
- Computer Science
- VLDB
- 1999
This paper examines four commercial DBMSs running on an Intel Xeon and NT 4.0 and introduces a framework for analyzing query execution time, and finds that database developers should not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues.