• Corpus ID: 61300651

A Low-Power Hybrid CPU-GPU Sort

Lawrence Tan
This thesis analyses the energy efficiency of a low-power CPU-GPU hybrid architecture. We evaluate the NVIDIA Ion architecture, which couples an Intel Atom low-power processor with an integrated GPU that has an order of magnitude fewer processors than traditional discrete GPUs. We attempt to create a system that balances computation and I/O capabilities by attaching flash storage that allows sequential access to data with very high throughput. To evaluate this architecture, we…


Designing efficient sorting algorithms for manycore GPUs
This paper describes the design of high-performance parallel radix sort and merge sort routines for manycore GPUs that take advantage of the full programmability offered by CUDA. The routines are reported as the fastest GPU sort and the fastest comparison-based sort in the literature.
GPUTeraSort: high performance graphics co-processor sorting for large database management
Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.
Low Power Amdahl Blades for Data Intensive Computing
The emergence of Solid State Disk (SSD) technology poses the challenge of building a credible equivalent to a GrayWulf system with similar I/O performance, but with considerably lower power
Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications
The paper presents an exhaustive analysis of the design space of Gordon systems, focusing on the trade-offs between power, energy, and performance that Gordon must make, and describes a novel flash translation layer tailored to data intensive workloads and large flash storage arrays.
Delivering Energy Proportionality with Non Energy-Proportional Systems - Optimizing the Ensemble
This paper demonstrates how optimization-based techniques can be used to build systems with off-the-shelf hardware that, when viewed at the aggregate level, approximate the behavior of energy-proportional systems.
FAWN: a fast array of wimpy nodes
The key contributions of this paper are the principles of the FAWN architecture and the design and implementation of FAWN-KV--a consistent, replicated, highly available, and high-performance key-value storage system built on a FAWN prototype.
JouleSort: a balanced energy-efficiency benchmark
This work proposes and motivates JouleSort, an external sort benchmark, for evaluating the energy efficiency of a wide range of computer systems from clusters to handhelds, and demonstrates a JouleSort system that is over 3.5x as energy-efficient as last year's estimated winner.
Performance / Price Sort and PennySort
This paper documents this and proposes that the PennySort benchmark be revised to Performance/Price sort: a simple GB/$ sort metric based on a two-pass external sort.
NAS Parallel Benchmarks
  • D. Bailey
  • Computer Science
    Encyclopedia of Parallel Computing
  • 2011
The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world computing applications.
Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
A new solution that incorporates volume non-server-class components in novel packaging solutions, with memory sharing and flash-based disk caching, has promise, with a 2X improvement on average in performance-per-dollar for the benchmark suite.