Approximate computing has emerged as a new design paradigm that exploits the inherent error resilience of a wide range of application domains by allowing hardware implementations to forsake exact Boolean equivalence with algorithmic specifications. A slew of manual design techniques for approximate computing have been proposed in recent years, but very ...
The deluge of data has inspired big-data processing frameworks that span large clusters. Frameworks for MapReduce, a state-of-the-art programming model, have primarily made use of the CPUs in distributed systems, leaving out computationally powerful accelerators such as GPUs. This paper presents HeteroDoop, a MapReduce framework that employs both ...
Accelerator-based heterogeneous computing is gaining momentum in the High Performance Computing arena. However, the increased complexity of accelerator architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle the problem. While the abstraction endowed by OpenACC offers productivity, it raises questions on its ...
Modern supercomputers rely on accelerators to speed up highly parallel workloads. Intricate programming models, limited device memory sizes and overheads of data transfers between CPU and accelerator memories are among the open challenges that restrict the widespread use of accelerators. First, this paper proposes a mechanism and an implementation to ...
Computed Tomography (CT) image reconstruction is an important technique used in a wide range of applications, from explosive detection and medical imaging to scientific imaging. Among available reconstruction methods, Model Based Iterative Reconstruction (MBIR) produces higher quality images and allows for the use of more general CT scanner geometries ...
Massively multithreaded GPUs achieve high throughput by running thousands of threads in parallel. To fully utilize the hardware, workloads dispatch work to the GPU in bulk by launching large tasks, where each task is a kernel containing thousands of threads that occupy the entire GPU. GPUs face severe underutilization and their performance benefits vanish ...