Learn More
Exact inference is a key problem in exploring proba-bilistic graphical models, where the computational complexity varies dramatically as the parameters of the graphical models changes. To achieve scalability over hundreds of threads remains a fundamental challenge. In this paper, we design an efficient scheduler hosted by the CPU to allocate cliques in(More)
This paper presents Warped-Compression, a warp-level register compression scheme for reducing GPU power consumption. This work is motivated by the observation that the register values of threads within the same warp are similar, namely the arithmetic differences between two successive thread registers is small. Removing data redundancy of register values(More)
—Large graph processing is now a critical component of many data analytics. Graph processing is used from social networking web sites that provide context-aware services from user connectivity data to medical informatics that diagnose a disease from a given set of symptoms. Graph processing has several inherently parallel computation steps interspersed with(More)
The Flash Translation Layer (FTL) is the core engine for Solid State Disks (SSD). It is responsible for managing the virtual to physical address mappings and emulating the functionality of a normal block-level device. SSD performance is highly dependent on the design of the FTL. For the last few years, several FTL schemes have been proposed. Hybrid FTL(More)
To support massive number of parallel thread contexts, Graphics Processing Units (GPUs) use a huge register file, which is responsible for a large fraction of GPU's total power and area. The conventional belief is that a large register file is inevitable for accommodating more parallel thread contexts, and technology scaling makes it feasible to incorporate(More)
As technology scales, GPUs are forecasted to incorporate an ever-increasing amount of computing resources to support thread-level parallelism. But even with the best effort, exposing massive thread-level parallelism from a single GPU kernel, particularly from general purpose applications, is going to be a difficult challenge. In some cases, even if there is(More)