This paper considers Rigel, a programmable accelerator architecture for a broad class of data- and task-parallel computation. Rigel comprises 1000+ hierarchically organized cores that use a fine-grained, dynamically scheduled single-program, multiple-data (SPMD) execution model. Rigel's low-level programming interface adopts a single global address space…
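As a loose illustration of the kind of SPMD, dynamically scheduled, task-queue-driven execution the abstract describes, the sketch below has worker threads stand in for cores, all running the same program and pulling work from a shared queue in one address space. The queue, the `dequeue` helper, and the task body are hypothetical stand-ins, not Rigel's actual programming interface.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a global task queue shared by all cores in a
 * single global address space: workers dynamically pull tasks until empty. */
#define NUM_TASKS   1024
#define NUM_WORKERS 8          /* stands in for 1000+ cores */

static float data[NUM_TASKS];
static int next_task = 0;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;

/* Dequeue the next task index, or -1 when the queue is drained. */
static int dequeue(void) {
    pthread_mutex_lock(&q_lock);
    int t = (next_task < NUM_TASKS) ? next_task++ : -1;
    pthread_mutex_unlock(&q_lock);
    return t;
}

/* Every worker runs the same program (SPMD); which data it touches is
 * decided dynamically by the task it happens to dequeue. */
static void *worker(void *arg) {
    (void)arg;
    for (int t; (t = dequeue()) >= 0; )
        data[t] = data[t] * 2.0f + 1.0f;   /* per-task work */
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    printf("done: data[0] = %f\n", data[0]);
    return 0;
}
```

Build with `-pthread`; the point of the sketch is only that scheduling is decided at run time by whichever worker reaches the queue first, rather than by a static assignment of data to cores.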
We present GoldMine, a methodology for generating assertions automatically. Our method combines data mining with static analysis of the Register Transfer Level (RTL) design. We present results of using GoldMine for assertion generation on the RTL of a 1000-core processor design that is still evolving. Our results show that…
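The abstract does not spell out the mining step, so the sketch below only illustrates the general idea behind the data-mining half: test a candidate temporal assertion against a recorded simulation trace and keep it if it is never falsified. The trace layout, the signals, and the candidate implication are invented for the example and are not GoldMine's algorithm.

```c
#include <stdio.h>

/* Hypothetical one-bit signal trace from RTL simulation: one row per cycle. */
struct cycle { int req; int grant; };

static const struct cycle trace[] = {
    {0, 0}, {1, 0}, {0, 1}, {1, 0}, {0, 1}, {0, 0},
};

/* Check a candidate assertion of the form
 *     req == 1  |->  ##1 grant == 1
 * against the trace. A miner would generate many such candidates, keep the
 * ones that hold on the traces, and pass them to static analysis for
 * confirmation on the actual design. */
int main(void) {
    int n = (int)(sizeof trace / sizeof trace[0]);
    int holds = 1;
    for (int c = 0; c + 1 < n; c++)
        if (trace[c].req == 1 && trace[c + 1].grant != 1)
            holds = 0;
    printf("candidate assertion %s on this trace\n",
           holds ? "holds" : "is falsified");
    return 0;
}
```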
Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on…
This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors.
GPUs employ massive multithreading and fast context switching to provide high throughput and hide memory latency. However, multithreading can increase contention for various system resources, which may result in suboptimal utilization of shared resources. Previous research has proposed variants of throttling thread-level parallelism to reduce cache…
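The abstract is cut off before the proposal itself, so the sketch below only illustrates the general idea of throttling thread-level parallelism: cap how many warps the scheduler may issue from when observed cache contention rises. The warp count, thresholds, and miss-rate samples are made up for the example and are not the paper's mechanism.

```c
#include <stdio.h>

/* Toy model of TLP throttling: limit schedulable warps when the observed
 * L1 miss rate suggests heavy contention for the shared cache. */
#define MAX_WARPS 48   /* illustrative per-SM warp capacity */

static int active_warp_limit(double l1_miss_rate) {
    if (l1_miss_rate > 0.60) return MAX_WARPS / 4;   /* heavy contention  */
    if (l1_miss_rate > 0.30) return MAX_WARPS / 2;   /* moderate contention */
    return MAX_WARPS;                                /* run at full TLP   */
}

int main(void) {
    double samples[] = {0.10, 0.45, 0.75};           /* sampled miss rates */
    for (int i = 0; i < 3; i++)
        printf("miss rate %.2f -> schedule at most %d warps\n",
               samples[i], active_warp_limit(samples[i]));
    return 0;
}
```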
Visualization, interaction, and simulation (VIS) constitute a class of applications that is growing in importance. This class includes applications such as graphics rendering, video encoding, simulation, and computer vision. These applications are ideally suited for accelerators because of their parallelizability and demand for high throughput. We compile a…
Two broad classes of memory models are available today: models with hardware cache coherence, used in conventional chip multiprocessors, and models that rely upon software to manage coherence, found in compute accelerators. In some systems, both types of models are supported using disjoint address spaces and/or physical memories. In this paper we present…
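As a minimal sketch of what software-managed coherence typically asks of the programmer, the code below has a producer flush its writes out of its private cache before signalling and a consumer invalidate its stale copy before reading. The `cache_flush` and `cache_invalidate` primitives are placeholders for whatever cache-control instructions a given accelerator actually exposes; this is not the memory model proposed in the paper.

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder coherence primitives: on a software-managed-coherence
 * accelerator these would be real cache-control instructions or intrinsics. */
static void cache_flush(const void *addr, size_t len)      { (void)addr; (void)len; }
static void cache_invalidate(const void *addr, size_t len) { (void)addr; (void)len; }

#define N 256
static float shared_buf[N];
static volatile int ready = 0;

/* Producer core: write the buffer, then push dirty lines to shared memory
 * before publishing the flag, since hardware will not do it on our behalf. */
static void producer(void) {
    for (int i = 0; i < N; i++) shared_buf[i] = (float)i;
    cache_flush(shared_buf, sizeof shared_buf);
    ready = 1;
}

/* Consumer core: discard any stale cached copy before reading. */
static void consumer(void) {
    while (!ready) { /* spin */ }
    cache_invalidate(shared_buf, sizeof shared_buf);
    printf("last element = %f\n", shared_buf[N - 1]);
}

int main(void) {   /* run the two roles back to back for illustration */
    producer();
    consumer();
    return 0;
}
```

With hardware cache coherence the two explicit cache operations disappear; the trade-off the abstract alludes to is between that hardware cost and the burden these operations place on software.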
The idea that type 2 diabetes is associated with augmented innate immune function, characterized by increased circulating levels of acute phase reactants and altered macrophage biology, is fairly well established, even though the mechanisms involved in this complex interaction are still not entirely clear. To date, the majority of studies investigating innate…