Corpus ID: 8623033

Quasi-ASICs: Trading Area for Energy by Exploiting Similarity in Synthesized Cores for Irregular Code

Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Steven Swanson, Michael Bedford Taylor
Transistor density continues to increase exponentially, but the power dissipation per transistor improves only slightly with each generation of Moore's law. Given constant chip-level power budgets, the fraction of transistors that can be active simultaneously therefore shrinks exponentially with each technology generation. Hence, while the area budget continues to grow exponentially, the power budget has become a first-order design constraint in current processors. In this regime… 
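The utilization-wall argument sketched in the abstract can be made concrete with a small back-of-envelope calculation. The scaling factors below are illustrative assumptions, not figures from the paper: transistor count doubling per generation and per-transistor power improving by only about 25%.

```python
# Illustrative utilization-wall arithmetic (assumed scaling factors, not
# values from the paper): under a fixed chip power budget, if transistor
# count grows 2x per generation while per-transistor power falls to only
# 0.75x, total "all-on" power grows 1.5x per generation, so the fraction
# of transistors that can be active shrinks accordingly.

def active_fraction(generations, density_scale=2.0, power_scale=0.75):
    """Fraction of transistors a fixed power budget can keep active."""
    frac = 1.0
    for _ in range(generations):
        # All-on power grows by density_scale * power_scale each generation.
        frac /= density_scale * power_scale
    return min(frac, 1.0)

for g in range(5):
    print(f"generation {g}: {active_fraction(g):.2%} of transistors active")
```

With these assumed factors, the active fraction falls to roughly two-thirds after one generation and below a fifth after four, which is the exponential decrease the abstract describes.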
1 Citation

Configurable energy-efficient co-processors to scale the utilization wall

This thesis proposes patchable conservation cores: flexible, energy-efficient co-processors that can be patched to remain useful across versions of their target application, and demonstrates the effectiveness of these conservation cores on a system workload.

Conservation cores: reducing the energy of mature computations

A toolchain for automatically synthesizing c-cores from application source code is presented, and it is demonstrated that c-cores can significantly reduce energy and energy-delay for a wide range of applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.

Understanding sources of inefficiency in general-purpose chips

The sources of performance and energy overheads in general-purpose processing systems are explored by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system, and by exploring methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding.

Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

The model considers the less-obvious relationships between conventional processors and a diverse set of U-cores to understand the relative merits of different approaches in the face of technology constraints, and supports speculation about future designs from scaling trends predicted by the ITRS roadmap.

Bridging the computation gap between programmable processors and hardwired accelerators

This paper proposes a customized semi-programmable loop accelerator architecture that exploits the efficiency gains available through high levels of customization, while maintaining sufficient flexibility to execute multiple similar loops.

Processor acceleration through automated instruction set customization

This paper presents the design of a system to automate the instruction set customization process, which contains a dataflow graph design space exploration engine that efficiently identifies profitable computation subgraphs from which to create custom hardware, without artificially constraining their size or shape.

Reconciling specialization and flexibility through compound circuits

This article presents a method for achieving any desired balance between flexibility and efficiency by automatically combining any set of individual customization circuits into a larger compound circuit, which is significantly more cost efficient than the simple union of all target circuits.

Hardware/software instruction set configurability for system-on-chip processors

This paper describes the key dimensions of extensibility within the processor architecture, the instruction set extension description language, and the means of automatically extending the software environment from that description. It also describes two groups of benchmarks that show 20 to 40 times acceleration of a broad set of algorithms through application-specific instruction set extensions, relative to high-performance RISC processors.

Computation spreading: employing hardware migration to specialize CMP cores on-the-fly

Computation Spreading (CSP) is presented, which employs hardware migration to distribute a thread's dissimilar fragments of computation across the multiple processing cores of a chip multiprocessor (CMP), while grouping similar computation fragments from different threads together.

PICO: Automatically Designing Custom Computers

The paper discusses the PICO (program in, chip out) project, a long-range HP Labs research effort that aims to automate the design of optimized, application-specific computing systems…

CACTI 5.1

The CACTI code base has been extensively rewritten to become more modular to enable fair comparisons of SRAM and DRAM technology, and various circuit assumptions have been updated to be more relevant to modern design practice.