Empirical Study for Optimization of Power-Performance with On-Chip Memory

Chikafumi Takahashi, Mitsuhisa Sato, Daisuke Takahashi, Taisuke Boku, Hiroshi Nakamura, Masaaki Kondo, Motonobu Fujita
Power-performance (performance per uniform power consumption) has recently become a more important factor in modern high-performance microprocessors. In processor design, it is well known that off-chip memory access has a large impact on both performance and power consumption. On-chip memory is one solution to this problem, and many processors, such as the Renesas SH-4 and some ARM architecture processors, adopt on-chip memory, which resides on the same layer as the cache memory. In…
Adaptive Page Migration Policy With Huge Pages in Tiered Memory Systems
This paper proposes a novel dynamic policy selection mechanism, which identifies the best migration policy for a given workload, and allows multiple concurrently running workloads to adopt different policies.
An approach to the next generation of computing: Small and inexpensive vision intelligence device
A system combining software and hardware is proposed with two goals: to support intelligent application development with state-of-the-art computer vision algorithms and to lower power consumption.
Étude et optimisation de l'interaction processeurs-architectures reconfigurables dynamiquement
Mobile telecommunications and multimedia applications, particularly in the embedded domain, are becoming increasingly complex computationally and consume more and more…


Software-controlled on-chip memory for high-performance and low-power computing
A new processor architecture, SCIMA (Software Controlled Integrated Memory Architecture), is proposed; it integrates software-controllable memory (SCM) into the processor chip, and its effectiveness is shown.
Data movement optimization for software-controlled on-chip memory
The results reveal that the proposed technique can drastically reduce memory stall cycles and achieve high performance.
The architecture of the DIVA processing-in-memory chip
The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory chips as smart-memory co-processors to a conventional microprocessor, and a PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of a similar scale and at a much reduced hardware cost.
SCIMA: Software controlled integrated memory architecture for high performance computing
A new VLSI architecture called SCIMA is proposed which integrates software controllable memory into a processor chip, and the evaluation results reveal the superiority of SCIMA compared with conventional cache-based architecture.
A Case for Intelligent RAM: IRAM
This paper reviews the state of microprocessors and DRAMs today, explores some of the opportunities and challenges for IRAMs, and finally estimates performance and energy efficiency of three IRAM designs.
A case for intelligent RAM
The state of microprocessors and DRAMs today is reviewed, some of the opportunities and challenges for IRAMs are explored, and performance and energy efficiency of three IRAM designs are estimated.
A parallel processing chip with embedded DRAM macros
A combined DRAM and logic chip has been developed for massively parallel processing (MPP) applications that delivers 50-MIPS of performance at 2.7 W and contains eight 16-b CPUs and some broadcast logic circuits.
Sony’s Emotionally Charged Chip
Computer Science, 1999
The Emotion Engine, a multimedia processor that will be the heart of the next-generation PlayStation, upsets the traditional notion of a game processor and will bring Toy Story-like realism to home games, says SCE.
The NAS Parallel Benchmarks
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers that mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications.
Unlocking the Performance of the BlueGene/L Supercomputer
This paper demonstrates how benchmarks and applications can take advantage of the special dual floating-point unit on each processor and the ability to use two processors per node to get the most out of BlueGene/L.