Energy Aware Loop Scheduling for High Performance Multi-Module Memory

@inproceedings{Qiu2009EnergyAL,
  title={Energy Aware Loop Scheduling for High Performance Multi-Module Memory},
  author={Meikang Qiu and Mei-qin Liu and Fei Hu and Shaobo Liu and Lingfeng Wang},
  booktitle={2009 Sixth IFIP International Conference on Network and Parallel Computing},
  year={2009},
  pages={16--22},
  url={https://api.semanticscholar.org/CorpusID:8176604}
}
  • Meikang Qiu, Mei-qin Liu, Fei Hu, Shaobo Liu, Lingfeng Wang
  • Published 19 October 2009
  • Computer Science, Engineering
  • 2009 Sixth IFIP International Conference on Network and Parallel Computing
An efficient algorithm, EALSPP (Energy Aware Loop Scheduling with Prefetching and Partition), is proposed that attempts to maximize energy saving while hiding memory latency by combining loop scheduling, data prefetching, memory partitioning, and heterogeneous memory module type assignment.
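
As a rough illustration of the latency-hiding half of that combination, the C sketch below consumes the current data block while a software prefetch is issued for the next one. It is a minimal sketch only: EALSPP additionally assigns each partition to a heterogeneous memory module, which is not modeled here, and the block size, prefetch hints, and function name are assumptions for illustration (__builtin_prefetch is the GCC/Clang builtin).

    #include <stddef.h>

    /* Illustrative sketch, not EALSPP itself: work on the current data
       block while the next block is prefetched, so memory latency
       overlaps with computation. BLOCK is an assumed partition size. */
    #define BLOCK 64

    double sum_partitioned(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i += BLOCK) {
            if (i + BLOCK < n)                 /* request next partition early */
                __builtin_prefetch(&a[i + BLOCK], 0 /* read */, 1 /* low reuse */);
            size_t end = (i + BLOCK < n) ? i + BLOCK : n;
            for (size_t j = i; j < end; ++j)
                sum += a[j];                   /* work on the current block */
        }
        return sum;
    }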

Partition Scheduling on Heterogeneous Multicore Processors for Multi-dimensional Loops Applications

A new partition scheduling algorithm called heterogeneous multiprocessor partition (HMP), based on the prefetching technique, is proposed for heterogeneous multicore processors; it can hide memory latencies for applications with multi-dimensional loops.

Energy-aware variable partitioning and instruction scheduling for multibank memory architectures

This article proposes an algorithm to iteratively find the variable partition such that the maximum energy saving is achieved while satisfying the given performance constraint.
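
A minimal sketch of such an iterative search, in C and under toy cost models (the article derives energy and latency from the actual schedule and bank power modes, so energy(), latency(), and the access counts below are assumptions): repeatedly try moving one variable to another bank, keeping a move only if it lowers energy without violating the performance constraint.

    #define NVARS  6
    #define NBANKS 2

    /* Toy inputs and cost models (assumptions, not the article's): a bank
       burns idle energy whenever it holds any variable, plus energy per
       access; latency follows the busiest bank. */
    static const double acc[NVARS] = {9, 7, 5, 3, 2, 1};

    static double energy(const int part[NVARS])
    {
        double load[NBANKS] = {0}, e = 0;
        for (int v = 0; v < NVARS; ++v) load[part[v]] += acc[v];
        for (int b = 0; b < NBANKS; ++b)
            if (load[b] > 0) e += 10.0 + load[b];  /* idle + active */
        return e;
    }

    static double latency(const int part[NVARS])
    {
        double load[NBANKS] = {0}, worst = 0;
        for (int v = 0; v < NVARS; ++v) load[part[v]] += acc[v];
        for (int b = 0; b < NBANKS; ++b)
            if (load[b] > worst) worst = load[b];
        return worst;
    }

    /* Greedy local search: stop when no single-variable move both lowers
       energy and keeps the schedule within the performance bound. */
    static void partition_vars(int part[NVARS], double max_latency)
    {
        for (int improved = 1; improved; ) {
            improved = 0;
            for (int v = 0; v < NVARS; ++v) {
                int best = part[v];
                double best_e = energy(part);
                for (int b = 0; b < NBANKS; ++b) {
                    part[v] = b;                   /* tentative move */
                    if (latency(part) <= max_latency && energy(part) < best_e) {
                        best_e = energy(part);
                        best = b;
                        improved = 1;
                    }
                }
                part[v] = best;                    /* keep the best bank */
            }
        }
    }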

Loop scheduling and partitions for hiding memory latencies

    Fei Chen, E. Sha
    Computer Science
  • 1999
This work studies the optimal partition shape and size so that a well-balanced overall schedule can be obtained, and shows that the proposed methodology consistently produces optimal or near-optimal solutions.

Combining loop fusion with prefetching on shared-memory multiprocessors

For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.

Automatic data migration for reducing energy consumption in multi-bank memory systems

An automatic data migration strategy is described that dynamically places arrays with temporal affinity into the same set of banks; this increases the number of banks that can be put into low-power modes and allows the use of more aggressive energy-saving modes.
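
A hedged sketch of that placement decision, assuming per-array phase information as the affinity signal (the paper's actual affinity analysis and migration mechanics differ, and every name below is a stand-in): arrays live in the same phase are packed into as few banks as possible, and any bank left without live data is dropped into a low-power mode.

    #define NARRAYS  6
    #define NBANKS   4
    #define BANK_CAP 2     /* arrays per bank; an assumed capacity */

    /* Which program phase each array is live in (toy affinity data). */
    static const int phase_of[NARRAYS] = {0, 0, 1, 1, 2, 2};

    static void set_active(int bank)    { (void)bank; /* platform hook, assumed */ }
    static void set_low_power(int bank) { (void)bank; /* platform hook, assumed */ }

    /* Pack the arrays live in the given phase into the fewest banks,
       then drop every bank holding no live data into a low-power mode. */
    static void migrate_for_phase(int phase, int bank_of[NARRAYS])
    {
        int fill[NBANKS] = {0};
        int next = 0;
        for (int a = 0; a < NARRAYS; ++a) {
            if (phase_of[a] != phase)
                continue;
            if (fill[next] == BANK_CAP)
                ++next;                  /* current bank is full, open next */
            bank_of[a] = next;
            ++fill[next];
        }
        for (int b = 0; b < NBANKS; ++b) {
            if (fill[b] > 0) set_active(b);
            else             set_low_power(b);
        }
    }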

Sequential Hardware Prefetching in Shared-Memory Multiprocessors

Simulations of this adaptive scheme show reductions in the number of read misses, the read penalty, and the execution time of up to 78%, 58%, and 25%, respectively.

Optimizing Overall Loop Schedules Using Prefetching and Partitioning

This paper studies the optimal partition shape and size so that a well-balanced overall schedule can be obtained, and shows that the proposed methodology consistently produces optimal or near-optimal solutions.

Impact of data transformations on memory bank locality

    M. Kandemir
    Computer Science
  • 2004
This paper presents a compiler-based data layout transformation strategy for increasing the effectiveness of a banked memory architecture: array layouts are transformed in memory so that two loop iterations executed one after another access data in the same bank as much as possible.
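
For intuition, a small self-contained C example under an assumed striping scheme (consecutive cache-line-sized chunks round-robin across banks, which is not necessarily the paper's bank model): a column-major walk over a row-major array changes banks on almost every iteration, while the same walk over the transposed layout stays in one bank for several consecutive iterations.

    #include <stdio.h>
    #include <stddef.h>

    #define N      8
    #define LINE   4    /* elements per bank chunk (assumed) */
    #define NBANKS 4

    /* Assumed striping: consecutive LINE-element chunks of the address
       space round-robin across the banks. */
    static int bank_of(size_t idx) { return (int)((idx / LINE) % NBANKS); }

    int main(void)
    {
        int prev_b = -1, prev_t = -1, sw_before = 0, sw_after = 0;
        /* A column-major walk over a row-major N x N array. */
        for (size_t j = 0; j < N; ++j) {
            for (size_t i = 0; i < N; ++i) {
                int b = bank_of(i * N + j);   /* original layout: stride N   */
                int t = bank_of(j * N + i);   /* transposed layout: stride 1 */
                if (b != prev_b) ++sw_before;
                if (t != prev_t) ++sw_after;
                prev_b = b;
                prev_t = t;
            }
        }
        printf("bank switches: original=%d transformed=%d\n",
               sw_before, sw_after);
        return 0;
    }

Running it prints far fewer bank switches for the transformed layout, which is exactly the locality the transformation targets.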

Tolerating latency in multiprocessors through compiler-inserted prefetching

The proposed algorithm attempts to minimize overhead by issuing prefetches only for references that are predicted to suffer cache misses, and can improve the speed of some parallel applications by as much as a factor of two.
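
The selective-insertion idea reduces to a simple pattern, sketched here in C with assumed details (the prefetch distance and the hit/miss prediction stand in for the compiler's locality analysis): the unit-stride reference b[i] is predicted to hit and gets no prefetch, while the irregular gather a[idx[i]] is predicted to miss and is prefetched a fixed distance ahead.

    #include <stddef.h>

    #define DIST 16    /* prefetch distance in iterations (assumed) */

    /* b[] has unit stride and high spatial locality, so the "compiler"
       predicts hits and inserts no prefetch for it; a[idx[i]] is an
       irregular reference predicted to miss, so only that one is
       prefetched, keeping instruction overhead low. */
    double gather_sum(const double *a, const double *b,
                      const int *idx, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i) {
            if (i + DIST < n)
                __builtin_prefetch(&a[idx[i + DIST]], 0, 0);
            sum += a[idx[i]] * b[i];
        }
        return sum;
    }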

A performance study of software and hardware data prefetching schemes

Qualitative comparisons indicate that both schemes are able to reduce cache misses in the domain of linear array references; an approach combining software and hardware schemes is proposed and shows promise in reducing memory latency with the least overhead.