Energy Aware Loop Scheduling for High Performance Multi-Module Memory
@article{Qiu2009EnergyAL,
  title   = {Energy Aware Loop Scheduling for High Performance Multi-Module Memory},
  author  = {Meikang Qiu and Mei-qin Liu and Fei Hu and Shaobo Liu and Lingfeng Wang},
  journal = {2009 Sixth IFIP International Conference on Network and Parallel Computing},
  year    = {2009},
  pages   = {16-22},
  url     = {https://api.semanticscholar.org/CorpusID:8176604}
}
An efficient algorithm, EALSPP (Energy Aware Loop Scheduling with Prefetching and Partition), is proposed that aims to maximize energy savings while hiding memory latency through a combination of loop scheduling, data prefetching, memory partitioning, and heterogeneous memory module type assignment.
Topics
Loop Scheduling, Energy-aware, Computing Systems, Memory Module, Partitions, Prefetching, Assignment Problem, Data Prefetching, Cell Processor
One Citation
Partition Scheduling on Heterogeneous Multicore Processors for Multi-dimensional Loops Applications
- 2016
Computer Science, Engineering
A new partition scheduling algorithm, heterogeneous multiprocessor partition (HMP), based on the prefetching technique is proposed for heterogeneous multicore processors; it can hide memory latencies for applications with multi-dimensional loops.
26 References
Loop scheduling and bank type assignment for heterogeneous multi-bank memory
- 2009
Computer Science, Engineering
Energy-aware variable partitioning and instruction scheduling for multibank memory architectures
- 2005
Computer Science, Engineering
This article proposes an algorithm to iteratively find the variable partition such that the maximum energy saving is achieved while satisfying the given performance constraint.
Loop scheduling and partitions for hiding memory latencies
- 1999
Computer Science
This work studies the optimal partition shape and size so that a well-balanced overall schedule can be obtained, and shows that the proposed methodology consistently produces optimal or near-optimal solutions.
Combining loop fusion with prefetching on shared-memory multiprocessors
- 1997
Computer Science
For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.
Automatic data migration for reducing energy consumption in multi-bank memory systems
- 2002
Computer Science, Engineering
An automatic data migration strategy is described that dynamically places arrays with temporal affinity into the same set of banks, which increases the number of banks that can be put into low-power modes and allows the use of more aggressive energy-saving modes.
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
- 1995
Computer Science
Simulations of this adaptive scheme show reductions in the number of read misses, the read penalty, and the execution time by up to 78%, 58%, and 25%, respectively.
Optimizing Overall Loop Schedules Using Prefetching and Partitioning
- 2000
Computer Science
This paper studies the optimal partition shape and size so that a well-balanced overall schedule can be obtained and shows that the proposed methodology consistently produces optimal or near optimal solutions.
Impact of data transformations on memory bank locality
- 2004
Computer Science
This paper presents a compiler-based data layout transformation strategy for increasing the effectiveness of a banked memory architecture: array layouts in memory are transformed so that loop iterations executed one after another access data in the same bank as much as possible.
Tolerating latency in multiprocessors through compiler-inserted prefetching
- 1998
Computer Science
The proposed algorithm attempts to minimize overhead by issuing prefetches only for references that are predicted to suffer cache misses, and can improve the speed of some parallel applications by as much as a factor of two.
A performance study of software and hardware data prefetching schemes
- 1994
Computer Science
Qualitative comparisons indicate that both schemes are able to reduce cache misses in the domain of linear array references; an approach combining software and hardware schemes is proposed and shows promise in reducing memory latency with the least overhead.