A case of system-wide power management for scientific applications

  • Zhuo Liu, Jay F. Lofstead, Teng Wang, Weikuan Yu
  • Published 1 September 2013
  • Computer Science
  • 2013 IEEE International Conference on Cluster Computing (CLUSTER)
The advance of high-performance computing systems towards exascale will be constrained by the systems' energy consumption levels. Large numbers of processing components, memory, interconnects, and storage components must all be considered to achieve exascale performance within a targeted energy bound. While application-aware power allocation schemes for computing resources are well studied, a portable and scalable budget-constrained power management scheme for scientific applications on… 
I/O Aware Power Shifting
This paper explores algorithms that leverage application semantics -- phase frequency, duration and power needs -- to shift unused power from applications in I/O phases to applications in computation phases, thus improving system-wide performance.
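The shifting idea summarized above can be illustrated with a minimal sketch: jobs in an I/O phase are held at a low power floor, and the freed power goes to compute-phase jobs, scaled down uniformly if the system-wide cap would be exceeded. All names and numbers here are illustrative assumptions, not taken from the paper.

```python
def shift_power(jobs, system_cap):
    """jobs: list of dicts with 'phase' ('io' or 'compute') and 'power_need'
    in watts. Returns a dict mapping job index -> power allocation in watts."""
    # Assumed fixed floor for jobs waiting on I/O, where the CPU is mostly idle.
    IO_FLOOR = 40.0  # watts (illustrative)

    alloc = {}
    for i, job in enumerate(jobs):
        alloc[i] = IO_FLOOR if job["phase"] == "io" else job["power_need"]

    # If the total still exceeds the cap, scale compute allocations down
    # proportionally so the system-wide budget is respected.
    total = sum(alloc.values())
    if total > system_cap:
        compute_ids = [i for i, j in enumerate(jobs) if j["phase"] == "compute"]
        excess = total - system_cap
        compute_total = sum(alloc[i] for i in compute_ids)
        for i in compute_ids:
            alloc[i] -= excess * alloc[i] / compute_total
    return alloc

jobs = [
    {"phase": "io", "power_need": 100.0},
    {"phase": "compute", "power_need": 150.0},
    {"phase": "compute", "power_need": 150.0},
]
print(shift_power(jobs, system_cap=300.0))
# I/O job held at the 40 W floor; the two compute jobs split the remaining 260 W.
```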
Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing
A first-of-its-kind study of manufacturing variability on four production HPC systems spanning four microarchitectures is presented; its impact on HPC applications is analyzed, and a novel variation-aware power budgeting scheme is proposed to maximize effective application performance.
A Value-Oriented Job Scheduling Approach for Power-Constrained and Oversubscribed HPC Systems
A new algorithm is proposed that adapts its behavior to deliver the combined benefits of the two allocation strategies and results in improving HPC resource utilization while delivering a mean productivity that is almost the same as the best performing algorithm across various system-wide power constraints.
BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution
A Bundle-based PARallel Aggregation framework (BPAR) is proposed, along with three partitioning schemes under this framework that target improving the I/O performance of a mission-critical application, GEOS-5, as well as a broad range of other scientific applications.
Efficient Storage Design and Query Scheduling for Improving Big Data Retrieval and Analytics
By leveraging the advanced features of cutting-edge non-volatile memories, a Phase Change Memory (PCM)-based hybrid storage architecture is presented and devised, which provides efficient buffer management and novel wear leveling techniques, thus achieving highly improved data retrieval performance and at the same time solving the PCM’s bottleneck issue.
Enhance parallel input/output with cross-bundle aggregation
The experimental results reveal that BPAR can deliver a 2.1× I/O performance improvement over the baseline GEOS-5, and it is very promising for accelerating scientific applications' I/O performance on various computing platforms.


Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications
The results show that, for programs with a memory or communication bottleneck, a power-scalable cluster can save significant energy with only a small time penalty, and that it is possible to both consume less energy and execute in less time by increasing the number of nodes while reducing the frequency-voltage setting of each node.
Adaptive, Transparent Frequency and Voltage Scaling of Communication Phases in MPI Programs
An MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product is presented.
Energy based performance tuning for large scale high performance computing systems
The unique power measurement capabilities of the Cray XT architecture are exploited to gain an understanding of the power requirements of important DOE/NNSA production scientific computing applications executing at large scale (thousands of nodes).
No "power" struggles: coordinated multi-level power management for the data center
This paper proposes and validate a power management solution that coordinates different individual approaches and performs a detailed quantitative sensitivity analysis to draw conclusions about the impact of different architectures, implementations, workloads, and system design choices.
Reducing Energy Usage with Memory and Computation-Aware Dynamic Frequency Scaling
This work presents an automated, fine-grained approach to selecting per-loop processor clock frequencies, and uses these tools to show a measured, system-wide energy savings of up to 7.6% on an 8-core Intel Xeon E5530 and 10.
Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs
MIND: A black-box energy consumption model for disk arrays
MIND is devised to quantitatively measure the power consumption of redundant disk arrays running different workloads in a variety of execution modes, and it can estimate the power consumption of disk arrays with an error rate of less than 2%.
Towards Energy Aware Scheduling for Precedence Constrained Parallel Tasks in a Cluster with DVFS
Formal models are presented for precedence-constrained parallel tasks, DVFS-enabled clusters, and energy consumption; test results justify the design and implementation of the energy-aware scheduling heuristics proposed in the paper.
Cluster-level feedback power control for performance optimization
  • Xiaorui Wang, Ming Chen
  • Computer Science
    2008 IEEE 14th International Symposium on High Performance Computer Architecture
  • 2008
A cluster-level power controller that shifts power among servers based on their performance needs, while controlling the total power of the cluster to be lower than a constraint is proposed.
Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications
  • Dong Li, J. Vetter, Weikuan Yu
  • Computer Science
    2012 IEEE 26th International Parallel and Distributed Processing Symposium
  • 2012
A binary instrumentation tool is developed to statistically report memory access patterns in stack, heap, and global data of mission-critical scientific applications and reveals that the performance of some applications is insensitive to relatively long NVRAM write-access latencies.