Cuttlefish: library for achieving energy efficiency in multicore parallel programs

@article{Kumar2021CuttlefishLF,
  title={Cuttlefish: library for achieving energy efficiency in multicore parallel programs},
  author={Sunil Kumar and Akshat Gupta and Vivek Kumar and Sridutt Bhalachandra},
  journal={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2021}
}
A low-cap power budget is challenging for exascale computing. Dynamic Voltage and Frequency Scaling (DVFS) and Uncore Frequency Scaling (UFS) are the two widely used techniques for limiting an HPC application's energy footprint. However, existing approaches fail to provide a unified solution that can work with different types of parallel programming models and applications. This paper proposes Cuttlefish, a programming-model-oblivious C/C++ library for achieving energy efficiency in multicore…
1 Citation
The LBNL Superfacility Project Report
TLDR
By the close of the project, the Superfacility project met its project goal by enabling science application engagements to demonstrate automated pipelines that analyze data from remote facilities at large scale, without routine human intervention.

References

Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control
TLDR
An experimental memory study on modern CPU architectures, Intel Sandy Bridge and Haswell, identifies a metric, TORo_core, that detects bandwidth saturation and increased latency. The metric is used to construct a dynamic policy, applied at coarse- and fine-grained levels, that modulates per-core power controls on Haswell machines.
Automatic runtime frequency-scaling system for energy savings in parallel applications
TLDR
A novel runtime system that maximizes energy saving by selecting appropriate values for DVFS and throttling in parallel applications and applies frequency scaling considering both the CPU offload, provided by the network-interface card, and the architectural stalls during computation is proposed.
Using Per-Loop CPU Clock Modulation for Energy Efficiency in OpenMP Applications
TLDR
This work takes advantage of the low transition overhead of CPU clock modulation and applies it to fine-grained OpenMP parallel loops to achieve multi-frequency execution of OpenMP applications that achieves better energy-delay trade-off than any single frequency setting.
Hybrid MPI/OpenMP power-aware computing
TLDR
A new power-aware performance prediction model of hybrid MPI/OpenMP applications is used to derive a novel algorithm for power-efficient execution of realistic applications from the ASC Sequoia and NPB-MZ benchmarks.
Adagio: making DVS practical for complex HPC applications
TLDR
Adagio is presented, a novel runtime system that makes DVS practical for complex, real-world scientific applications by incurring only negligible delay while achieving significant energy savings.
A case for application-oblivious energy-efficient MPI runtime
TLDR
Energy Aware MPI (EAM) is proposed and implemented --- an application-oblivious energy-efficient MPI runtime that uses a combination of communication models for common MPI primitives and an online observation of slack to maximize energy efficiency and to honor performance degradation limits.
Empirical study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a Power-scalable High Performance Cluster
TLDR
A new algorithm is proposed that reduces energy consumption in a parallel program executed on a power-scalable cluster using DVFS. It reclaims slack time by changing the voltage and frequency, which reduces energy consumption without affecting the performance of the program.
Using multiple energy gears in MPI programs on a power-scalable cluster
TLDR
This paper presents a framework for executing a single application in several frequency-voltage settings and shows that more than half of the NAS benchmarks exhibit a better energy-time tradeoff using multiple gears than using a single gear.
An Adaptive Core-Specific Runtime for Energy Efficiency
TLDR
An Adaptive Core-specific Runtime (ACR) that dynamically adapts core frequencies to workload characteristics, and examples of both reductions in power and improvement in the average performance are presented.
Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs
TLDR
This paper presents a system called Jitter, which reduces the frequency on nodes that are assigned less computation and therefore have slack time. The goal of Jitter is for those nodes to arrive "just in time" so that they avoid increasing overall execution time.