Corpus ID: 248965254

Prediction for Distributional Outcomes in High-Performance Computing I/O Variability

@inproceedings{Xu2022PredictionFD,
  title={Prediction for Distributional Outcomes in High-Performance Computing I/O Variability},
  author={Li Xu and Yili Hong and Max D. Morris and Kirk W. Cameron},
  year={2022}
}
Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performance variability management and is nontrivial because one needs to predict a distribution function…
