Corpus ID: 239998532

Performance prediction of massively parallel computation by Bayesian inference

@article{Kohashi2021PerformancePO,
  title={Performance prediction of massively parallel computation by Bayesian inference},
  author={Hisashi Kohashi and Harumichi Iwamoto and Takeshi Fukaya and Yusaku Yamamoto and Takeo Hoshi},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.14545}
}
A performance prediction method for massively parallel computation is proposed. The method is based on performance modeling and Bayesian inference to predict the elapsed time T as a function of the number of nodes used, P (T = T(P)). The focus is on extrapolation to larger values of P from the perspective of application researchers. The proposed method has several improvements over the method developed in a previous paper, and application to a real-symmetric generalized eigenvalue problem shows…
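The abstract's idea, predicting T(P) from measurements at small P and extrapolating to larger P via Bayesian inference, can be illustrated with a minimal sketch. This is not the paper's actual model or code: the model form T(P) = a/P + b·log2(P) (parallel work plus tree-structured communication), the parameter grids, and the noise level are all illustrative assumptions.

```python
import math

# Hedged sketch, NOT the paper's method: fit an assumed execution-time
# model T(P) = a/P + b*log2(P) by Bayesian inference on a coarse
# parameter grid, then extrapolate to a larger node count P.

def posterior_grid(data, a_grid, b_grid, sigma=0.05):
    """Normalized posterior over (a, b): flat prior on the grid,
    Gaussian noise with relative standard deviation sigma."""
    post = {}
    for a in a_grid:
        for b in b_grid:
            loglik = 0.0
            for p, t in data:
                pred = a / p + b * math.log2(p)
                loglik += -0.5 * ((t - pred) / (sigma * t)) ** 2
            post[(a, b)] = math.exp(loglik)
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}

def predict(post, p):
    """Posterior-mean prediction of T(p)."""
    return sum(w * (a / p + b * math.log2(p)) for (a, b), w in post.items())

# Synthetic measurements generated from T(P) = 100/P + 0.5*log2(P);
# only small node counts are "measured".
data = [(p, 100.0 / p + 0.5 * math.log2(p)) for p in (4, 8, 16, 32)]
post = posterior_grid(data,
                      a_grid=[80 + i for i in range(41)],   # a in [80, 120]
                      b_grid=[0.1 * i for i in range(11)])  # b in [0.0, 1.0]
t256 = predict(post, 256)  # extrapolate well beyond the measured range
```

A grid posterior is the simplest possible inference scheme; it conveys the idea (posterior weighting of candidate models, then posterior-mean extrapolation) while a realistic implementation would use a richer model and a proper sampler.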
References

SHOWING 1-10 OF 16 REFERENCES
Performance Modeling for Dense Linear Algebra
  • E. Peise, P. Bientinesi
  • Computer Science
    2012 SC Companion: High Performance Computing, Networking Storage and Analysis
  • 2012
TLDR
This article develops a framework for the automatic generation of statistical performance models for BLAS and LAPACK libraries and demonstrates that this approach is successful in both single- and multi-core environments, not only in the ranking of algorithms but also in tuning their parameters.
EigenKernel - A middleware for parallel generalized eigenvalue solvers to attain high scalability and usability
TLDR
The benchmark, carried out on the Oakforest-PACS supercomputer, reveals that ELPA, EigenExa, and their hybrid solvers show better performance than pure ScaLAPACK solvers.
A Hierarchical Approach for Performance Analysis of ScaLAPACK-Based Routines Using the Distributed Linear Algebra Machine
TLDR
A hierarchical approach to designing performance models for parallel linear algebra algorithms is presented, based on a parallel machine model and the hierarchical structure of the ScaLAPACK library.
A Case Study on Modeling the Performance of Dense Matrix Computation: Tridiagonalization in the EigenExa Eigensolver on the K Computer
TLDR
A case study is presented in which the performance of the tridiagonalization routine in the EigenExa eigensolver on the K computer is modeled, considering several situations in which different amounts of limited information are available for performance modeling.
Performance Analysis of MPI Collective Operations
TLDR
This paper analyzes and attempts to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP.
Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues
TLDR
This paper provides analytical models for scalability of collective communication algorithms, such as broadcast, allreduce, and all-to-all, and applies these models to an IBM Blue Gene/P system and compares the analytical performance estimates with experimentally measured values.
Following the Blind Seer - Creating Better Performance Models Using Less Information
TLDR
A new model-generation algorithm is proposed that makes Extra-P easier to use; a scale-independent error metric tells both when to stop the refinement process and whether a model reflects the behavior of the data faithfully enough, enabling Extra-P to produce more accurate results.
ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications
TLDR
The results of two projects are summarized: Catwalk, which aimed to create tools that automate key activities of the performance-modeling process, and ExtraPeak, which built upon the results of Catwalk and worked toward making this powerful methodology more flexible, streamlined, and easy to use.
Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection
TLDR
This work presents a realistic performance model, indicates the possibility that TSQR becomes slower than Householder QR as the number of columns of the target matrix increases, and aims to estimate the difference and select the faster algorithm using the models, an approach that falls into auto-tuning.
An order-N electronic structure theory with generalized eigenvalue equations and its application to a ten-million-atom system.
TLDR
A linear algebraic theory called the 'multiple Arnoldi method' is presented, realizing large-scale (order-N) electronic structure calculations for generalized eigenvalue equations with tight-binding-form Hamiltonians.