We achieved 14.9 TFLOPS when running the plasma simulation code IMPACT-3D, parallelized with High Performance Fortran (HPF), on 512 nodes of the Earth Simulator. The theoretical peak performance of the 512 nodes is 32 TFLOPS, which means 45% of the peak performance was obtained with HPF. IMPACT-3D is an implosion analysis code using TVD…
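As a quick check of the efficiency figure quoted above, the arithmetic can be sketched as follows. The per-node peak of 64 GFLOPS (8 vector processors at 8 GFLOPS each) is an assumption taken from the published Earth Simulator hardware specification, not from the abstract; the function name `peak_fraction` is illustrative only.

```python
def peak_fraction(sustained_tflops, nodes, gflops_per_node=64.0):
    """Return (fraction of peak, peak in TFLOPS) for a given run.

    gflops_per_node defaults to 64.0, an assumed Earth Simulator
    per-node peak (8 processors x 8 GFLOPS); it is not stated in
    the abstract.
    """
    peak_tflops = nodes * gflops_per_node / 1000.0
    return sustained_tflops / peak_tflops, peak_tflops


frac, peak = peak_fraction(14.9, 512)
print(f"peak = {peak:.3f} TFLOPS, efficiency = {frac:.1%}")
# prints: peak = 32.768 TFLOPS, efficiency = 45.5%
```

Under that assumption the unrounded peak is 32.768 TFLOPS, which is consistent with the abstract's rounded "32 TFLOPS" and "45%".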
The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, in order…
This paper describes new fast integer sorting methods for single vector and shared-memory parallel vector computers, based on the bucket sort algorithm. Existing vectorization methods for bucket sort go to great lengths to avoid store conflicts in vector scatter operations, and are therefore not very efficient. The vectorization methods shown in this…
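For orientation, the textbook scalar form of bucket sort for small non-negative integer keys can be sketched as below. This is not the paper's vectorization method, which the truncated abstract does not describe; the function name `bucket_sort` and the `max_key` parameter are illustrative. The comment on the scatter step marks where store conflicts would arise if several keys in one vector operation mapped to the same bucket.

```python
def bucket_sort(keys, max_key):
    """Sort non-negative integers in the range 0..max_key (stable)."""
    # 1. Histogram: count how many keys fall into each bucket.
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1

    # 2. Exclusive prefix sum: first output position of each bucket.
    offsets = [0] * (max_key + 1)
    total = 0
    for b in range(max_key + 1):
        offsets[b] = total
        total += counts[b]

    # 3. Scatter: place each key at its bucket's next free slot.
    #    On a vector machine, two equal keys handled in the same
    #    vector operation would target the same offsets[k] entry,
    #    which is the store conflict the abstract refers to.
    out = [0] * len(keys)
    for k in keys:
        out[offsets[k]] = k
        offsets[k] += 1
    return out


print(bucket_sort([3, 1, 4, 1, 5, 9, 2, 6], max_key=9))
# prints: [1, 1, 2, 3, 4, 5, 6, 9]
```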
We are developing HPF/SX V2, an HPF compiler for vector parallel machines. It provides some unique extensions as well as the features of HPF 2.0 and HPF/JA. This paper describes in particular four of them: 1) the ON directive of HPF 2.0, 2) the REFLECT and LOCAL directives of HPF/JA, 3) vectorization directives, and 4) automatic parallelization. We…
Partitioned Global Address Space (PGAS) programming languages have emerged as a means of programming parallel computers, which are becoming larger and more complicated. For such languages, regular stencil codes are still one of the most important targets. We implemented three methods of stencil communication in a compiler for the PGAS language XcalableMP,…
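To illustrate what "stencil communication" means here, the following minimal sketch simulates a block-distributed 1-D array in a single process: each block carries one ghost (halo) cell on each side, neighbours' edge values are copied into the ghosts, and then a 3-point averaging stencil is applied locally. It is not one of the three methods the abstract mentions, which are elided; the names `exchange_halos` and `stencil_step` and the list-of-lists layout are assumptions made for illustration.

```python
def exchange_halos(blocks):
    """Copy neighbouring edge values into each block's ghost cells.

    Each block is a list laid out as [ghost, interior..., ghost].
    The outermost ghosts of the first and last block are left
    untouched and act as fixed boundary values.
    """
    for r in range(len(blocks)):
        if r > 0:
            blocks[r][0] = blocks[r - 1][-2]   # left neighbour's last interior cell
        if r < len(blocks) - 1:
            blocks[r][-1] = blocks[r + 1][1]   # right neighbour's first interior cell


def stencil_step(blocks):
    """One 3-point averaging step over all blocks; returns new blocks."""
    exchange_halos(blocks)
    new_blocks = []
    for b in blocks:
        nb = b[:]                              # ghosts are carried over unchanged
        for i in range(1, len(b) - 1):
            nb[i] = (b[i - 1] + b[i] + b[i + 1]) / 3.0
        new_blocks.append(nb)
    return new_blocks


# Global array 0..7 split over two simulated nodes, boundary value 0.0.
blocks = [[0.0, 0.0, 1.0, 2.0, 3.0, 0.0],
          [0.0, 4.0, 5.0, 6.0, 7.0, 0.0]]
result = stencil_step(blocks)
print([b[1:-1] for b in result])
```

In a real PGAS or MPI program the two assignments in `exchange_halos` would be remote reads or messages between nodes; the local stencil loop would stay the same.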
Given that scientific computer programs are becoming larger and more complicated, developers of high-performance applications routinely examine the structure of their source code to improve its performance. We have developed K-scope, a source code analysis tool that can be used to improve code performance. K-scope has a graphical user interface that…
In this paper, we present our XcalableMP implementation of the HPCC HPL, RandomAccess, FFT, and the Himeno benchmark [1], which is a typical stencil application. The highlights of this submission are as follows:
• We implemented three HPCC benchmarks: HPL, RandomAccess, and FFT. In addition, we implemented the Himeno benchmark.
• The SLOC (Source lines of…
Because current high-end parallel systems have more than several thousand nodes, a new programming model is required to exploit different levels of parallelism. For example, a conventional master-worker program assumes that a worker is running as a single MPI process. In some cases, a worker may run as a set of MPI processes to exploit a different level…