Learn More
Graphics Processing Units (GPUs) offer high computational power but require high scheduling strain to manage parallel processes, which increases the GPU cross section. The results of extensive neutron radiation experiments performed on NVIDIA GPUs confirm this hypothesis. Reducing the application Degree Of Parallelism (DOP) reduces the scheduling strain but(More)
In this paper we assess and discuss the radiation sensitivity of a set of HPC applications executed on NVIDIA K20 GPGPUs. The occurrence of both radiation-induced silent data corruption and functional interruption will be experimentally addressed for Hotspot, LavaMD, and Matrix Transponse. Each of the tested codes requires a proper computational power and(More)
Multi-core compute nodes with non-uniform memory access (NUMA) are now a common architecture in the assembly of large-scale parallel machines. On these machines, in addition to the network communication costs, the memory access costs within a compute node are also asymmetric. Ignoring this can lead to an increase in the data movement costs. Therefore, to(More)
We have developed a model called MigBSP that controls processes rescheduling in BSP (Bulk Synchronous Parallel)applications. A BSP application is composed by one or more supersteps, each one containing both computation and communication phases followed by a synchronization barrier. Since the barrier waits for the slowest process, MigBSP’s final idea(More)
This work shows how a Hybrid MPI/OpenMP implementation can improve the performance of the Ocean-Land-Atmosphere Model (OLAM) on a multi-core cluster environment, which is a typical HPC many small files workload application. Previous experiments have shown that the scalability of this application on clusters is limited by the performance of the output(More)
The High Performance Computing (HPC) community aimed for many years to increase performance regardless of energy consumption. Until the end of the decade, a next generation of HPC systems is expected to reach sustained performances of the order of exaflops. This requires many times more performance compared to the fastest supercomputers of today. Achieving(More)
—Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high performance computing. On such NUMA nodes, the shared memory is physically distributed into memory banks connected by a network. Owing to this, memory access costs may vary depending on the distance between the processing unit and the memory bank. Therefore, a key(More)
The communication between tasks of a parallel application is an important characteristic to consider when mapping tasks to computing cores due to possible differences in communication performance. Within a machine, performance differences are introduced by the memory hierarchy, in which cache memories can be shared by groups of cores and intra-chip(More)