Hongjia Cao

Learn More
Fault resilience has became a major issue for HPC systems, in particular in the perspective of future E-scale systems, which will consist of millions of CPU cores and other components. Fault tolerant MPI was proposed to offer support of software level fault tolerance approaches. However, the widely used MPI implementations, such as MPICH and Mvapich2,(More)
  • 1