A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers

@article{Sato2014AUI,
  title={A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers},
  author={Kento Sato and Kathryn Mohror and Adam Moody and Todd Gamblin and Bronis R. de Supinski and Naoya Maruyama and Satoshi Matsuoka},
  journal={2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing},
  year={2014},
  pages={21--30}
}
Checkpoint/Restart is an indispensable fault tolerance technique commonly used by high-performance computing applications that run continuously for hours or days at a time. However, even with state-of-the-art checkpoint/restart techniques, high failure rates at large scale will limit application efficiency. To alleviate the problem, we consider using burst buffers. Burst buffers are dedicated storage resources positioned between the compute nodes and the parallel file system, and this new tier…
This paper has 32 citations.