Benjamin Mandler

High performance computing systems display increasing complexity and component counts. This trend exposes weaknesses in the underlying clustering infrastructure needed for continuous availability, maximizing utilization, and efficient administration of such systems. To mitigate the problem, we present a highly scalable clustering infrastructure, based on …
As HPC systems and applications get bigger and more complex, we are approaching an era in which resiliency and run-time elasticity concerns become paramount. We offer a building block for an alternative resiliency approach in which computations will be able to make progress while components fail, in addition to enabling a dynamic set of nodes throughout a …