Handling Cascading Failures: The Case for Topology-Aware Fault-Tolerance

Abstract

Large distributed systems contain multiple components that can interact in unforeseen and complicated ways; this emergent “vulnerability of complexity” increases the likelihood of cascading failures that can result in widespread disruption. Our research explores whether we can exploit knowledge of the system’s topology, the application’s…

5 Figures and Tables