This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into <i>checkpoint-based</i> and <i>log-based</i>. <i>Checkpoint-based</i> protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, …
This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs. The model captures the effects of two important masking …
A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine Fault-Tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”) …
We present the first protocol that reaches asynchronous Byzantine consensus in two communication steps in the common case. We prove that our protocol is optimal in terms of both number of communication steps and number of processes for two-step consensus. The protocol can be used to build a replicated state machine that requires only three communication …
We describe a new architecture for Byzantine fault tolerant state machine replication that separates <i>agreement</i> that orders requests from <i>execution</i> that processes requests. This separation yields two fundamental and practically significant advantages over previous architectures. First, it reduces replication costs because the new architecture …
This paper argues for a new approach to building Byzantine fault tolerant systems. We observe that although recently developed BFT state machine replication protocols are quite fast, they don't actually tolerate Byzantine faults very well: a single faulty client or server is capable of rendering PBFT, Q/U, HQ, and Zyzzyva virtually unusable. In this paper, …
Communication-induced checkpointing (CIC) allows processes in a distributed computation to take independent checkpoints and to avoid the domino effect. This paper presents an analysis of CIC protocols based on a prototype implementation and validated simulations. Our results indicate that there is sufficient evidence to suspect that much of the conventional …
This paper presents Eve, a new Execute-Verify architecture that allows state machine replication to scale to multi-core servers. Eve departs from the traditional agree-execute architecture of state machine replication: replicas first execute groups of requests concurrently and then verify that they can reach agreement on a state and output produced by a …