This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs. The model captures the effects of two important masking…
This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey, we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated,…
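To make the checkpoint-based versus log-based distinction concrete, here is a minimal Python sketch (not taken from the survey) of a single process that can recover either way. It assumes a piecewise-deterministic process whose only nondeterministic events are message receipts; the `Process` class and its methods are hypothetical. It ignores the interactions between processes that make the real protocols hard. The checkpoint-only variant rolls back to its last saved state and loses the work done since then, while the log-based variant also replays the logged receipts to roll forward to its pre-crash state.

```python
import copy


class Process:
    """Hypothetical piecewise-deterministic process: state changes only on message delivery."""

    def __init__(self, log_messages: bool) -> None:
        self.log_messages = log_messages  # True -> log-based, False -> checkpoint-only
        self.state = {"total": 0, "delivered": 0}
        self._checkpoint = copy.deepcopy(self.state)
        self._log = []  # message receipts since the last checkpoint

    def _apply(self, msg: int) -> None:
        # Deterministic state transition triggered by a (nondeterministic) receipt
        self.state["total"] += msg
        self.state["delivered"] += 1

    def deliver(self, msg: int) -> None:
        if self.log_messages:
            self._log.append(msg)  # record the receipt event before applying it
        self._apply(msg)

    def take_checkpoint(self) -> None:
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()  # events before the checkpoint are covered by it

    def crash_and_recover(self) -> None:
        self.state = {}  # simulate losing volatile state in the crash
        self.state = copy.deepcopy(self._checkpoint)  # roll back to the last checkpoint
        for msg in self._log:  # log-based only: replay logged events to roll forward
            self._apply(msg)
```

For example, delivering 5, taking a checkpoint, delivering 7 and 1, and then crashing leaves the checkpoint-only process with a total of 5, while the log-based process recovers its total of 13.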
A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine Fault-Tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”)…
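As a minimal illustration of how a replica group can look like one correct server to its clients, the sketch below assumes a PBFT-style setting: at most f replicas are faulty, the group has 3f + 1 replicas, and a client accepts a result only once f + 1 distinct replicas return the same reply, since at least one of them must then be correct. The function names are hypothetical, and the agreement protocol that orders requests is not shown.

```python
from collections import Counter
from collections.abc import Hashable, Mapping
from typing import Optional


def group_size(f: int) -> int:
    # PBFT-style asynchronous BFT replication uses 3f + 1 replicas to tolerate f Byzantine ones
    return 3 * f + 1


def accept_reply(replies: Mapping[int, Hashable], f: int) -> Optional[Hashable]:
    """Return the result vouched for by at least f + 1 distinct replicas, or None if none is yet.

    `replies` maps a replica id to that replica's reply, so each replica is counted once.
    """
    counts = Counter(replies.values())
    for value, votes in counts.items():
        if votes >= f + 1:
            # At most f repliers can be faulty, so at least one correct replica sent this value
            return value
    return None
```

With f = 1, a client that hears "ok" from replicas 0, 1, and 3 and "bad" from replica 2 accepts "ok"; with only one reply of each kind, it keeps waiting.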
We present the first protocol that reaches asynchronous Byzantine consensus in two communication steps in the common case. We prove that our protocol is optimal in terms of both the number of communication steps and the number of processes for two-step consensus. The protocol can be used to build a replicated state machine that requires only three communication…
This paper argues for a new approach to building Byzantine fault tolerant systems. We observe that although recently developed BFT state machine replication protocols are quite fast, they don't actually tolerate Byzantine faults very well: a single faulty client or server is capable of rendering PBFT, Q/U, HQ, and Zyzzyva virtually unusable. In this paper,…
The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices in UpRight favor simplifying adoption by existing…
We describe a new architecture for Byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests. This separation yields two fundamental and practically significant advantages over previous architectures. First, it reduces replication costs because the new architecture…
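The separation between agreement and execution can be sketched as two independent stages. In this minimal Python sketch, which is not the paper's implementation, a single hypothetical `Agreement` sequencer stands in for what would really be a replicated BFT ordering protocol; it only assigns sequence numbers. Each hypothetical `ExecutionReplica` then applies (key, value) writes strictly in sequence order, buffering any that arrive early, so replicas that receive the same ordered requests in different network orders still reach the same state.

```python
class Agreement:
    """Hypothetical agreement stage: orders requests but never executes them.

    A single counter stands in for a replicated BFT ordering protocol.
    """

    def __init__(self) -> None:
        self._next_seq = 1

    def order(self, request):
        seq = self._next_seq
        self._next_seq += 1
        return seq, request


class ExecutionReplica:
    """Hypothetical execution stage: applies (key, value) writes in the agreed order."""

    def __init__(self) -> None:
        self.store = {}
        self.next_seq = 1
        self.pending = {}  # ordered requests that arrived before their predecessors

    def deliver(self, seq, request) -> None:
        if seq < self.next_seq:
            return  # already executed; ignore duplicates
        self.pending[seq] = request
        # Execute every buffered request whose predecessors have all been executed
        while self.next_seq in self.pending:
            key, value = self.pending.pop(self.next_seq)
            self.store[key] = value
            self.next_seq += 1
```

Because execution replicas only consume an already agreed order, they never take part in ordering; in this toy version, feeding the same three ordered writes to one replica in order and to another in reverse leaves both stores identical.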
We present an implementation of a fault-tolerant TCP (FT-TCP) that allows a faulty server to keep its TCP connections open until it either recovers or is failed over to a backup. The failure and recovery of the server process are completely transparent to client processes connected with it via TCP. FT-TCP does not affect the software running on a…
We present FlightPath, a novel peer-to-peer streaming application that provides a highly reliable data stream to a dynamic set of peers. We demonstrate that FlightPath reduces jitter compared to previous works by several orders of magnitude. Furthermore, FlightPath uses a number of run-time adaptations to maintain low jitter despite 10% of the population…