Zab: High-performance broadcast for primary-backup systems

  title={Zab: High-performance broadcast for primary-backup systems},
  author={Flavio Paiva Junqueira and Benjamin C. Reed and Marco Serafini},
  journal={2011 IEEE/IFIP 41st International Conference on Dependable Systems \& Networks (DSN)},
  • F. Junqueira, B. Reed, M. Serafini
  • Published 27 June 2011
  • Computer Science
  • 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN)
Zab is a crash-recovery atomic broadcast algorithm we designed for the ZooKeeper coordination service. ZooKeeper implements a primary-backup scheme in which a primary process executes client operations and uses Zab to propagate the corresponding incremental state changes to backup processes. Because an incremental state change depends on the sequence of changes previously generated, Zab must guarantee that if it delivers a given state change, then all other changes it depends upon must…
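
The dependency described in the abstract can be illustrated with a minimal sketch. It is not the Zab protocol itself: the names `StateChange`, `Primary`, `Backup`, and `demo` are hypothetical, and buffering by sequence number is an assumption made only to show why a backup should not deliver a state change before the changes it depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateChange:
    """Hypothetical incremental state change produced by the primary."""
    seq: int        # position in the primary's sequence of changes
    key: str
    new_value: int  # value the primary computed from its current state


class Primary:
    """Executes client operations and turns them into state changes."""

    def __init__(self):
        self.state = {}
        self.next_seq = 1

    def increment(self, key, amount):
        # The resulting value depends on every change generated before it.
        value = self.state.get(key, 0) + amount
        self.state[key] = value
        change = StateChange(self.next_seq, key, value)
        self.next_seq += 1
        return change


class Backup:
    """Delivers a change only after all changes that precede it."""

    def __init__(self):
        self.state = {}
        self.delivered = 0  # highest sequence number delivered so far
        self.pending = {}   # changes received ahead of their turn

    def receive(self, change):
        self.pending[change.seq] = change
        # Deliver for as long as the next change in sequence is available.
        while self.delivered + 1 in self.pending:
            nxt = self.pending.pop(self.delivered + 1)
            self.state[nxt.key] = nxt.new_value
            self.delivered = nxt.seq


def demo():
    primary = Primary()
    changes = [primary.increment("x", n) for n in (1, 5, 10)]
    backup = Backup()
    backup.receive(changes[2])        # arrives early, so it is held back
    after_early = dict(backup.state)
    backup.receive(changes[0])
    backup.receive(changes[1])        # unblocks changes 2 and 3
    return after_early, backup.state, primary.state
```

In this sketch the backup holds the third change until the first two have been applied, so its state ends up identical to the primary's.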

Improving the Latency and Throughput of ZooKeeper Atomic Broadcast

Two easy-to-implement Zab variants are presented, called ZabAC and ZabAA, designed to offer small atomic-broadcast latencies and to reduce the processing load on the primary node that plays a leading role in Zab.

Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes

Three variations of Zab are proposed, and coin-tossing in ZooKeeper is found to have the potential to perform better than Zab, particularly at high workloads, because of the least-restricted Zab fault assumptions.

Improving ZooKeeper Atomic Broadcast Performance by Coin Tossing

The coin-tossing Zab version (ZabCT) meets all requirements essential for the crash-tolerance provisions within Zab, which can be adopted in any ZabCT implementation, and the dual objectives of performance gains and traffic reduction can be accomplished.

Mechanisms for improving ZooKeeper Atomic Broadcast performance

Two main limitations that prevent existing systems such as Apache ZooKeeper from achieving a higher write performance are identified and three variations of Zab are proposed, which are all capable of reaching an agreement in fewer communication steps than Zab.

Dynamic Reconfiguration of Primary/Backup Clusters

A new reconfiguration protocol is described, recently implemented in Apache ZooKeeper, that fully automates configuration changes and minimizes any interruption in service to clients while maintaining data consistency.

Brief Announcement: Consensus and Efficient Passive Replication

Using the Paxos consensus protocol to implement passive replication requires taking care of peculiar constraints, and Paxos does not necessarily preserve the dependency between A and the delivery of δAB.

Make the Leader Work: Executive Deferred Update Replication

EDUR streamlines transaction certification with the broadcast protocol, which improves overall performance and scalability compared to deferred update replication based on total order broadcast (TOB).

Rollup: Non-Disruptive Rolling Upgrade

Although Rollup builds upon existing lower-bound results in terms of load and time, its key contribution is to bridge the gap between a long body of theoretical results and recent system achievements through the rolling upgrade application.

Scalable coordination of distributed in-memory transactions

It is experimentally demonstrated that transaction latency and throughput scale considerably well when an atomic multicast service is offered to transaction nodes by a crash-tolerant ensemble of dedicated nodes and that using such a service is the most scalable approach compared to practices advocated in the literature.

Acuerdo: Fast Atomic Broadcast over RDMA

Acuerdo is built from the ground up to perform communication using one-side RDMA writes, which do not use the CPU of the remote machine, and is explicitly designed to minimize waiting on the critical path.



Vertical Paxos and primary-backup replication

It is shown how primary-backup systems in current use can be viewed, and shown to be correct, as instances of Vertical Paxos algorithms, in which reconfiguration can occur in the middle of reaching agreement on an individual state-machine command.

Reliable and total order broadcast in the crash-recovery model

Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems

This paper presents a new replication algorithm that has desirable performance properties, based on the primary copy technique, and uses a special kind of timestamp called a viewstamp to detect lost information.

ZooKeeper: Wait-free Coordination for Internet-scale Systems

ZooKeeper provides a per-client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state, enabling the implementation of a high-performance processing pipeline with read requests being satisfied by local servers.

Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems and Its Use in Quorum-Based Replication

It is shown that atomic broadcast can be implemented requiring few additional log operations in excess of those required by the consensus, and how additional log operations can improve the protocol in terms of faster recovery and better throughput.

A new look at atomic broadcast in the asynchronous crash-recovery model

The paper proposes a new specification of atomic broadcast in the crash-recovery model that distinguishes between a uniform and a non-uniform version of atomic broadcast, and is thus more efficient.

Chain Replication for Supporting High Throughput and Availability

Besides outlining the chain replication protocols themselves, simulation experiments explore the performance characteristics of a prototype implementation and several object-placement strategies (including schemes based on distributed hash table routing) are discussed.

The Chubby lock service for loosely-coupled distributed systems

The paper describes the initial design and expected use, compares it with actual use, and explains how the design had to be modified to accommodate the differences.

Efficient message ordering in dynamic networks

The algorithm always allows processors to initiate messages, even when they are not members of a connected majority component in the network, so that messages can eventually become totally ordered even if their initiator is never a member of a majority component.

Omega Meets Paxos: Leader Election and Stability Without Eventual Timely Links

This paper provides a realization of distributed leader election without having any eventual timely links, and an extension of the protocol provides leader stability, which guarantees against arbitrary demotion of a qualified leader and avoids performance penalties associated with leader changes in schemes such as Paxos.