Rabia: Simplifying State-Machine Replication Through Randomization

  • Haochen Pan, Jesse Tuglu, Neo Zhou, Tianshu Wang, Yicheng Shen, Xiong Zheng, Joseph Tassarotti, Lewis Tseng, Roberto Palmieri
  • Published 26 September 2021
  • Computer Science
  • Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
We introduce Rabia, a simple and high-performance framework for implementing state-machine replication (SMR) within a datacenter. The main innovation of Rabia is in using randomization to simplify the design. Rabia provides the following two features: (i) It does not need any fail-over protocol and supports trivial auxiliary protocols like log compaction, snapshotting, and reconfiguration, components that are often considered the most challenging when developing SMR systems; and (ii) It…

Leaderless State-Machine Replication: Specification, Properties, Limits
This paper proposes a framework that captures the essence of leaderless state-machine replication (Leaderless SMR) and introduces a set of desirable properties for these protocols: (R)eliability, (O)ptimal (L)atency, and (L)oad Balancing. It also establishes a lower bound on the message delay to execute a command in protocols optimal for the ROLL properties.
Scaling Replicated State Machines with Compartmentalization
The first comprehensive technique to eliminate state machine replication bottlenecks is introduced, and it is demonstrated how to compartmentalize MultiPaxos to increase its throughput by 6× on a write-only workload and 16× on a mixed read-write workload.
State-machine replication for planet-scale systems
Atlas, the first state-machine replication protocol tailored for planet-scale systems, is presented, which is up to two times faster than Flexible Paxos with identical failure assumptions, and more than doubles the performance of Egalitarian Paxos in the YCSB benchmark.
Mencius: Building Efficient Replicated State Machine for WANs
This work presents a protocol for general state machine replication, a method that provides strong consistency, that has high performance in a wide-area network and low latency under low client load, even as the wide-area network environment and client load change.
Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering
A new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination, providing replication without the performance cost in the data center.
There is more consensus in Egalitarian parliaments
Egalitarian Paxos is, to the authors' knowledge, the first protocol to achieve the previously stated goals efficiently, requiring only a simple majority of replicas to be non-faulty, using a number of messages linear in the number of replicas to choose a command, and committing commands after just one communication round.
Non-determinism in Byzantine Fault-Tolerant Replication
This paper distinguishes three models for dealing with non-determinism in replicated services, where some processes are subject to faults and arbitrary behavior (so-called Byzantine faults), and introduces two new protocols that use the modular approach for filtering out non-deterministic operations in an application.
Microsecond Consensus for Microsecond Applications
Mu is proposed, a system that takes less than 1.3 microseconds to replicate a (small) request in memory and less than a millisecond to fail over the system; this cuts the replication and fail-over latencies of prior systems by at least 61% and 90%, respectively.
In Search of an Understandable Consensus Algorithm
Raft is a consensus algorithm for managing a replicated log that separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered.
Vertical Paxos and primary-backup replication
It is shown how primary-backup systems in current use can be viewed, and shown to be correct, as instances of Vertical Paxos algorithms, in which reconfiguration can occur in the middle of reaching agreement on an individual state-machine command.