Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection

@article{Zhu2019HarmoniaNS,
  title={Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection},
  author={Hang Zhu and Zhihao Bai and Jialin Li and Ellis Michael and Dan R. K. Ports and Ion Stoica and Xin Jin},
  journal={Proc. VLDB Endow.},
  year={2019},
  volume={13},
  pages={376--389}
}
Distributed storage employs replication to mask failures and improve availability. However, these systems typically exhibit a hard tradeoff between consistency and performance. Ensuring consistency introduces coordination overhead, and as a result the system throughput does not scale with the number of replicas. We present Harmonia, a replicated storage architecture that exploits the capability of new-generation programmable switches to obviate this tradeoff by providing near-linear scalability… 
In-network leaderless replication for distributed data stores
TLDR
This work presents NetLR, a replicated data store architecture that supports high performance, fault tolerance, and linearizability simultaneously, and moves all replication functions into the network by leveraging the switch as an on-path in-network replication orchestrator.
Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol
TLDR
This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully-concurrent fast writes at all replicas.
Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories
TLDR
Pegasus is a new storage system that leverages new-generation programmable switch ASICs to balance load across storage servers and improves the throughput of a distributed in-memory key-value store by more than 10x under a latency SLO.
Concordia: Distributed Shared Memory with In-Network Cache Coherence
TLDR
This paper proposes CONCORDIA, a DSM with fast in-network cache coherence backed by programmable switches, and introduces two techniques: an ownership migration mechanism to address the problem of limited memory capacity on switches and idempotent operations to handle packet loss in the case that switches are stateful.
HovercRaft: achieving scalability and fault-tolerance for microsecond-scale datacenter services
TLDR
This work proposes HovercRaft, a new approach by which adding nodes increases both the resilience and the performance of general-purpose state-machine replication through an extension of the Raft protocol that carefully eliminates CPU and I/O bottlenecks and load balances requests.
Rolis: a software approach to efficiently replicating multi-core transactions
TLDR
Rolis aims to mask the high cost of replication by ensuring that cores are always doing useful work and not waiting for each other or for other replicas, by not mixing the multi-core concurrency control with multi-machine replication, as is traditionally done by systems that use Paxos to replicate the transaction commit protocol.
NetLock: Fast, Centralized Lock Management Using Programmable Switches
TLDR
NetLock is a new centralized lock manager that co-designs servers and network switches to achieve high performance without sacrificing flexibility in policy support, and to exploit the capability of emerging programmable switches to directly process lock requests in the switch data plane.
Odyssey: the impact of modern hardware on strongly-consistent replication protocols
TLDR
This work presents Odyssey, a framework tailored towards protocol implementation for multi-threaded, RDMA-enabled, in-memory, replicated KVSes, and performs the first apples-to-apples comparison of replication protocols over modern hardware.
Scaling Replicated State Machines with Compartmentalization [Technical Report]
TLDR
The first comprehensive technique to eliminate state machine replication bottlenecks is introduced, and it is demonstrated how to compartmentalize MultiPaxos to increase its throughput by 6× on a write-only workload and 16× on a mixed read-write workload.
Scaling Replicated State Machines with Compartmentalization
TLDR
The first comprehensive technique to eliminate state machine replication bottlenecks is introduced, and it is demonstrated how to compartmentalize MultiPaxos to increase its throughput by 6× on a write-only workload and 16× on a mixed read-write workload.

References

Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering
TLDR
A new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination, providing replication without the performance cost in the data center.
Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control
TLDR
Eris can process a large class of distributed transactions in a single round-trip from the client to the storage system without any explicit coordination between shards or replicas in the normal case, providing atomicity, consistency, and fault tolerance with less than 10% overhead.
Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads
TLDR
Additional design and implementation considerations for geo-replicated CRAQ storage across multiple datacenters to provide locality-optimized operations are explored and multi-object atomic updates and multicast optimizations for large-object updates are discussed.
Update propagation protocols for replicated databases
TLDR
Two new lazy update protocols are proposed that guarantee serializability but impose a much weaker requirement on data placement than earlier protocols, which outperform existing protocols over a wide range of workloads.
Designing Distributed Systems Using Approximate Synchrony in Data Center Networks
TLDR
This paper explores network-level mechanisms for providing Mostly-Ordered Multicast (MOM): a best-effort ordering property for concurrent multicast operations, and designs Speculative Paxos, a state machine replication protocol that relies on the network to order requests in the normal case.
ZooKeeper: Wait-free Coordination for Internet-scale Systems
TLDR
ZooKeeper provides a per-client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state to enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers.
Ganymed: Scalable Replication for Transactional Web Applications
TLDR
Ganymed is introduced, a database replication middleware intended to provide scalability without sacrificing consistency and to avoid the limitations of existing approaches by using a novel transaction scheduling algorithm that separates update and read-only transactions.
Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems
TLDR
This paper presents a new replication algorithm that has desirable performance properties, based on the primary copy technique, and uses a special kind of timestamp called a viewstamp to detect lost information.
Chain Replication for Supporting High Throughput and Availability
TLDR
Besides outlining the chain replication protocols themselves, this paper uses simulation experiments to explore the performance characteristics of a prototype implementation and discusses several object-placement strategies (including schemes based on distributed hash table routing).
NetChain: Scale-Free Sub-RTT Coordination
TLDR
NetChain exploits recent advances in programmable switches to store data and process queries entirely in the network data plane, and designs new protocols and algorithms based on chain replication to guarantee strong consistency and to efficiently handle switch failures.