• Corpus ID: 4807062

NetChain: Scale-Free Sub-RTT Coordination

@inproceedings{Jin2018NetChainSS,
  title={NetChain: Scale-Free Sub-RTT Coordination},
  author={Xin Jin and Xiaozhou Li and Haoyu Zhang and Nate Foster and Jeongkeun Lee and Robert Soul{\'e} and Changhoon Kim and Ion Stoica},
  booktitle={NSDI},
  year={2018}
}
Coordination services are a fundamental building block of modern cloud systems, providing critical functionalities like configuration management and distributed locking. The major challenge is to achieve low latency and high throughput while providing strong consistency and fault-tolerance. Traditional server-based solutions require multiple round-trip times (RTTs) to process a query. This paper presents NetChain, a new approach that provides scale-free sub-RTT coordination in datacenters… 
NetLock: Fast, Centralized Lock Management Using Programmable Switches
TLDR
NetLock is a new centralized lock manager that co-designs servers and network switches to achieve high performance without sacrificing flexibility in policy support, and to exploit the capability of emerging programmable switches to directly process lock requests in the switch data plane.
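NetLock's TLDR describes a centralized lock manager that processes lock requests directly in the switch data plane. The TLDR does not describe NetLock's switch design. As a rough illustration of the queueing logic any such lock manager needs, the Python sketch below keeps one FIFO queue per lock. It grants either a single exclusive holder or a run of shared holders at the head of the queue. `LockTable`, `acquire`, `release`, and the `SHARED`/`EXCLUSIVE` constants are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# Hypothetical lock modes for this sketch.
SHARED, EXCLUSIVE = "S", "X"


class LockTable:
    """Hypothetical centralized lock manager: one FIFO request queue per lock."""

    def __init__(self) -> None:
        # lock id -> queue of (client, mode); granted requests sit at the front.
        self.queues: Dict[str, Deque[Tuple[str, str]]] = {}

    def _granted(self, lock: str) -> List[Tuple[str, str]]:
        """Return the queue prefix that currently holds the lock."""
        held: List[Tuple[str, str]] = []
        for client, mode in self.queues.get(lock, deque()):
            if mode == EXCLUSIVE:
                # An exclusive request is granted only if it is alone at the head.
                if not held:
                    held.append((client, mode))
                break
            held.append((client, mode))
        return held

    def acquire(self, lock: str, client: str, mode: str) -> bool:
        """Enqueue a request; return True if it is granted immediately."""
        self.queues.setdefault(lock, deque()).append((client, mode))
        return (client, mode) in self._granted(lock)

    def release(self, lock: str, client: str) -> List[str]:
        """Remove the client's first request; return the clients newly granted."""
        before: Set[Tuple[str, str]] = set(self._granted(lock))
        q = self.queues[lock]
        q.remove(next(r for r in q if r[0] == client))
        return [c for c, m in self._granted(lock) if (c, m) not in before]
```

Under this policy, a waiting exclusive request blocks shared requests queued behind it. This avoids starving writers at the cost of some read concurrency.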
Fault Tolerance for Service Function Chains
TLDR
FTC, a novel system design and protocol for fault-tolerant service function chaining, provides strong consistency with up to f middlebox failures for chains of length f+1 or longer without requiring dedicated replica nodes.
Fault Tolerant Service Function Chaining
TLDR
FTC, a system design and protocol for fault-tolerant service function chaining that provides strong consistency with up to f middlebox failures for chains of length f + 1 or longer without requiring dedicated replica nodes, is introduced.
R2P2: Making RPCs first-class datacenter citizens
TLDR
Evaluation of R2P2, a UDP-based transport protocol designed specifically for RPCs inside a datacenter, shows that the protocol is suitable for µs-scale RPCs and that its tail latency is lower than that of both random selection and classic HTTP reverse proxies.
Scaling Up The Performance of Distributed Key-Value Stores With In-Switch Coordination
  • Hebatalla Eldakiky, D. Du, Eman Ramadan
  • Computer Science
    2021 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
  • 2021
TLDR
TurboKV, an efficient distributed key-value store architecture, is proposed. It uses programmable switches as (1) partition management nodes that store the key-value store's partition and replica information, and (2) monitoring stations that measure and balance the load among storage nodes.
Datacenter RPCs can be General and Fast
TLDR
eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics.
HovercRaft: achieving scalability and fault-tolerance for microsecond-scale datacenter services
TLDR
HovercRaft is proposed, a new approach by which adding nodes increases both the resilience and the performance of general-purpose state-machine replication through an extension of the Raft protocol that carefully eliminates CPU and I/O bottlenecks and load balances requests.
Datacenter RPCs can be General and Fast
TLDR
eRPC is a new general-purpose remote procedure call (RPC) library that provides performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics.
P4xos: Consensus as a Network Service
TLDR
This paper explores how the programmable forwarding plane offered by a new breed of network switches might naturally accelerate consensus protocols, focusing specifically on Paxos. Implementing Paxos in the forwarding plane significantly increases throughput and reduces latency.
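P4xos moves the roles of Paxos into the forwarding plane. For reference, the acceptor's two rules in single-decree Paxos fit in a few lines of Python. The sketch below follows textbook Paxos. It is plain software, not P4 code and not P4xos's implementation, and `Acceptor` and `is_chosen` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """Single-decree Paxos acceptor state (hypothetical sketch)."""
    promised: int = -1          # highest ballot promised in phase 1
    accepted_ballot: int = -1   # ballot of the last accepted value
    accepted_value: Any = None  # last accepted value, if any

    def on_prepare(self, ballot: int) -> Optional[Tuple[int, Any]]:
        """Phase 1b: promise to ignore lower ballots.

        Returns (accepted_ballot, accepted_value) so the proposer can adopt
        a previously accepted value, or None to reject a stale ballot."""
        if ballot <= self.promised:
            return None
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def on_accept(self, ballot: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def is_chosen(acceptors: List[Acceptor], ballot: int, value: Any) -> bool:
    """A value is chosen once a majority accepted it in the same ballot."""
    votes = sum(1 for a in acceptors
                if a.accepted_ballot == ballot and a.accepted_value == value)
    return votes > len(acceptors) // 2
```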
RedPlane: enabling fault-tolerant stateful in-switch applications
TLDR
This paper designs and implements RedPlane, a fault-tolerant state store for stateful in-switch applications. RedPlane gives these applications consistent access to their state, even if the switch they run on fails or traffic is rerouted to an alternative switch.

References

Showing 1-10 of 44 references
NetChain: Scale-Free Sub-RTT Coordination (Extended Version)
TLDR
NetChain exploits recent advances in programmable switches to store data and process queries entirely in the network data plane. It also designs new protocols and algorithms based on chain replication to guarantee strong consistency and to handle switch failures efficiently.
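NetChain builds its protocols on chain replication. The sketch below is a plain, in-memory illustration of textbook chain replication, not NetChain's switch implementation. Writes enter at the head and propagate to the tail, and reads are answered by the tail alone. A chain of f+1 nodes therefore survives f failures. `Node`, `Chain`, and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One replica in the chain; stores a plain key-value map."""
    name: str
    store: Dict[str, str] = field(default_factory=dict)


class Chain:
    """Hypothetical sketch of textbook chain replication."""

    def __init__(self, names: List[str]) -> None:
        self.nodes = [Node(n) for n in names]

    def write(self, key: str, value: str) -> str:
        # Apply the write at each node in chain order, head first.
        for node in self.nodes:
            node.store[key] = value
        # The tail acknowledges once the write has reached it.
        return self.nodes[-1].name

    def read(self, key: str) -> Optional[str]:
        # Only the tail serves reads, so clients never see a value
        # that some replica has not yet applied.
        return self.nodes[-1].store.get(key)

    def fail(self, name: str) -> None:
        # Simplest reconfiguration: splice the failed node out of the chain.
        self.nodes = [n for n in self.nodes if n.name != name]
```

For example, after `Chain(["s1", "s2", "s3"]).write("x", "1")`, removing the tail `s3` still leaves `s2` able to serve the committed value.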
Chaining for flexible and high-performance key-value systems
TLDR
A new replication protocol, Ouroboros, is presented, which extends chain-based replication to allow fast, non-blocking node additions to any part of the replica chain, and guarantees provably strong data consistency.
Consensus in a Box: Inexpensive Coordination in Hardware
TLDR
It is shown that consensus (atomic broadcast) can be removed from the critical performance path by moving it to hardware, specifically an FPGA. Combined with a main-memory key-value store running on specialized microservers, this yields a distributed service similar to ZooKeeper that exhibits high and stable performance.
ZooKeeper: Wait-free Coordination for Internet-scale Systems
TLDR
ZooKeeper provides a per-client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. Together these enable a high-performance processing pipeline in which read requests are satisfied by local servers.
NetCache: Balancing Key-Value Stores with Fast In-Network Caching
TLDR
This work presents NetCache, a new key-value store architecture that leverages the power and flexibility of new-generation programmable switches to handle queries on hot items and balance the load across storage nodes. For high-performance, in-memory key-value stores, NetCache improves throughput by 3-10x and reduces the latency of up to 40% of queries by 50%.
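Caching hot items first requires identifying which keys are hot. One standard structure for this is a Count-Min sketch. It keeps approximate per-key counters in a small fixed-size array, and its estimates can overcount but never undercount. The Python sketch below assumes such a sketch plus a fixed hotness threshold. The sizes, the threshold, and the names `CountMinSketch` and `hot_keys` are illustrative assumptions, not NetCache's parameters.

```python
import hashlib
from typing import Iterable, List, Set


class CountMinSketch:
    """Approximate per-key counters; estimates never undercount."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += 1

    def estimate(self, key: str) -> int:
        # The minimum across rows is the estimate least inflated by collisions.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


def hot_keys(queries: Iterable[str], threshold: int,
             sketch: CountMinSketch) -> List[str]:
    """Report each key once, when its estimated count reaches the threshold."""
    reported: List[str] = []
    seen: Set[str] = set()
    for key in queries:
        sketch.add(key)
        if key not in seen and sketch.estimate(key) >= threshold:
            seen.add(key)
            reported.append(key)
    return reported
```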
Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering
TLDR
A new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication in the data center without coordination, and therefore without the performance cost that coordination usually incurs.
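In NOPaxos, the network ordering comes from a sequencer that stamps each multicast message with a consecutive sequence number. All replicas therefore see the same order, and each replica can detect a dropped message as a gap in the sequence. The simplified sketch below shows only the stamping and gap detection. NOPaxos's gap-agreement and view-change protocols are omitted, and `Sequencer` and `Replica` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sequencer:
    """Stamps every multicast message with a consecutive sequence number."""
    next_seq: int = 1

    def stamp(self, payload: str) -> Tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        return (seq, payload)


@dataclass
class Replica:
    """Delivers messages in sequence order and records gaps (drops)."""
    expected: int = 1
    log: List[str] = field(default_factory=list)
    gaps: List[int] = field(default_factory=list)

    def receive(self, msg: Tuple[int, str]) -> None:
        seq, payload = msg
        if seq < self.expected:
            return  # duplicate or late message: ignored in this sketch
        # Every skipped number is a drop the replica must resolve; in NOPaxos
        # that is the job of the gap-agreement protocol, omitted here.
        self.gaps.extend(range(self.expected, seq))
        self.log.append(payload)
        self.expected = seq + 1
```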
Highly Available Transactions: Virtues and Limitations
TLDR
A taxonomy of highly available systems is introduced and existing ACID isolation and distributed data consistency guarantees are analyzed to identify which can and cannot be achieved in HAT systems.
No compromises: distributed transactions with consistency, availability, and performance
TLDR
It is shown that a main memory distributed computing platform called FaRM can provide distributed transactions with strict serializability, high performance, durability, and high availability in modern data centers.
Designing Distributed Systems Using Approximate Synchrony in Data Center Networks
TLDR
This paper explores network-level mechanisms for providing Mostly-Ordered Multicast (MOM): a best-effort ordering property for concurrent multicast operations, and designs Speculative Paxos, a state machine replication protocol that relies on the network to order requests in the normal case.
Dynamo: Amazon's highly available key-value store
TLDR
Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience, is presented. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
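Dynamo's object versioning relies on vector clocks, which let a node tell whether one version supersedes another or whether the two are concurrent. Concurrent versions are conflicts that the application must reconcile. The Python sketch below illustrates the comparison and merge rules under the assumption that a clock is a plain dict from node name to counter. The function names are hypothetical.

```python
from typing import Dict

# Assumption for this sketch: a vector clock is a dict of node -> counter.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the coordinating node's counter bumped."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version a has seen every update recorded in version b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # conflicting versions: the application must reconcile


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a reconciled version: element-wise maximum of both clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}
```

Two writes that start from the same version but go through different coordinators produce concurrent clocks. Writing back the reconciled object with a merged and then incremented clock supersedes both.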