Byzantine Fault Tolerance, from Theory to Reality

@inproceedings{Driscoll2003ByzantineFT,
  title={Byzantine Fault Tolerance, from Theory to Reality},
  author={Kevin R. Driscoll and Brendan Hall and H{\aa}kan Sivencrona and P. Zumsteg},
  booktitle={SAFECOMP},
  year={2003}
}
Since its introduction nearly 20 years ago, the Byzantine Generals Problem has been the subject of many papers having the scrutiny of the fault tolerance community. Numerous Byzantine tolerant algorithms and architectures have been proposed. However, this problem is not yet sufficiently understood by those who design, build, and maintain systems with high dependability requirements. Today, there are still many misconceptions relating to Byzantine failure, what makes a system vulnerable, and…
Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters
TLDR
The operation of ByzwATCh, a module for detecting Byzantine hardware errors at run time as part of the Charm++ parallel programming framework, is described.
Real-Time Replica Consistency over Ethernet with Reliability Bounds
TLDR
This work presents a hard real-time interactive consistency protocol that allows distributed processes to agree on a common state despite Byzantine errors and presents the first quantitative, real-time-aware reliability analysis of such a protocol deployed over switched Ethernet in the presence of stochastic transient faults.
Resource-efficient fault and intrusion tolerance
More and more network-based services are considered essential by their operators: either because their unavailability might directly lead to economic losses, as with e-commerce applications or online…
IGOR: Accelerating Byzantine Fault Tolerance for Real-Time Systems with Eager Execution
TLDR
IGOR is a new speculative BFT SMR approach that leverages multi-core processors to avoid the added latency inherent to traditional BFT SMR techniques in both the absence and presence of faults and noticeably increases vehicle stability.
The Startup Problem in Fault-Tolerant Time-Triggered Communication
  • W. Steiner, H. Kopetz
  • Computer Science
  • International Conference on Dependable Systems and Networks (DSN'06)
  • 2006
TLDR
A general startup strategy for safety-critical systems is presented and a new startup algorithm that is used in a TTP/C research derivative protocol (LTTP) is derived and analyzed.
A Byzantine-Fault Tolerant Self-stabilizing Protocol for Distributed Clock Synchronization Systems
TLDR
This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems, and model checking results confirm the theoretical predictions.
A Byzantine Fault-Tolerant Key-Value Store for Safety-Critical Distributed Real-Time Systems
From modern cars to airplanes to industrial plants, many applications that must execute in a timely manner are deployed on distributed systems. In case of safety-critical applications, like the…
Using offline and online BIST to improve system dependability - the TTPC-C example
  • A. Steininger, Johann Vilanek
  • Engineering, Computer Science
  • Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
  • 2002
TLDR
It turns out that the application of BIST during node startup and before node reintegration improves system fault tolerance and a combined strategy of online BIST and error correction can efficiently protect memory.
SIGACT News Complexity Theory Column 67
TLDR
This paper starts from commonly acknowledged issues that impede the adoption of Byzantine fault tolerance within a single cloud, and argues that many of these issues fade when Byzantine fault tolerance in the Intercloud is considered.

References

The Byzantine Generals Problem
TLDR
It is shown that, using only oral messages, the problem of a group of generals camped with their troops around an enemy city is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals.
FTMP—A highly reliable fault-tolerant multiprocessor for aircraft
TLDR
The core software in the FTMP will handle all fault detection, diagnosis, and recovery in such a way that applications programs do not need to be involved.
Heavy-Ion Fault Injections in the Time-Triggered Communication Protocol
TLDR
The experimental results show that arbitrary faults in one node can cause inconsistencies in the cluster and jeopardize the operation of correctly working nodes and the whole cluster. It also seems important to analyze further whether and why cluster sizes need to be taken into account when validating distributed systems.
Real-Time Systems - Design Principles for Distributed Embedded Applications
  • H. Kopetz
  • Computer Science, Engineering
  • Real-Time Systems Series
  • 1997
TLDR
Real-Time Systems offers a splendid example for the balanced, integrated treatment of systems and software engineering, helping readers tackle the hardest problems of advanced real-time system design, such as determinism, compositionality, timing and fault management.
SIFT: Design and analysis of a fault-tolerant computer for aircraft control
TLDR
SIFT (Software Implemented Fault Tolerance) is an ultrareliable computer for critical aircraft control applications that achieves fault tolerance by the replication of tasks among processing units by using a novel fault-tolerant synchronization method.
Impact of deep submicron technology on dependability of VLSI circuits
  • C. Constantinescu
  • Computer Science
  • Proceedings International Conference on Dependable Systems and Networks
  • 2002
TLDR
It is concluded that the semiconductor industry is approaching a new stage in the design and manufacturing of VLSI circuits, and fault-tolerance features, specific to custom designed computers, have to be integrated into commercial-off-the-shelf (COTS) VLSI systems in the future, in order to preserve data integrity and limit the impact of transient and intermittent faults.
Beyond the byzantine generals: unexpected behavior and bridging fault diagnosis
TLDR
A diagnosis procedure is presented that uses modified composite signatures constructed from single stuck-at information combined with a lexicographic matching and ranking algorithm to perform high-quality bridging fault diagnosis for diagnostic experiments involving dropping or adding behaviors from the simulations of faulty circuits.
A conceptual design for a Reliable Optical Bus (ROBUS)
TLDR
The SPIDER is a general-purpose computational platform suitable for use in ultrareliable embedded control applications and the conceptual design of the ROBUS is presented in this paper including requirements, topology, protocols, and the block-level design.
Slightly-off-specification failures in the time-triggered architecture
  • A. Ademaj
  • Computer Science
  • Seventh IEEE International High-Level Design Validation and Test Workshop, 2002.
  • 2002
TLDR
This work presents the temporal SOS failures observed in the time-triggered architecture with the bus interconnection structure during the execution of software-implemented fault injection in the TTP/C communication controller.
Formal verification for time-triggered clock synchronization
TLDR
The paper reports on the formal analysis of the clock synchronization service provided as an integral feature by the Time-Triggered Protocol, a communication protocol particularly suitable for safety-critical control applications, such as in automotive "by-wire" systems.