Using Time Instead of Timeout for Fault-Tolerant Distributed Systems.

@article{Lamport1984UsingTI,
  title={Using Time Instead of Timeout for Fault-Tolerant Distributed Systems.},
  author={Leslie Lamport},
  journal={ACM Trans. Program. Lang. Syst.},
  year={1984},
  volume={6},
  pages={254--280}
}
  • L. Lamport
  • Published 1 April 1984
  • Computer Science
  • ACM Trans. Program. Lang. Syst.
Describes a general method for implementing a distributed system with any desired degree of fault tolerance. Reliable clock synchronization and a solution to the Byzantine Generals problem are assumed.

TTP - A Protocol for Fault-Tolerant Real-Time Systems

TLDR
The authors describe the architectural assumptions, fault hypothesis, and objectives for the TTP protocol, and discuss TTP characteristics and compare its performance with that of other protocols proposed for control applications.

Automatic Reconfiguration in the Presence of Failures

We describe a new kind of distributed system service, the Availability Management service, responsible for ensuring that the critical services of a distributed system remain continuously available to

Implementing fault-tolerant services using the state machine approach: a tutorial

TLDR
The state machine approach is a general method for implementing fault-tolerant services in distributed systems; protocols for two different failure models, Byzantine and fail-stop, are described.

Understanding fault-tolerant distributed systems

This article attempts to introduce some discipline and order in understanding fault-tolerance issues in distributed system architectures. This article examines various proposals, discusses their

Efficient fault-tolerant broadcasts

Reaching agreement on processor-group membership in synchronous distributed systems

TLDR
Three simple protocols are proposed that provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays.

The Delta-4 approach to dependability in open distributed computing systems

TLDR
The authors present the overall Delta-4 framework for open, fault-tolerant, distributed computing systems and sketch the current implementation, which is based on a local area network with specific atomic multicasting and error-processing protocols for communicating between replicated software components.

The Consensus Problem in Unreliable Distributed Systems (A Brief Survey)

TLDR
The considerable literature on this problem that has developed over the past few years is surveyed and an informal overview of the major theoretical results is given.

Improving fault-tolerance in distributed systems: The saturation approach

  • J. Fabre
  • Computer Science
    [1991] Proceedings, Advanced Computer Technology, Reliable Systems and Applications
  • 1991
TLDR
The technique that maximizes the redundancy level of tasks and tolerates hardware faults by majority voting in the context of a pool of interconnected processors, called saturation, is presented and briefly compared with similar approaches.

Design of Highly Decentralized Operating Systems

TLDR
In the decentralized operating system approach all the resources of a distributed system are considered as belonging to a single, very reliable, machine, and a methodology able to express high parallelism in resource management policies is needed.

References

SHOWING 1-10 OF 20 REFERENCES

Fault-Tolerant Broadcasts

Impossibility of distributed consensus with one faulty process

TLDR
It is shown that, in the asynchronous consensus problem, every protocol has the possibility of nontermination, even with only one faulty process.

Synchronization in Distributed Programs

TLDR
The technique can be used to solve synchronization problems directly, to implement new synchronization mechanisms, and to construct distributed versions of existing synchronization mechanisms.

Time, clocks, and the ordering of events in a distributed system

TLDR
A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events, and a bound is derived on how far out of synchrony the clocks can become.

The Byzantine Generals Strike Again

  • D. Dolev
  • Computer Science
    J. Algorithms
  • 1982

The notions of consistency and predicate locks in a database system

TLDR
It is argued that a transaction needs to lock a logical rather than a physical subset of the database, and an implementation of predicate locks which satisfies the consistency condition is suggested.

A Lower Bound for the Time to Assure Interactive Consistency

Simple and efficient Byzantine generals algorithm

TLDR
An explicit solution is given for a binary value among n = 3t + 1 processes, using 2t + 4 rounds and O(t^3 log t) message bits, where t bounds the number of faulty processes.

Authenticated Algorithms for Byzantine Agreement

TLDR
This paper presents algorithms for reaching agreement based on authentication that require a total number of messages sent by correctly operating processors that is polynomial in both t and the number of processors, n.

Reaching Agreement in the Presence of Faults

TLDR
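It is shown that the problem is solvable if and only if n ≥ 3m + 1, where m is the number of faulty processors and n is the total number of processors. Under the weaker assumption that faulty processors can refuse to pass on information but cannot falsely relay it, the problem is solvable for arbitrary n ≥ m ≥ 0; this assumption can be approximated in practice using cryptographic methods.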
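
As a rough illustration of the n ≥ 3m + 1 bound quoted in the last entry, the arithmetic can be sketched in a few lines of Python. The function names `min_processors` and `agreement_possible` are hypothetical and do not come from any of the listed papers; the sketch only evaluates the bound for the setting without authentication and does not implement an agreement protocol.

```python
def min_processors(m: int) -> int:
    """Smallest total n that satisfies n >= 3m + 1 for m faulty processors."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return 3 * m + 1


def agreement_possible(n: int, m: int) -> bool:
    """True when n processors meet the n >= 3m + 1 bound for m faults."""
    return n >= min_processors(m)


if __name__ == "__main__":
    # Tabulate the bound for a few fault counts (illustrative values only).
    for m in range(4):
        print(f"m={m}: need at least n={min_processors(m)} processors")
```

For example, tolerating a single faulty processor needs at least four processors in total, so three processors are not enough under this bound.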