Distributed snapshots: determining global states of distributed systems

@article{Chandy1985DistributedSD,
  title={Distributed snapshots: determining global states of distributed systems},
  author={K. Mani Chandy and Leslie Lamport},
  journal={ACM Trans. Comput. Syst.},
  year={1985},
  volume={3},
  pages={63--75}
}
This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties… 
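The abstract does not spell out the algorithm, so the following is only a minimal sketch of the marker-based scheme as it is commonly described: a process records its own state, sends a marker on every outgoing channel, and records the messages that arrive on each incoming channel until that channel delivers a marker. It assumes reliable FIFO channels and a fully connected network. The names `Process`, `MARKER`, `run_demo`, and `snapshot_total`, as well as the money-transfer workload, are hypothetical and chosen purely for illustration.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel; application messages are ints


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance        # live application state
        self.peers = peers            # ids of all other processes
        self.recorded_state = None    # local state saved for the snapshot
        self.channel_state = {}       # sender id -> messages recorded in transit
        self.recording = set()        # incoming channels still being recorded

    def record(self, net):
        """Save the local state, then send a marker on every outgoing channel."""
        self.recorded_state = self.balance
        self.recording = set(self.peers)
        self.channel_state = {p: [] for p in self.peers}
        for p in self.peers:
            net[(self.pid, p)].append(MARKER)

    def send(self, net, dst, amount):
        self.balance -= amount
        net[(self.pid, dst)].append(amount)

    def receive(self, net, src):
        msg = net[(src, self.pid)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record(net)          # first marker seen: record now
            self.recording.discard(src)   # stop recording channel src -> self
        else:
            self.balance += msg
            if src in self.recording:     # arrived after recording, before marker
                self.channel_state[src].append(msg)


def run_demo():
    pids = ["p", "q", "r"]
    # One FIFO queue per directed channel (assumption: fully connected).
    net = {(a, b): deque() for a in pids for b in pids if a != b}
    procs = {pid: Process(pid, 100, [x for x in pids if x != pid]) for pid in pids}

    procs["p"].send(net, "q", 30)
    procs["q"].send(net, "p", 10)
    procs["p"].record(net)            # p initiates the snapshot
    procs["r"].send(net, "p", 5)      # r has not seen a marker yet

    # Deliver messages channel by channel until the network is empty.
    while any(net.values()):
        for (src, dst), chan in net.items():
            if chan:
                procs[dst].receive(net, src)
    return procs


def snapshot_total(procs):
    """Sum of recorded local states plus recorded in-transit messages."""
    local = sum(p.recorded_state for p in procs.values())
    in_transit = sum(sum(msgs) for p in procs.values()
                     for msgs in p.channel_state.values())
    return local + in_transit
```

In this toy workload the total amount of money is invariant, so a consistent snapshot has to account for all 300 units: `snapshot_total(run_demo())` returns 300, even though the three local states were recorded at different moments and two transfers were in flight when the snapshot began.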

A Paradigm for Detecting Quiescent Properties in Distributed Computations
TLDR
This paper presents a simple (almost trivial) algorithm to detect quiescent properties, an important class of stable properties including those mentioned above.
Strong stable properties in distributed systems
TLDR
This paper presents a very simple algorithm for termination detection and deadlock detection and shows how to derive a simple generic algorithm for the detection of a strong stable property.
A Consistent Global Checkpoint Algorithm for Distributed Systems with a Forbidden Process
TLDR
A distributed coordinated checkpointing algorithm for distributed systems with a special process, called a forbidden process, is discussed, which takes the minimum number of checkpoints in the forbidden process.
Detection of stable properties in distributed applications
TLDR
This paper presents a general algorithm for the distributed detection of stable properties in distributed applications or systems that deals with every stable property of a fairly general class.
Monitoring Stable Properties in Dynamic Asynchronous Distributed Systems
TLDR
This paper presents an efficient algorithm to determine whether a stable property has become true in a system in which processes can join and depart the system at any time, based on maintaining a spanning tree of processes that are currently part of the system.
Independent global snapshots in large distributed systems
TLDR
This work provides exact conditions for an arbitrary checkpoint based on independent dependency tracking within clusters of nodes that permits nodes to independently compute dependency information based on available (local) information.
A distributed consistent global checkpoint algorithm for distributed mobile systems
  • Yoshifumi Manabe
  • Computer Science
    Proceedings. Eighth International Conference on Parallel and Distributed Systems. ICPADS 2001
  • 2001
TLDR
A checkpoint algorithm in which the amount of information piggybacked on program messages does not depend on the number of mobile processes and is optimal among the generalizations of Chandy and Lamport's distributed snapshot algorithm under the latter assumption.
Multiple Distributed Checkpoints over Unreliable Channels
TLDR
This work presents a solution that does not require links to preserve message order and reduces the problem of handling messages in transit at the time of a checkpoint to one of dealing with messages lost in transit, which the communication system is capable of doing.
Monitoring Stable Properties in Dynamic Peer-to-Peer Distributed Systems
TLDR
This paper presents an efficient algorithm to determine whether a stable property has become true in a system in which processes can join and depart the system at any time, and it can be used to evaluate any stable property.

References

On Deadlock Detection in Distributed Systems
TLDR
The distributed protocol for deadlock detection in distributed databases is incorrect, and possible remedies are presented; however, the distributed protocol remains impractical because "condensations" of "transaction-wait-for" graphs make graph updates difficult to perform.
Locking and Deadlock Detection in Distributed Data Bases
TLDR
Two protocols for the detection of deadlocks in distributed data bases are described–a hierarchically organized one and a distributed one that requires that the global graph be built and maintained in order for deadlocks to be detected.
Distributed deadlock detection algorithm
TLDR
An algorithm for detecting deadlocks among transactions running concurrently in a distributed processing network (i.e., a distributed database system) and a proof of the correctness of the distributed portion of the algorithm is given.
Time, clocks, and the ordering of events in a distributed system
TLDR
A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events, and a bound is derived on how far out of synchrony the clocks can become.
Distributed deadlock detection
TLDR
It is shown that all true deadlocks are detected and that no false deadlocks are reported, and the algorithms can be applied in distributed database and other message communication systems.
Software Controlled Access to Distributed Data Bases
TLDR
A modified control scheme is proposed, in which the centre of control can shift its location to adapt to failure of network elements, and it is expected that such an adaptive controller will prove superior in performance to either of the previous alternatives.
The distributed snapshot of K.M. Chandy and L. Lamport
We consider a distributed system of the form of a strongly connected, finite, directed graph, of which each vertex is a machine and each edge a uni-directional first-in-first-out buffer of sufficient
Termination Detection of Diffusing Computations in Communicating Sequential Processes
TLDR
In this paper, have introduced the notion of diffusing computation in a distributed system of processes and suggest an elegant algorithm.
Distributed computation on graphs: shortest path algorithms
TLDR
This work presents a detailed solution to the problem of computing shortest paths from a single vertex to all other vertices, in the presence of negative cycles.
Market Feedbacks and the Limits to Growth
Critics of the Forrester-Meadows models of population and economic growth limits have focused their attention on the excessive aggregation of the model, on its exceedingly conservative assumptions