A new adaptive accrual failure detector for dependable distributed systems

@inproceedings{Satzger2007ANA,
  title={A new adaptive accrual failure detector for dependable distributed systems},
  author={Benjamin Satzger and Andreas Pietzowski and Wolfgang Trumler and Theo Ungerer},
  booktitle={SAC '07},
  year={2007}
}
The detection of failures in distributed environments is crucial to developing dependable, robust, and self-healing systems. The contribution of this paper is a new failure detection algorithm that can be described as an adaptive accrual algorithm coupled with features to increase flexibility and decrease computation costs. Furthermore, our evaluation results show very good detection quality in the case of message losses.
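To make the term "adaptive accrual" concrete, here is a minimal Python sketch of one way such a detector can work. It is an illustration under stated assumptions, not the paper's algorithm as published: the class name `AccrualDetector`, its methods, and the choice of an empirical distribution over a sliding window of heartbeat inter-arrival times are all hypothetical. Rather than a binary alive/dead verdict, the detector returns a suspicion level that grows with the time elapsed since the last heartbeat.

```python
from bisect import bisect_right
from collections import deque


class AccrualDetector:
    """Hypothetical accrual failure detector based on an empirical CDF.

    Assumption: suspicion is the fraction of recently observed heartbeat
    inter-arrival times that are no longer than the current silence.
    """

    def __init__(self, window_size=1000):
        # The sliding window keeps only recent samples, so the estimate
        # adapts when network conditions change.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def suspicion(self, now):
        """Return a suspicion level in [0, 1] for the monitored process."""
        if not self.intervals:
            return 0.0  # no samples yet, so no basis for suspicion
        elapsed = now - self.last_arrival
        ordered = sorted(self.intervals)
        # Count the samples <= elapsed and normalise by the sample size.
        return bisect_right(ordered, elapsed) / len(ordered)


detector = AccrualDetector(window_size=4)
for arrival in (0.0, 1.0, 2.5, 3.0, 5.0):
    detector.heartbeat(arrival)

print(detector.suspicion(5.25))  # silence shorter than every sample
print(detector.suspicion(6.0))   # silence at least as long as half the samples
print(detector.suspicion(8.0))   # silence longer than every sample
```

An application would compare the returned value against its own threshold, which is what lets different applications trade detection time against false suspicions while sharing the same detector.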
Low-Overhead Accrual Failure Detector
TLDR
A new accrual failure detector, LA-FD, with low system overhead is proposed specifically for current mobile network equipment on the Internet, whose processing power, memory space, and power supply are all constrained.
Autonomous and scalable failure detection in distributed systems
TLDR
Algorithms for forming monitoring relations are introduced and used for scalable autonomous failure detection; the evaluation indicates that the developed algorithms are suitable for complex, large-scale distributed systems.
LA-FD: a Low-overhead Accrual Failure Detector
Failure detector is one of the fundamental components for building a distributed system with high availability. In order to maintain the efficiency and scalability of failure detection in a
An Adaptive Performance Management Method for Failure Detection
  • Ke Liang, Xingshe Zhou, K. Zhang, Ruiqing Sheng
  • Computer Science
    2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
  • 2008
TLDR
An adaptive performance management method based on feedback control theory that allows the system to self-manage the CPU resources allocated to failure detection using only high-level policy input, limiting the performance impact of failure detection even in a dynamic environment.
Grouping algorithms for scalable self-monitoring distributed systems
TLDR
This paper proposes three algorithms which allow a distributed system to install monitoring relations among its components and serves as a basis to build scalable distributed systems with self-x features and to achieve a self-monitoring capability.
A literature review of failure detection within the context of solving the problem of distributed consensus
TLDR
This essay presents the theoretical models that allow us to solve consensus in the presence of even one failure, and discusses practical refinements to the models for the purposes of implementing failure detectors in practice.
Neighbour replica affirmative adaptive failure detection and autonomous recovery
High availability is an important property for current distributed systems. The trends of current distributed systems such as grid computing and cloud computing are the delivery of computing as a
Self healing distributed systems
TLDR
A new failure detection algorithm is proposed with noteworthy features like a high flexibility and good performance, and an approach is presented to save the message overhead of failure detectors.
Variations and Evaluations of an Adaptive Accrual Failure Detector to Enable Self-healing Properties in Distributed Systems
TLDR
Variations of the proposed basic algorithm to improve its performance are introduced and an evaluation of all algorithms using message delay and loss models of the internet is provided.

References

Implementation and performance evaluation of an adaptable failure detector
TLDR
This paper proposes a new implementation of a failure detector that is adaptable and can support scalable applications. It dissociates two aspects: a basic estimation of the expected arrival date to provide a short detection time, and an adaptation of the quality of service according to application needs.
Unreliable failure detectors for reliable distributed systems
TLDR
It is proved that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast.
An adaptive failure detection protocol
TLDR
A relatively simple protocol that allows a process to "monitor" another process, and consequently to detect its crash, and which uses control messages only when no application message is sent by the monitoring process to the observed process.
The φ Accrual Failure Detector
TLDR
This paper presents a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block to implementing failure detectors in distributed systems.
Impossibility of distributed consensus with one faulty process
TLDR
In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process.
The weakest failure detector for solving consensus
TLDR
It is proved that to solve Consensus, any failure detector has to provide at least as much information as ◇W, which is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
A hundred impossibility proofs for distributed computing
TLDR
Although it is often hard to say what constitutes a "different result", the author managed to count over 100 such impossibility proofs and found that it's not quite as hopeless to understand this area as it might seem from the number of papers.
The φ accrual failure detector
TLDR
This paper presents a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block to implementing failure detectors in distributed systems.
The φ accrual failure detector. In SRDS, pages 66–78
  • IEEE Computer Society,
  • 2004
On the Quality of Service of Failure Detectors