Fault Management in Distributed Systems: A Policy-Driven Approach

@article{Lutfiyya2004FaultMI,
  title={Fault Management in Distributed Systems: A Policy-Driven Approach},
  author={Hanan Lutfiyya and Michael Anthony Bauer and Andrew D. Marshall and David K. Stokes},
  journal={Journal of Network and Systems Management},
  year={2000},
  volume={8},
  pages={499--525}
}
Managing the availability and performance of a distributed system involves monitoring the behavior of the system, identifying system problems, and correcting those problems. Each of these tasks requires some expertise, such as an understanding of the mechanics of the underlying system components. As the size and complexity of these systems increase, and the number of distributed applications executing on these systems grows, managing the availability and performance of distributed systems… 
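As a rough illustration of the monitor, identify, and correct cycle described in this abstract, the following Python sketch expresses management policies as event-condition-action style rules evaluated over monitored metrics. The abstract excerpt does not specify the paper's own policy representation, so the `Policy` class, the `manage` function, the metric names (`alive`, `cpu_load`), and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of monitored metrics for one managed component.
Metrics = Dict[str, float]


@dataclass
class Policy:
    """Hypothetical event-condition-action style management policy."""
    name: str
    condition: Callable[[Metrics], bool]  # identifies a problem from monitored data
    action: Callable[[str], str]          # stands in for a corrective action; returns a description


def manage(component: str, metrics: Metrics, policies: List[Policy]) -> List[str]:
    """One monitor -> identify -> correct pass over a single component."""
    log: List[str] = []
    for policy in policies:
        if policy.condition(metrics):  # this policy recognises a problem
            log.append(f"{policy.name}: {policy.action(component)}")
    return log


# Illustrative policies; the metric names and thresholds are assumptions.
policies = [
    Policy("restart-on-crash",
           condition=lambda m: m.get("alive", 1.0) == 0.0,
           action=lambda c: f"restart {c}"),
    Policy("replicate-on-overload",
           condition=lambda m: m.get("cpu_load", 0.0) > 0.9,
           action=lambda c: f"start replica of {c}"),
]

print(manage("db-server", {"alive": 1.0, "cpu_load": 0.95}, policies))
# ['replicate-on-overload: start replica of db-server']
```

Keeping the policies as data, separate from the `manage` loop, means fault-handling behavior can be changed without touching the monitoring code; this reflects the general motivation for policy-driven management, not a claim about the paper's specific design.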
A study of service reliability and availability for distributed systems
A Survey of Fault Management in Wireless Sensor Networks
TLDR
This paper summarizes and compares existing fault tolerant techniques to support sensor applications and discusses several interesting open research directions.
Dealing with Faults in Wireless Sensor Networks
TLDR
This paper summarizes and compares existing fault tolerant techniques to support sensor applications and discusses several interesting open research directions.
Management by Contract: IT Management driven by Business Objectives
In today’s business environment, change must be seen not as an exception but as the normal state of affairs. With its Darwin Architecture [23], HP addresses business agility, defined as the ability
BPMM: a grid-based architectural framework for business process meta management
TLDR
It is argued that traditional BPM platforms are not sufficient for dynamic and adaptive business environments and that a new BPM paradigm is needed; the one proposed in this paper is called Business Process Meta Management (BPMM).
BPSM: An Adaptive Platform for Managing Business Process Solutions
TLDR
This paper presents an adaptive platform, called BPSM (Business Process Solution Management), for managing business process solutions: it creates an adaptive environment that developers can leverage to build management applications in the domain of business process solution management.
Fault Management for Service-Oriented Systems
Optimal Node Deployment for Fault-Tolerant Wireless Sensor Networks: A Survey
TLDR
This paper summarizes techniques for the optimal deployment of nodes in WSNs and ways of dealing with faults that develop in the network, and discusses various causes of faults in WSNs.
Towards a Collection of Security and Privacy Patterns
TLDR
This work presents a survey and taxonomy of SP patterns towards the creation of a usable pattern collection, to enable decomposition of higher-level properties to more specific ones, matching them to relevant patterns, while also creating a comprehensive overview of security- and privacy-related properties and sub-properties that are of interest in IoT/IIoT environments.
...
...

References

Showing 1-10 of 32 references
Reference Architecture for Distributed Systems Management
TLDR
A reference architecture for distributed systems management is proposed that integrates system monitoring, information management, and system modeling techniques, and a detailed hospital application is presented to clarify the requirements for managing applications.
Making distributed applications manageable through instrumentation
TLDR
This paper presents an instrumentation architecture for making distributed applications manageable, a prototype implementation that includes a class library of standard instrumentation, and a methodology for instrumenting applications so that they can respond to management requests, generate management reports, and maintain information required by the management system.
Services Supporting Management of Distributed Applications and Systems
TLDR
This paper presents a framework for management of distributed applications and systems based on a set of common management services that support management activities, which include monitoring, control, configuration, and data repository services.
Policies in network and systems management—Formal definition and architecture
  • R. Wies
  • Computer Science
    Journal of Network and Systems Management
  • 1994
TLDR
A formal definition of policies is presented, examples of policies from network and systems management are given to illustrate the concept, and an architecture is introduced to show how a policy system can be realized.
Policy Hierarchies for Distributed Systems Management
TLDR
The paper explores the refinement of general high-level policies into a number of more specific policies to form a policy hierarchy in which each policy in the hierarchy represents, to its maker, his plans to meet his objectives and, to the subject, the objectives which he must plan to meet.
Efficient management data acquisition and run-time control of DCE applications using the OSI management framework
TLDR
A prototype management system is described that was developed to explore the efficiency and dynamic control of management data acquisition, as well as the run-time control of application programs, in the management of DCE applications.
A General Object Model for the Management of Distributed Applications
TLDR
This paper describes a generalized object model of distributed software applications that has been, and is being, experimented with at Bellcore for managing its distributed application systems.
On a rule based management architecture
  • T. Koch, B. Kramer, G. Rohde
  • Computer Science
    Second International Workshop on Services in Distributed and Networked Environments
  • 1995
TLDR
This paper presents a hybrid approach to distributed systems management in which management policies are represented by rules, and the interpretation of those rules and the automated activation of appropriate management tools are performed by the software development environment Marvel.
Policy driven management for distributed systems
  • M. Sloman
  • Business
    Journal of Network and Systems Management
  • 1994
TLDR
This paper describes the work on policy that has come out of two related ESPRIT-funded projects, SysMan and IDSM, shows how a number of example policies can be modeled as objects, and briefly mentions issues relating to policy hierarchies and conflicts between overlapping policies.
Towards a Comprehensive Distributed Systems Management
TLDR
This paper describes an approach to distributed systems management that is hybrid in at least two respects: formalized management policies are automatically enforced, and standard management tasks can be delegated to the system.
...
...
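Several of the references above (for example, the Wies and Sloman entries) treat policies as objects relating subjects, targets, and actions, and the Sloman entry mentions conflicts between overlapping policies. The following Python sketch shows one simplified way such a conflict could be detected: two policies overlap when they share a subject, a target, and an action, and they conflict when one is positive and the other negative. The `Policy` fields, the single `positive` flag (which collapses authorization and obligation into one sign), and the example policy names are hypothetical simplifications, not the object model of any cited paper.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy object relating subjects, targets, and actions."""
    name: str
    subjects: FrozenSet[str]
    targets: FrozenSet[str]
    actions: FrozenSet[str]
    positive: bool  # simplification: True = permitted/required, False = forbidden


def overlaps(a: Policy, b: Policy) -> bool:
    """Two policies overlap if they share a subject, a target, and an action."""
    return (bool(a.subjects & b.subjects)
            and bool(a.targets & b.targets)
            and bool(a.actions & b.actions))


def modality_conflicts(policies: List[Policy]) -> List[Tuple[str, str]]:
    """Return name pairs of overlapping policies whose modalities disagree."""
    return [
        (a.name, b.name)
        for a, b in combinations(policies, 2)
        if overlaps(a, b) and a.positive != b.positive
    ]


# Illustrative policies (hypothetical, not taken from the cited papers).
p1 = Policy("ops-may-restart", frozenset({"ops"}), frozenset({"web", "db"}),
            frozenset({"restart"}), positive=True)
p2 = Policy("no-db-restart", frozenset({"ops", "dev"}), frozenset({"db"}),
            frozenset({"restart"}), positive=False)
p3 = Policy("dev-may-read-logs", frozenset({"dev"}), frozenset({"web"}),
            frozenset({"read_log"}), positive=True)

print(modality_conflicts([p1, p2, p3]))  # [('ops-may-restart', 'no-db-restart')]
```

A fuller policy system would also distinguish authorization from obligation and would need to reason about hierarchies of refined policies; both are omitted from this sketch.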