Fault Tolerant Operating Systems

  • Peter J. Denning
  • Published 1 December 1976
  • Computer Science
  • ACM Comput. Surv.
This paper develops four related architectural principles that can guide the construction of error-tolerant operating systems. The fundamental principle, system closure, specifies that no action is permissible unless it is explicitly authorized. The capability-based machine is the most efficient known embodiment of this principle: it allows efficient small access domains, multiple-domain processes without a privileged mode of operation, and user and system descriptor information protected by the…
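The closure principle in the abstract can be illustrated with a minimal Python sketch, assuming a simple model in which a capability is an (object, rights) pair and a domain is just the list of capabilities it holds. The names `Capability`, `Domain`, `Process`, `write_block`, and the `enter` right are hypothetical and do not come from the paper; the sketch shows default-deny checking, small domains, and domain switching on a protected call without any privileged mode, not how a capability machine implements these in hardware.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: names one object and the rights it grants."""
    obj: str
    rights: frozenset


@dataclass
class Domain:
    """A small access domain: exactly the capabilities it was given."""
    name: str
    caps: list = field(default_factory=list)

    def authorizes(self, obj: str, right: str) -> bool:
        # System closure (default deny): an action is allowed only if some
        # capability held by this domain explicitly grants it.
        return any(c.obj == obj and right in c.rights for c in self.caps)


class Process:
    """A multiple-domain process: a stack of domains, innermost on top."""

    def __init__(self, initial: Domain):
        self.domains = [initial]

    @property
    def current(self) -> Domain:
        return self.domains[-1]

    def access(self, obj: str, right: str) -> None:
        if not self.current.authorizes(obj, right):
            raise PermissionError(f"{self.current.name}: no '{right}' right for {obj}")

    def call(self, target: Domain, entry):
        # Protected call: entering another domain itself needs an explicit
        # "enter" capability; no privileged mode bypasses the check.
        self.access(target.name, "enter")
        self.domains.append(target)
        try:
            return entry(self)
        finally:
            self.domains.pop()  # returning restores the caller's domain


# Hypothetical example: only the file-system domain holds a disk capability.
fs = Domain("filesys", [Capability("disk", frozenset({"read", "write"}))])
user = Domain("user", [Capability("file:a", frozenset({"read"})),
                       Capability("filesys", frozenset({"enter"}))])


def write_block(proc: Process) -> str:
    proc.access("disk", "write")  # permitted only while running in `fs`
    return "ok"
```

Here `Process(user).access("disk", "write")` raises `PermissionError`, while `Process(user).call(fs, write_block)` returns `"ok"`: the user domain reaches the disk only for the duration of the call into the file-system domain, and only because it was explicitly given an `enter` capability for it.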


Design and principles of a fault tolerant system
This paper discusses how a global system architecture is determined by a top-down approach, the principle of protection by capabilities, and the management of synchronization and object sharing between processes by means of generalized monitors and path expressions in this hierarchical system.
An approach to a fault-tolerant system architecture
The principles of domains and the architecture of a capability machine are discussed, and the management of scheduling and of object sharing between processes by monitors is described in detail.
Data Security: The 1100/90 as a Closed, Fault-Tolerant Environment
This paper presents an overview of the protection mechanisms built into the 1100/90, explains how they implement a closed, fault-tolerant environment, and frames the problem in terms of current expectations of reliability and security.
Distributed system fault tolerance using message logging and checkpointing
A new optimistic message logging system is presented that is guaranteed to find the maximum recoverable system state, a property not ensured by previous optimistic methods.
Building a reliable operating system
This paper presents CuriOS, an operating system that incorporates several new error-management techniques that significantly improve reliability; it achieves inter-client isolation by curtailing error propagation within services.
The introduction of fault-tolerance in a hierarchical operating system
A general method for introducing fault tolerance into a hierarchical operating system is presented, in which known techniques for fault-tolerant operation can be represented as special cases.
Design and implementation of a resource-secure system
The paper shows that building resource-secure systems is possible by describing the design and implementation of a prototype, Anaxagoros, and proposes several novel ways to solve synchronization problems.
Building a Self-Healing Operating System
  • Francis M. David, R. Campbell
  • Computer Science
  • Third IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC 2007)
  • 2007
Fault injection experiments show that these techniques can be used to continue running user applications after transparently recovering the operating system in a large percentage of cases, and that individual process recovery can be attempted as a last resort.
Resourceful systems for fault tolerance, reliability, and safety
The current state of the art of system reliability, safety, and fault tolerance is reviewed, and an approach to designing resourceful systems based upon a functionally rich architecture and an explicit goal orientation is developed.
4 - Operating Systems


Dynamic protection structures
This paper deals with one aspect of the subject, which might be called the meta-theory of protection systems: how the information that specifies protection and authorizes access can itself be protected and manipulated.
System structure for software fault tolerance
  • B. Randell
  • Computer Science
  • IEEE Transactions on Software Engineering
  • 1975
The aim is to facilitate the provision of dependable error detection and recovery facilities which can cope with errors caused by residual design inadequacies, particularly in the system software, rather than merely the occasional malfunctioning of hardware components.
A hardware architecture for implementing protection rings
Hardware processor mechanisms for implementing concentric rings of protection that allow cross-ring calls and subsequent returns to occur without trapping to the supervisor are described.
The protection of information in computer systems
This tutorial paper explores the mechanics of protecting computer-stored information from unauthorized use or modification by examining in depth the principles of modern protection architectures and the relation between capability systems and access control list systems.
Dynamic verification of operating system decisions
The dynamic verification of operating system decisions is used on the PRIME system to ensure that one user's information cannot become available to another user gratuitously even in the presence of a single hardware or software fault.
HYDRA: the kernel of a multiprocessor operating system
This paper describes the design philosophy of HYDRA, the kernel of an operating system for C.mmp, the Carnegie-Mellon Multi-Mini-Processor. This philosophy is realized through the introduction of a…
A Computer Architecture for Level Structured Systems
The purpose of this paper is to point out where hardware support is needed, to suggest one way of implementing these features, and to provide some simple hardware mechanisms that support the level structure.
Process Structuring
This work indicates how very complex processes can be obtained by combining simple ones, and notes that the separate correctness of the components is insufficient to guarantee the correctness of the combination.
Capability-based addressing
A computer using capability-based addressing may be substantially superior to present systems in terms of protection, simplicity of programming conventions, and efficiency of implementation.
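The addressing idea can be sketched in Python under simplifying assumptions: a capability carries a system-wide unique object identifier plus rights, and every access goes through a central object table that maps the identifier to the object's current base and length. `Cap`, `Memory`, `object_table`, `restrict`, and `move` are hypothetical names chosen for illustration, not an interface from the cited paper. The sketch shows why relocation can stay transparent to programs: they hold capabilities, never physical addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    """Hypothetical capability: a unique object id plus the rights it grants."""
    uid: int
    rights: frozenset


def restrict(cap: Cap, rights) -> Cap:
    # Derive a weaker capability for the same object (rights can only shrink).
    return Cap(cap.uid, cap.rights & frozenset(rights))


class Memory:
    """Word-addressed memory reached only through (capability, offset) pairs."""

    def __init__(self, size: int):
        self.words = [0] * size
        self.object_table = {}  # uid -> (base, length); assumed central map

    def create(self, uid: int, base: int, length: int) -> Cap:
        self.object_table[uid] = (base, length)
        return Cap(uid, frozenset({"read", "write"}))

    def _translate(self, cap: Cap, offset: int, right: str) -> int:
        # Every reference is checked against the capability's rights and the
        # object's bounds, then mapped to a physical address via the table.
        if right not in cap.rights:
            raise PermissionError(f"capability lacks '{right}' right")
        base, length = self.object_table[cap.uid]
        if not 0 <= offset < length:
            raise IndexError("offset outside object")
        return base + offset

    def load(self, cap: Cap, offset: int) -> int:
        return self.words[self._translate(cap, offset, "read")]

    def store(self, cap: Cap, offset: int, value: int) -> None:
        self.words[self._translate(cap, offset, "write")] = value

    def move(self, uid: int, new_base: int) -> None:
        # Relocate an object by copying it and updating only the table entry;
        # capabilities held by programs stay valid because they hold no address.
        base, length = self.object_table[uid]
        if new_base < 0 or new_base + length > len(self.words):
            raise ValueError("destination out of range")
        self.words[new_base:new_base + length] = self.words[base:base + length]
        self.object_table[uid] = (new_base, length)
```

For example, after `mem = Memory(16)`, `cap = mem.create(7, base=2, length=4)`, `mem.store(cap, 1, 42)`, and `mem.move(7, new_base=10)`, the call `mem.load(cap, 1)` still returns 42, while a read-only capability obtained from `restrict(cap, {"read"})` can load the word but not store to it.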
The multics system: an examination of its structure
The author builds a picture of the life of a process coexisting with other processes and suggests ways to model or construct subsystems far more complex than those that could be implemented with predecessor computer facilities.