Fault Tolerant Operating Systems

@article{Denning1976FaultTO,
  title={Fault Tolerant Operating Systems},
  author={Peter J. Denning},
  journal={ACM Comput. Surv.},
  year={1976},
  volume={8},
  pages={359--389}
}
  • P. Denning
  • Published 1976
  • Computer Science
  • ACM Comput. Surv.
This paper develops four related architectural principles which can guide the construction of error-tolerant operating systems. The fundamental principle, system closure, specifies that no action is permissible unless explicitly authorized. The capability-based machine is the most efficient known embodiment of this principle: it allows efficient small access domains, multiple-domain processes without a privileged mode of operation, and user and system descriptor information protected by the…
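The closure principle in the abstract can be illustrated with a minimal Python sketch. It assumes a toy model in which each access domain holds a list of capabilities, and each capability names one object and a set of rights. The names `Capability`, `Domain`, `may`, `perform`, and `attenuate` are hypothetical and do not come from the paper. Any request that no capability explicitly grants is refused, so denial is the default rather than something that must be configured.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    """Pairs an object with access rights (unforgeable in a real system)."""
    obj: str
    rights: frozenset

    def attenuate(self, keep):
        # A derived capability may only drop rights, never add new ones.
        return Capability(self.obj, self.rights & frozenset(keep))


@dataclass
class Domain:
    """A small access domain whose only authority is its capability list."""
    name: str
    clist: list = field(default_factory=list)

    def may(self, obj: str, right: str) -> bool:
        # System closure: allowed only if some capability explicitly grants it.
        return any(c.obj == obj and right in c.rights for c in self.clist)

    def perform(self, obj: str, right: str) -> str:
        if not self.may(obj, right):
            raise PermissionError(f"{self.name}: '{right}' on {obj} not authorized")
        return f"{self.name} did {right} on {obj}"


editor = Domain("editor", [Capability("file:notes", frozenset({"read", "write"}))])
viewer = Domain("viewer", [editor.clist[0].attenuate({"read"})])

print(editor.perform("file:notes", "write"))   # editor did write on file:notes
print(viewer.may("file:notes", "write"))       # False: the right was dropped
print(editor.may("file:passwd", "read"))       # False: no capability, so denied
```

In this toy model the editor domain is refused `file:passwd` simply because it holds no capability naming that object. No rule has to forbid the access. The `attenuate` helper reflects the common capability practice of passing on a copy with fewer rights.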
Design and principles of a fault tolerant system
TLDR
This paper will discuss the determination of a global system architecture by a top-down approach, the principles of protection by capability, and the management of synchronization and object sharing between processes by generalized monitors and path expressions in this hierarchized system.
An approach to a fault-tolerant system architecture
TLDR
The principles of domains and the architecture of a capability machine are discussed, and the management of scheduling and object sharing between processes by monitors is detailed.
Data Security: The 1100/90 as a Closed, Fault-Tolerant Environment
TLDR
This paper presents an overview of the protection mechanisms built into the 1100/90, explains how they implement a fault-tolerant, closed environment, and specifies the problem in terms of current expectations of reliability and security.
Distributed system fault tolerance using message logging and checkpointing
TLDR
A new optimistic message logging system is presented that guarantees to find the maximum possible recoverable system state, which is not ensured by previous optimistic methods.
The introduction of fault-tolerance in a hierarchical operating system
TLDR
A general method for introducing fault tolerance into a hierarchical operating system is presented, such that known techniques for fault-tolerant operation can be represented as particular cases.
Building a reliable operating system
TLDR
CuriOS, an operating system that incorporates several new error management techniques that significantly improve reliability, is presented; it achieves inter-client isolation by curtailing error propagation within services.
Design and implementation of a resource-secure system
TLDR
The paper proves that building resource-secure systems is possible by describing the design and implementation of a prototype, Anaxagoros, and proposes several novel ways to solve synchronization problems.
Building a Self-Healing Operating System
  • Francis M. David, R. Campbell
  • Computer Science
  • Third IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC 2007)
  • 2007
TLDR
Fault injection experiments show that these techniques can be used to continue running user applications after transparently recovering the operating system in a large percentage of cases; individual process recovery can be attempted as a last resort.
Resourceful systems for fault tolerance, reliability, and safety
TLDR
The current state of the art of system reliability, safety, and fault tolerance is reviewed, and an approach to designing resourceful systems based upon a functionally rich architecture and an explicit goal orientation is developed.
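The message-logging entry above builds on a standard idea. If a process is deterministic given the messages it receives, its state can be rebuilt from a checkpoint plus a replay of the messages logged since that checkpoint. The Python sketch below shows only that basic idea for a single process, using a simple log-before-processing (pessimistic) scheme. It does not implement the optimistic protocol or the maximum-recoverable-state computation that the cited paper presents. The class and method names are hypothetical.

```python
import copy


class LoggedProcess:
    """Single process recovered by checkpoint plus message replay.

    Assumption: the process is deterministic, so its state depends only on
    its initial state and the sequence of messages it has received.
    """

    def __init__(self):
        self.state = {"total": 0, "count": 0}
        self.log = []  # stands in for a stable-storage message log
        self.checkpoint = (copy.deepcopy(self.state), 0)  # (state, log position)

    def receive(self, msg: int):
        self.log.append(msg)  # log first, then process (pessimistic ordering)
        self._apply(msg)

    def _apply(self, msg: int):
        self.state["total"] += msg
        self.state["count"] += 1

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.state), len(self.log))

    def crash(self):
        self.state = None  # volatile state is lost; log and checkpoint survive

    def recover(self):
        saved, pos = self.checkpoint
        self.state = copy.deepcopy(saved)
        for msg in self.log[pos:]:  # replay messages received after the checkpoint
            self._apply(msg)


p = LoggedProcess()
p.receive(5)
p.take_checkpoint()
p.receive(7)
p.receive(1)
before = dict(p.state)
p.crash()
p.recover()
print(p.state == before, p.state)  # True {'total': 13, 'count': 3}
```

An optimistic scheme would instead process messages before they reach stable storage. It would then have to work out which states remain recoverable after a failure, and that is the problem the cited paper addresses.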

References

Showing 1-10 of 85 references
Dynamic protection structures
TLDR
This paper deals with one aspect of the subject, which might be called the meta-theory of protection systems: how the information that specifies protection and authorizes access can itself be protected and manipulated.
System structure for software fault tolerance
TLDR
The aim is to facilitate the provision of dependable error detection and recovery facilities which can cope with errors caused by residual design inadequacies, particularly in the system software, rather than merely the occasional malfunctioning of hardware components.
Formal requirements for virtualizable third generation architectures
TLDR
The hardware architectural requirements for virtual machine systems are discussed and a fairly specific definition of a virtual machine is presented which includes the aspects of efficiency, isolation, and identical behavior.
System structure for software fault tolerance
  • B. Randell
  • Computer Science
  • IEEE Transactions on Software Engineering
  • 1975
TLDR
The aim is to facilitate the provision of dependable error detection and recovery facilities which can cope with errors caused by residual design inadequacies, particularly in the system software, rather than merely the occasional malfunctioning of hardware components.
A hardware architecture for implementing protection rings
TLDR
Hardware processor mechanisms for implementing concentric rings of protection that allow cross-ring calls and subsequent returns to occur without trapping to the supervisor are described.
The protection of information in computer systems
TLDR
This tutorial paper explores the mechanics of protecting computer-stored information from unauthorized use or modification by examining in depth the principles of modern protection architectures and the relation between capability systems and access control list systems.
Dynamic verification of operating system decisions
TLDR
The dynamic verification of operating system decisions is used on the PRIME system to ensure that one user's information cannot become available to another user gratuitously even in the presence of a single hardware or software fault.
A verifiable protection system
TLDR
The design and implementation of the UCLA Virtual Machine System, a multiuser operating system base that has been developed to provide ultra high reliability protection and security, are reported on.
Protection in the Hydra Operating System
This paper describes the capability based protection mechanisms provided by the Hydra Operating System Kernel. These mechanisms support the construction of user-defined protected subsystems, …
HYDRA: the kernel of a multiprocessor operating system
This paper describes the design philosophy of HYDRA, the kernel of an operating system for C.mmp, the Carnegie-Mellon Multi-Mini-Processor. This philosophy is realized through the introduction of a …
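The protection-rings entry in the references describes hardware that lets cross-ring calls and returns proceed without trapping to the supervisor. The Python sketch below is a deliberately simplified, hypothetical model of such a call check. It loosely follows the Multics-style convention of per-segment ring brackets:

- Lower ring numbers are more privileged.
- A call from inside a segment's execute bracket runs in the caller's ring.
- A call from the segment's gate extension must enter at a designated gate, and it then runs in the more privileged ring.

The field names `r1`, `r2`, `r3`, and `gates` are assumptions for illustration. Outward calls are refused rather than modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical procedure segment with Multics-style ring brackets."""
    name: str
    r1: int           # low end of the execute bracket
    r2: int           # high end of the execute bracket
    r3: int           # high end of the gate extension
    gates: frozenset  # entry points that callers in the gate extension may use


def call_ring(caller_ring: int, seg: Segment, entry: int) -> int:
    """Return the ring the callee runs in, or raise PermissionError."""
    if seg.r1 <= caller_ring <= seg.r2:
        return caller_ring  # ordinary call: no ring change
    if seg.r2 < caller_ring <= seg.r3:
        if entry not in seg.gates:
            raise PermissionError(f"{seg.name}: inward call must use a gate")
        return seg.r2  # inward call through a gate: gain privilege
    if caller_ring < seg.r1:
        raise PermissionError(f"{seg.name}: outward call not modeled here")
    raise PermissionError(f"{seg.name}: caller ring {caller_ring} beyond gate extension")


fs = Segment("file_system", r1=0, r2=1, r3=5, gates=frozenset({0}))
print(call_ring(4, fs, 0))  # 1: user-ring caller enters through gate 0
print(call_ring(1, fs, 7))  # 1: same-bracket call, any entry point
```

In this model, a call from ring 4 to entry 3 would be refused because entry 3 is not a gate. A call from ring 6 would be refused because it lies beyond `r3`. Because each check is a few integer comparisons, it is the kind of test that can plausibly be done in hardware on every call.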