Corpus ID: 62065867

Crash Recovery in a Distributed Data Storage System

Butler W. Lampson and Howard E. Sturgis
An algorithm is described which guarantees reliable storage of data in a distributed system, even when different portions of the data base, stored on separate machines, are updated as part of a single transaction. The algorithm is implemented by a hierarchy of rather simple abstractions, and it works properly regardless of crashes of the client or servers. Some care is taken to state precisely the assumptions about the physical components of the system (storage, processors and communication). 
Implementation and performance of a stable-storage service in Unix
This paper describes the design, implementation, and performance of a stable-storage service that has been implemented on top of the Unix operating system. This service allows servers to create, ...
A Crash Recovery Scheme for a Memory-Resident Database System
  • R. Hagmann
  • Computer Science
  • IEEE Transactions on Computers
  • 1986
This correspondence presents a method of performing crash recovery for database systems designed to provide fast transaction processing, to effectively use multiple processors, and to perform a fast restart after a crash.
Transactions and consistency in distributed database systems
It is shown that a distributed system can be modeled as a single sequential execution sequence and this model is used to discuss simple techniques for implementing the various forms of transparency.
Resilient Extended True-Copy Token Scheme for a Distributed Database System
A new resiliency scheme is presented for a distributed database system with replicated data. The scheme does not employ a log subsystem and can be used for a highly reliable system that must tolerate a total crash of a site.
Transactions and synchronization in a distributed operating system
A fully distributed operating system transaction facility with fine-grain, record-level synchronization is described. The work was done in the context of Locus, a high-performance distributed Unix operating system for local area networks.
The recovery manager of a data management system
The recovery subsystem of System R, an experimental data management system, is described and evaluated. The transaction concept allows application programs to commit, and the DO-UNDO-REDO protocol allows new recoverable types and operations to be added to the recovery system.
Concurrency Control Mechanism for an Available Distributed Data Base System
The structure of the system and the algorithms are designed to allow transaction processing to proceed in case of a single site failure. The focus is on the required network functionality and on the concurrency control and transaction atomicity problems.
The LOCUS distributed operating system
The complete system architecture is outlined in this paper, and experience with its use is summarized.
Disconnected Operation in a Distributed File System
  • J. J. Kistler
  • Computer Science
  • Lecture Notes in Computer Science
  • 1995
This work presents the important new technique called disconnected operation, in which clients mask failures and voluntary network detachments by emulating the functionality of servers where actual server-oriented solutions are inadequate.
A client-based transaction system to maintain data integrity
The paper gives a detailed description of how consistent, atomic transactions can be implemented by client processes communicating with one or more file server computers.


The notions of consistency and predicate locks in a database system
It is argued that a transaction needs to lock a logical rather than a physical subset of the database, and an implementation of predicate locks which satisfies the consistency condition is suggested.
Notes on Data Base Operating Systems
  • J. Gray
  • Computer Science
  • Advanced Course: Operating Systems
  • 1978
This paper is a compendium of data base operating systems folklore. It focuses on issues unique to the transaction management component, especially locking and recovery.
Monitors: an operating system structuring concept
This paper develops Brinch-Hansen's concept of a monitor as a method of structuring an operating system. It introduces a form of synchronization, describes a possible method of implementation in ...
Verifying properties of parallel programs
An axiomatic method for proving a number of properties of parallel programs is presented. Hoare has given a set of axioms for partial correctness, but they are not strong enough in most cases. This...
An axiomatic basis for computer programming
An attempt is made to explore the logical foundations of computer programming by use of techniques which were first applied in the study of geometry and have later been extended to other branches of mathematics by elucidation of sets of axioms and rules of inference.