In loosely coupled distributed systems subject to random communication delays and component failures, atomic broadcast protocols can be used to implement the abstraction of a Δ-common storage: a replicated storage that displays, at any clock time, the same contents to every correct processor and that requires Δ time units to complete replicated updates. We …
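The Δ-common storage property can be illustrated with a small simulation: each update carries its sender's clock timestamp, and every replica applies it only once its deadline (timestamp + Δ) has passed, in a fixed (timestamp, sender) order. This is a minimal sketch, not the paper's protocol, and it assumes synchronized clocks and that every update reaches every correct replica before its deadline. The names `Update`, `Replica`, `receive`, and `apply_due` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Update:
    timestamp: float  # sender's (assumed synchronized) clock time at broadcast
    sender: int       # tie-breaker so every replica orders concurrent updates identically
    key: str = field(compare=False)
    value: int = field(compare=False)


class Replica:
    """Hypothetical replica of a Δ-common storage (illustrative sketch only)."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self.pending: list[Update] = []  # min-heap ordered by (timestamp, sender)
        self.state: dict[str, int] = {}

    def receive(self, update: Update) -> None:
        # Assumption: every update arrives before timestamp + delta.
        heapq.heappush(self.pending, update)

    def apply_due(self, now: float) -> dict[str, int]:
        # Apply, in (timestamp, sender) order, every update whose deadline has passed.
        while self.pending and self.pending[0].timestamp + self.delta <= now:
            u = heapq.heappop(self.pending)
            self.state[u.key] = u.value
        return dict(self.state)
```

Because updates are held back until their deadline and then applied in the same total order everywhere, two replicas that receive the same updates in different arrival orders still show identical contents whenever they are read at the same clock time.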
The first part of this paper provides rigorous definitions for several basic concepts underlying the design of dependable programs, such as specification, program semantics, exception, program correctness, robustness, failure, fault, and error. The second part investigates what it means to handle exceptions in modular programs structured as hierarchies of data …
We present DATUM, a novel method for tolerating multiple disk failures in disk arrays. DATUM is the first known method that can mask any given number of failures, requires an optimal amount of redundant storage space, and spreads reconstruction accesses uniformly over disks in the presence of failures without needing large layout tables in …
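DATUM's own coding and layout scheme is not described in this excerpt. To illustrate only the simplest case of masking disk failures with optimal redundancy, the sketch below uses single-parity XOR: one extra block per stripe tolerates exactly one lost block, which is the minimum redundancy for one failure. This is not DATUM, which handles an arbitrary number of failures; the function names `xor_blocks`, `make_stripe`, and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    # Append one parity block so that any single lost block can be rebuilt.
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: list[bytes], lost_index: int) -> bytes:
    # The XOR of all blocks in a stripe is zero, so XOR-ing the survivors
    # (data and parity) yields exactly the missing block.
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

Tolerating t > 1 failures with t redundant blocks per stripe requires a maximum distance separable code (for example, Reed-Solomon) in place of plain XOR parity.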
Atomic broadcast ensures that concurrent updates to the state of a process group are consistently delivered to all group members despite random communication delays and failures. By sparing programmers of replicated applications the difficult task of maintaining replica state consistency, atomic broadcast is a fundamental …
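One simple, widely used way to obtain the consistent delivery order that atomic broadcast requires is a fixed sequencer that stamps every update with a global sequence number; each member then delivers strictly in sequence-number order, buffering early arrivals. This minimal sketch assumes a sequencer that never crashes and channels that may reorder but never lose messages, so it does not provide the failure tolerance the abstract refers to and is not necessarily this paper's protocol. The classes `Sequencer` and `Member` are hypothetical.

```python
class Sequencer:
    """Assigns consecutive global sequence numbers (assumed never to crash)."""

    def __init__(self) -> None:
        self.next_seq = 0

    def stamp(self, msg: str) -> tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


class Member:
    """Group member that delivers updates in global sequence-number order."""

    def __init__(self) -> None:
        self.expected = 0
        self.buffer: dict[int, str] = {}  # seq -> msg for updates that arrived early
        self.delivered: list[str] = []

    def on_receive(self, seq: int, msg: str) -> None:
        self.buffer[seq] = msg
        # Deliver every consecutive update starting at the next expected number.
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
```

Two members that receive the same stamped updates in different orders end up with identical delivery sequences, which is the consistency property described above.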
Fortress is a support system for designing and implementing fault-tolerant distributed real-time systems that use commercial off-the-shelf (COTS) components. The main problem we address in Fortress is that services cannot always provide their standard properties due to the possibility of missed deadlines, dropped messages, and process crashes. Fortress allows …
Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures, and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partitions, we specify the processor-group …
We introduce the timed asynchronous distributed system model to describe existing asynchronous distributed systems subject to unbounded processing and communication delays, failures, and recoveries. We then describe five increasingly strong specifications for processor-group membership services in timed asynchronous systems subject to partitioning. We also …