The process group approach to reliable distributed computing

  • Kenneth P. Birman
  • Published 1 September 1992
  • Computer Science
  • Commun. ACM
Abstract: The difficulty of developing reliable distributed software is an impediment to applying distributed computing technology in many settings. Experience with the ISIS system suggests that a structured approach based on virtually synchronous process groups yields systems that are substantially easier to develop, exploit sophisticated forms of cooperative computation, and achieve high reliability. This paper reviews six years of research on ISIS, describing the model, its implementation challenges…

On group communication in large-scale distributed systems

The process group mechanism is considered as an appropriate application-structuring paradigm for large-scale distributed systems, and a formal characterization of the attribute "large scale" as applied to distributed systems is given.

Construction and management of highly available services in open distributed systems

A novel replication protocol is presented that satisfies two fundamental requirements of this environment: first, it hides replication from the service clients; second, it facilitates the dynamic reconfiguration of the server group.

System Support for Distributed Computing

An experimental toolkit that allows for object-oriented programming of distributed, failure-resilient applications is presented, and it is compared to a compute-intensive application implemented on Electra, on PVM, on a transputer, and on two different Linda systems.

Distributed object replication in a cluster of workstations

  • Wanlei Zhou, L. Wang
  • Computer Science
    Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region
  • 2000
The focus is an object-oriented design pattern that simplifies the design and implementation of distributed replication for reliable and efficient services on a cluster of workstations.

Group-based distributed computing: Programming and distributed platform model.

This chapter describes the subject of the thesis, beginning with the discussion of the background and motivation that led to the research and the material presented in the thesis. The focus of the

An architecture for building reliable distributed object-based systems

  • Li Wang, Wanlei Zhou
  • Computer Science, Engineering
    Proceedings. Technology of Object-Oriented Languages. TOOLS 24 (Cat. No.97TB100240)
  • 1997
The proposed architecture attempts to bring advances in client-server, remote procedure call, reliable group communication, and object orientation technologies under a unified architecture to ease application developers' work of building reliable distributed systems.

An Agreement Service for Implementing Fault Tolerant Distributed Software

An agreement service built on top of a group communication layer is proposed. It allows distributed applications to reach agreement and facilitates the development of fault-tolerant distributed software (FTDS) by implementing agreement protocols transparently to the application programmer.

The group approach in cooperative work and in load balancing

The value of partitioning into groups is measured for two techniques used in distributed systems: cooperative work, a user application, and load balancing, a system application.

Using Group Communication Technology to Implement a Reliable and Scalable Distributed IN Coprocessor

This work suggests that group communication technology can bring substantial benefits, including scalability, fault tolerance, and real-time responsiveness, to the most demanding telecommunications applications.

Exploiting virtual synchrony in distributed systems

It is argued that this approach to building distributed and fault-tolerant software is more straightforward, more flexible, and more likely to yield correct solutions than alternative approaches.

Using process groups to implement failure detection in asynchronous environments

A rigorous, formal specification for group membership is presented under this interpretation and a solution is presented for this problem as it relates to failure detection in asynchronous, distributed systems.

Exploiting replication in distributed systems

Techniques are examined for replicating data and execution in directly distributed systems: systems in which multiple processes interact directly with one another while continuously respecting

The ISIS project: real experience with a fault tolerant programming system

The ISIS project has developed a distributed programming toolkit[2,3] and a collection of higher level applications based on these tools. ISIS is now in use at more than 300 locations world-wide.

The Use of Efficient Broadcast Protocols in Asynchronous Distributed Systems

This dissertation presents techniques for deciding how strongly ordered a protocol must be to solve a given application problem and introduces the concept of a linearization function that maps partially ordered sets of events to totally ordered histories.

An efficient reliable broadcast protocol

This paper presents a (software) protocol that simulates reliable broadcast, even on an unreliable network, and is more efficient than previously published reliable broadcast protocols.

The many faces of consensus in distributed systems

The goal is to give practitioners some sense of the system hardware and software guarantees that are required to achieve a given level of reliability and performance.

Low cost management of replicated data in fault-tolerant distributed systems

A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated, which results in better response time when performing operations on replicated data.

Integrating security in a group oriented distributed system

  • M. Reiter, K. Birman, L. Gong
  • Computer Science
    Proceedings 1992 IEEE Computer Society Symposium on Research in Security and Privacy
  • 1992
A distributed security architecture is proposed for incorporation into group oriented distributed systems, and in particular, into the Isis distributed programming toolkit to make common group-oriented abstractions robust in hostile settings in order to facilitate the construction of high-performance distributed applications that can tolerate both component failure and malicious attacks.

Replication and fault-tolerance in the ISIS system

Techniques for obtaining a fault-tolerant implementation from a non-distributed specification and for achieving improved performance by concurrently updating replicated data are discussed.