This paper proposes a distributed algorithm by which a collection of mobile robots roaming on a plane move to form a circle. The algorithm operates under the premises that robots (1) are unable to recall past actions and observations (i.e., oblivious), (2) cannot be distinguished from each other (i.e., anonymous), (3) share no common sense of direction, ...
Detecting failures is a fundamental issue for fault tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far. One of the reasons is the difficulty of satisfying ...
Designing, tuning, and analyzing the performance of distributed algorithms and protocols are complex tasks. A major factor that contributes to this complexity is the fact that there is no single environment to support all phases of the development of a distributed algorithm. This paper presents Neko, an easy-to-use Java platform that provides a uniform and ...
One of the fundamental differences between a centralized system and a distributed one is the notion of partial failures. The ability to efficiently and accurately detect failures is a key element underlying reliable distributed computing. In current distributed systems, however, failure detection is either left to the application developer or hidden from the ...
For many years, people have been advocating the development of failure detection as a basic service, but so far without much success. We believe this is because important system engineering issues have not yet been addressed adequately, thus preventing the definition of a truly generic service. Ultimately, our goal ...
It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. There are, however, several issues that must be addressed before such a service can actually be implemented. In this paper, we highlight the main issues related to ensuring failure detection in large-scale systems, and overview ...
This paper makes three main contributions: a hierarchy of specifications for replication techniques, semi-passive replication, and Lazy Consensus. Based on the definition of the Generic Replication problem, we define two families of replication techniques: replication with parsimonious processing (e.g., passive replication), and replication ...
This paper presents the semi-passive replication technique – a variant of passive replication – that can be implemented in the asynchronous system model without requiring a membership service to agree on a primary. Passive replication is a popular replication technique since it can tolerate non-deterministic servers (e.g., multi-threaded servers) and uses ...