Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

@article{Renesse2003AstrolabeAR,
  title={Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining},
  author={Robbert van Renesse and Kenneth P. Birman and Werner Vogels},
  journal={ACM Trans. Comput. Syst.},
  year={2003},
  volume={21},
  pages={164--206}
}
Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called Astrolabe. Astrolabe collects large-scale system state, permitting rapid updates and providing on-the-fly attribute aggregation. This latter capability permits an application to locate a resource, and also offers a scalable way to track system state as…
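The on-the-fly attribute aggregation described in the abstract can be illustrated with a minimal, single-process sketch. It assumes a tree of zones whose leaves are hosts, and shows how a parent zone's summary row is derived from its children's rows, which is what lets an application locate a resource (here, the least-loaded host) without scanning every host. The `Zone` class, the `aggregate` function, and the `load`, `nhosts`, `min_load`, and `best_host` attributes are all hypothetical. The real system computes zone summaries with SQL-style aggregation functions and replicates them by gossip, both of which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Zone:
    """Hypothetical zone in a hierarchy; a zone with no children is a single host."""
    name: str
    children: List["Zone"] = field(default_factory=list)
    attrs: Dict[str, float] = field(default_factory=dict)  # only used by leaf hosts


def aggregate(zone: Zone) -> Dict[str, object]:
    """Compute a zone's summary row from its children's summary rows.

    Hypothetical aggregation: count the hosts below the zone, find the
    minimum load, and remember which host has it (resource location).
    """
    if not zone.children:
        # Leaf host: the summary comes straight from its own attributes.
        return {"nhosts": 1, "min_load": zone.attrs["load"], "best_host": zone.name}
    rows = [aggregate(child) for child in zone.children]
    best = min(rows, key=lambda row: row["min_load"])
    return {
        "nhosts": sum(row["nhosts"] for row in rows),
        "min_load": best["min_load"],
        "best_host": best["best_host"],
    }


if __name__ == "__main__":
    h3 = Zone("/eu/h3", attrs={"load": 0.5})
    root = Zone("/", children=[
        Zone("/usa", children=[
            Zone("/usa/h1", attrs={"load": 0.7}),
            Zone("/usa/h2", attrs={"load": 0.2}),
        ]),
        Zone("/eu", children=[h3]),
    ])
    print(aggregate(root))  # {'nhosts': 3, 'min_load': 0.2, 'best_host': '/usa/h2'}

    # A rapid update to one host's attribute is reflected in the next aggregation.
    h3.attrs["load"] = 0.1
    print(aggregate(root))  # {'nhosts': 3, 'min_load': 0.1, 'best_host': '/eu/h3'}
```

Recomputing the whole tree on every call keeps the sketch short; a distributed design would instead have each zone cache its children's summary rows and refresh them incrementally as updates propagate.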
A scalable information management middleware for large distributed systems
TLDR
This dissertation presents a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications by providing detailed views of nearby information and summary views of global information.
MON: Design and Implementation of Management Overlay Networks for Distributed Systems
TLDR
The concept of MONs addresses concerns and opinions raised in both the peer-to-peer networking and Grid computing communities in the recent past, and the simplicity, scalability, lightweightness, and fault tolerance of MONs enable them to run side by side with existing distributed applications.
DObjects: A Metacomputing Framework with Dynamic Query Processing for Distributed Data Networks
TLDR
DObjects, a general-purpose distributed data objects framework that provides an easy and scalable way to query and operate on data from heterogeneous data sources, is presented, together with the details of the dynamic query execution engine within the metacomputing framework, which adapts dynamically to network and node conditions.
The One Minute Manager: Lightweight On-Demand Overlays for Distributed Application Management
TLDR
The utility of MON is demonstrated by showing how it can be used to query the aggregate state of a real application deployed in a real-world environment, with an end-to-end response time of only a couple of seconds.
Scalable Management and Data Mining Using Astrolabe
TLDR
This paper focuses on wide-area implementation challenges of Astrolabe, a new kind of peer-to-peer system implementing a hierarchical distributed database abstraction that can also support wide-area multicast and offers powerful aggregation mechanisms.
DMake: A Tool for Monitoring and Configuring Distributed Systems
TLDR
DMake is unusual in offering a very flexible range of management options through a familiar and widely popular model: that of the Unix make utility, and its extended “makefile” format can express complex policies.
Design and Implementation of a Scalable Network Monitoring System
TLDR
This work presents a monitoring system that scales to over 100,000 nodes, has minimal local and global overhead, and maintains integrity in the face of transient network failures; the system also includes a web service interface that allows access to it via HTTP.
A scalable distributed information management system
TLDR
This work designs, implements and evaluates a Scalable Distributed Information Management System (SDIMS) that leverages Distributed Hash Tables (DHT) to create scalable aggregation trees, achieves isolation properties at the cost of modestly increased read latency in comparison to flat DHTs, and gracefully handles failures.
Moara: Flexible and Scalable Group-Based Querying System
TLDR
Moara is presented, a new querying system that makes two novel contributions: first, Moara builds aggregation trees for different groups and adaptively maintains the trees to optimize the total message cost; second, Moara supports a query language that allows groups to be specified implicitly via predicates consisting of arbitrarily nested unions and intersections.
...

References

SHOWING 1-10 OF 52 REFERENCES
INS/Twine: A Scalable Peer-to-Peer Architecture for Intentional Resource Discovery
TLDR
The design, implementation, and evaluation of INS/Twine are described. INS/Twine is an approach to scalable intentional resource discovery in which resolvers collaborate as peers to distribute resource information and to resolve queries.
The Information Bus: an architecture for extensible distributed systems
TLDR
The Information Bus, the solution, is a novel synthesis of four design principles: core communication protocols have minimal semantics, objects are self-describing, types can be dynamically defined, and communication is anonymous.
Pastry: Scalable, distributed object location and routing for large-scale peer-to-
TLDR
Experimental results obtained with a prototype implementation on a simulated network of up to 100,000 nodes confirm Pastry's scalability, its ability to self-configure and adapt to node failures, and its good network locality properties.
Exploiting virtual synchrony in distributed systems
TLDR
It is argued that this approach to building distributed and fault-tolerant software is more straightforward, more flexible, and more likely to yield correct solutions than alternative approaches.
A Weak-Consistency Architecture for Distributed Information Services
TLDR
This architecture implements the replicas as a weak-consistency process group, which provides good scalability and availability, handles portable computer systems, and minimizes the effect of users on each other.
The design and implementation of an intentional naming system
TLDR
The design and implementation of the Intentional Naming System (INS), a resource discovery and service location system for dynamic and mobile networks of devices and computers, is presented and three applications are described that demonstrate the feasibility and utility of INS.
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
TLDR
Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry's scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.
Designing a global name service
TLDR
The global name service described here is meant to map names to their properties for billions of names distributed throughout the world, and it addresses the problems of high availability, large size, continuing evolution, fault isolation, and lack of global trust.
Design and evaluation of a wide-area event notification service
TLDR
SIENA, an event notification service designed and implemented to exhibit both expressiveness and scalability, is presented. The paper describes the service's interface to applications, the algorithms used by networks of servers to select and deliver event notifications, and the strategies used to optimize performance.
Matching events in a content-based subscription system
TLDR
It is proved that for predicates reducible to conjunctions of elementary tests, the expected time to match a random event is no greater than $O(N^{1-\lambda})$, where $N$ is the number of subscriptions and $\lambda$ is a closed-form expression that depends on the number and type of attributes.
...