Given a set of machines and a set of Web applications with dynamically changing demands, an online application placement controller decides how many instances to run for each application and where to put them, while observing all kinds of resource constraints. This NP-hard problem has real uses in commercial middleware products. Existing approximation...
Server virtualization opens up a range of new possibilities for autonomic datacenter management, through the availability of new automation mechanisms that can be exploited to control and monitor tasks running within virtual machines. This facilitates more powerful and flexible autonomic controls, through management software that maintains the system in a...
We study the problem of dynamic resource allocation to clustered Web applications. We extend application server middleware with the ability to automatically decide the size of application clusters and their placement on physical machines. Unlike existing solutions, which focus on maximizing resource utilization and may treat some applications unfairly, the...
This paper presents a probabilistic event-driven fault localization technique, which uses a probabilistic symptom-fault map as a fault propagation model. The technique isolates the most probable set of faults through incremental updating of a symptom-explanation hypothesis. At any time, it provides a set of alternative hypotheses, each of which is a...
We introduce and evaluate a middleware clustering technology capable of allocating resources to web applications through dynamic application instance placement. We define application instance placement as the problem of placing application instances on a given set of server machines to adjust the amount of resources available to applications in response to...
Fault localization, a central aspect of network fault management, is the process of deducing the exact source of a failure from a set of observed failure indications. It has been a focus of research activity since the advent of modern communication systems, which produced numerous fault localization techniques. However, as communication systems evolved...
MapReduce is a scalable and fault-tolerant framework, patented by Google, for computing embarrassingly parallel reductions. Hadoop is an open-source implementation of Google MapReduce that is made available as a web service to cloud users by the Amazon Web Services (AWS) cloud computing infrastructure. Amazon Spot Instances (SIs) provide an inexpensive yet...
MapReduce is a data-driven programming model, proposed by Google in 2004, that is especially well suited for distributed data analytics applications. We consider the management of MapReduce applications in an environment where multiple applications share the same physical resources. Such sharing is in line with recent trends in data center management which...
We present a technique that enables existing middleware to fairly manage mixed workloads of batch jobs and transactional applications. The technique leverages a generic application placement controller, which dynamically allocates compute resources to application instances. The controller works towards a fairness goal while also trying to maximize individual...
We present a peer-to-peer service management middleware that dynamically allocates system resources to a large set of applications. The system achieves scalability in the number of nodes (thousands or more) through three decentralized mechanisms that run on different time scales. First, overlay construction interconnects all nodes in the system for exchanging...