An Autonomic Approach to Integrated HPC Grid and Cloud Usage

@article{Kim2009AnAA,
  title={An Autonomic Approach to Integrated HPC Grid and Cloud Usage},
  author={Hyunjoo Kim and Yaakoub El Khamra and Shantenu Jha and M. Parashar},
  journal={2009 Fifth IEEE International Conference on e-Science},
  year={2009},
  pages={366--373}
}
Clouds are rapidly joining high-performance Grids as viable computational platforms for scientific exploration and discovery, and it is clear that production computational infrastructures will integrate both these paradigms in the near future. As a result, understanding usage modes that are meaningful in such a hybrid infrastructure is critical. For example, there are interesting application workflows that can benefit from such hybrid usage modes to, perhaps, reduce times to solutions, reduce…

Exploring application and infrastructure adaptation on hybrid grid-cloud infrastructure

TLDR
This paper uses the ensemble Kalman-filter based dynamic application workflow to investigate how clouds can be effectively used as an accelerator to address changing computational requirements as well as changing Quality of Service constraints (e.g., deadlines).

Autonomic management of application workflows on hybrid computing infrastructure

TLDR
A programming and runtime framework is presented that enables the autonomic management of complex application workflows on hybrid computing infrastructures, and it is shown how different application objectives can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources.

CHASE: An Autonomic Service Engine for Cloud Environments

TLDR
The design and development of CHASE, an autonomic engine designed to optimize the scheduling of virtual machines in a cloud environment, are presented, along with its application in two different contexts: in PerfCloud, an environment for IaaS provision based on cloud and grid integration, and inside Cloud@Home, a project whose objective is to build a cloud using volunteer-based resources.

Combining Grid and Cloud Resources for Hybrid Scientific Computing Executions

TLDR
Different approaches to integrating the usage of Grid and Cloud-based resources for the execution of High Throughput Computing scientific applications are described, and a prototype implementation is presented.

Combining Grid and Cloud Resources by Use of Middleware for SPMD Applications

TLDR
This paper advocates the usage of ProActive, a well-established middleware in the grid community, for mixed Grid/Cloud computing, extended with features to address Grid/Cloud issues with little or no effort for application developers.

Enhancing an Autonomic Cloud Architecture with Mobile Agents

TLDR
This paper presents a monitoring system to support autonomicity based on the mobile agents computing paradigm and proposes a framework based on an autonomic engine, designed to optimize resource management in clouds, grids or hybrid cloud-grid environments.

A component‐based middleware for hybrid grid/cloud computing platforms

TLDR
A generic, adaptable, and extensible component‐based middleware that seamlessly enables a transition of non‐trivial applications from traditional grids to hybrid grid–cloud platforms is presented.

Integrating multiple clusters for compute-intensive applications

TLDR
A novel model called DA-TC (Dynamic Assignment with Task Containers) is developed and integrated into Pelecanus to bridge the gap between the user's needs and the system's heterogeneity; results show that the model can significantly reduce turnaround time and increase resource utilization for targeted application scenarios.

Multi-domain grid/cloud computing through a hierarchical component-based middleware

TLDR
A generic, adaptable and extensible component-based middleware that seamlessly enables a transition of non-trivial applications from traditional Grids to hybrid Grid-Cloud platforms and offers mechanisms to exploit the hierarchical, heterogeneous and dynamic nature of platforms.

Scheduling Concurrent Workflows in HPC Cloud through Exploiting Schedule Gaps

TLDR
This paper proposes a method that exploits schedule gaps to efficiently schedule concurrent workflows in an HPC cloud; it significantly outperforms the existing method in terms of average makespan, with up to 18% improvement.

References

Online Risk Analytics on the Cloud

TLDR
This paper demonstrates how the CometCloud autonomic computing engine can support online multi-resolution VaR analytics using an integration of private and Internet cloud resources.

Investigating the use of autonomic cloudbursts for high-throughput medical image registration

TLDR
A virtual computational cloud is enabled that integrates local computational environments and public cloud services on-the-fly and supports image registration requests from different distributed researcher groups with varied computational requirements and QoS constraints.

Investigating the Use of Cloudbursts for High-Throughput Medical Image Registration.

  • Hyunjoo Kim, M. Parashar, D. Foran, L. Yang
  • Computer Science
    Proceedings of the ... IEEE/ACM International Conference on Grid Computing
  • 2009
TLDR
A virtual computational cloud is enabled that integrates local computational environments and public cloud services on-the-fly and supports image registration requests from different distributed researcher groups with varied computational requirements and QoS constraints.

Combining batch execution and leasing using virtual machines

TLDR
A scheduling approach is described in which users request resource leases with either as-soon-as-possible ("best-effort") or reserved start times, and it is shown that a VM-based approach can provide better performance than a scheduler that does not support task pre-emption.

Predicting bounds on queuing delay for batch-scheduled parallel machines

TLDR
A new method for providing end-users with predictions for the bounds on the queuing delay individual jobs will experience is explored, and it is shown that it is possible to predict delay bounds reliably for jobs in different queues, and for jobs requesting different ranges of processor counts.

Reservoir model updating by Ensemble Kalman Filter - Practical approaches using grid computing technology

TLDR
A synthetic case indicates that ResGrid efficiently performs EnKF inversions to obtain accurate, uncertainty-aware predictions of reservoir production, and the ResGrid EnKF is open-source and available for downloading.

Distributed computing in practice: the Condor experience

TLDR
The history and philosophy of the Condor project are provided, and how it has interacted with other projects and evolved along with the field of distributed computing is described.

Chord: A scalable peer-to-peer lookup service for internet applications

TLDR
Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.

Chord: a scalable peer-to-peer lookup protocol for internet applications

TLDR
Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.

The Ensemble Kalman Filter for Continuous Updating of Reservoir Simulation Models

TLDR
The use of the ensemble Kalman filter (EnKF) for automatic history matching can provide satisfactory history matching results while requiring less computational work than traditional history matching methods.