Evaluation and design of highly reliable and highly utilized cloud computing systems

Brett Snyder, Jordan Ringenberg, Robert C. Green, Vijay Kumar Devabhaktuni, Mansoor Alam
Journal of Cloud Computing
The cloud computing paradigm has ushered in the need to provide resources to users in a scalable, flexible, and transparent fashion, much like any other utility. This has led to a need for evaluation techniques that provide quantitative measures of the reliability of a cloud computing system (CCS) for efficient planning and expansion. This paper presents a new, scalable algorithm based on non-sequential Monte Carlo Simulation (MCS) to evaluate large-scale cloud computing system (CCS…
Improving Failure Tolerance in Large-Scale Cloud Computing Systems
A simulation-driven framework, based on real cloud computing system operation logs, is proposed for improving failure tolerance in large-scale cloud computing systems; a reliability-aware resource scheduling algorithm is adopted to optimize resources so that the system's reliability can be improved cost-effectively.
A simple model to exploit reliable algorithms in cloud federations
A simple model, together with a methodology to couple scheduling software with GWpilot, allows the personalised characterisation of cloud resources that those algorithms require, overcoming their lack of trust in the information provided by the cloud services.
ReliaCloud‐NS: A scalable web‐based simulation platform for evaluating the reliability of cloud computing systems
This paper discusses the implementation, architecture, and use of a graphical web‐based application called ReliaCloud‐NS that allows users to (1) evaluate the reliability of a cloud computing system
Assessing the Reliability of Hybrid Clouds with Monte Carlo Simulation
A reliability assessment method based on Monte Carlo Simulation is proposed for hybrid clouds, which include public clouds, private clouds, and edge clouds, and is shown to be feasible.
Reliability Assessment for Cloud Applications
A DEpendency-Based Reliability Assessment (DEBRA) framework is proposed and results show that DEBRA can obtain results of high quality and has several merits regarding modeling cloud applications for reliability assessment.
Reliability and high availability in cloud computing environments: a reference roadmap
Reliability and high availability have always been a major concern in distributed systems. Providing highly available and reliable services in cloud computing is essential for maintaining customer
Optimal Scheduling and Management on Correlating Reliability, Performance, and Energy Consumption for Multiagent Cloud Systems
A reliability-performance-energy correlation model is first proposed, which captures the significant effects of random resource failures and recovery on multiagent cloud system (MACS) performance and energy consumption to ensure high-fidelity, precise evaluation; an approach for optimization is then proposed.
CART, a Decision SLA Model for SaaS Providers to Keep QoS Regarding Availability and Performance
An analytic model, cloud availability and response time (CART), is presented for obtaining the best tradeoff between performance, cost, and availability in a cloud system that provides software as a service.
Reliability Analysis of Cloud Computing Systems Serving Multi-Class Requests
The reliability of CCSs is analyzed using the ReliaCloud-NS simulation framework, and the components of a CCS (HDD, CPU, bandwidth, and memory) are studied to illustrate their failure in various VMs.
Obscured by the cloud: A resource allocation framework to model cloud outage events


Reliability and Availability of Cloud Computing
Reliability and Availability of Cloud Computing is the guide for IS/IT staff in business, government, academia, and non-governmental organizations who are moving their applications to the cloud.
Providing reliability as an elastic service in cloud computing
This paper proposes a novel method for providing reliability as an elastic and on-demand service that makes use of peer-to-peer checkpointing and allows user reliability levels to be jointly optimized based on an assessment of their individual requirements and total available resources in the data center.
Evaluation of system reliability for a cloud computing system with imperfect nodes
System reliability is developed in this paper to evaluate the capability of the CCS to send d units of data from the cloud to the client through two paths under both maintenance budget and time constraints, and an algorithm with an adjusting procedure based on the branch‐and‐bound approach is proposed to evaluate the system reliability.
Scalable Analytics for IaaS Cloud Availability
This paper presents a scalable, stochastic model-driven approach to quantify the availability of a large-scale IaaS cloud, where failures are typically dealt with through migration of physical machines among three pools: hot, warm (turned on, but not ready), and cold.
A hierarchical model to evaluate quality of experience of online services hosted by cloud computing
A hierarchical modeling approach is proposed that can easily combine all components of this environment and serves as a very useful analytical tool for online service providers to evaluate cloud computing providers and design redirection strategies.
Performance indicator evaluation for a cloud computing system from QoS viewpoint
To measure the service level of a CCS, this paper constructs a network model and proposes a key performance indicator (KPI) that evaluates the probability that the demand can be satisfied under both transmission time and maintenance budget constraints.
Availability of Services in the Era of Cloud Computing
The state of service availability in the cloud is discussed, where availability refers to the uptime of a system, a network of systems, and the hardware and software that collectively provide a service during its usage.
Estimation of Maintenance Reliability for a Cloud Computing Network
An algorithm is proposed to evaluate the capability of a cloud computing network (CCN) to send d units of data from the cloud to the client through two paths under both maintenance budget and time constraints.
Chapter 7 – Fault Tolerance and Resilience in Cloud Computing Environments