High response quality is critical for many best-effort interactive services, while reducing energy consumption can directly lower the operational cost of service providers. In this paper, we study the quality-energy tradeoff for such services using a composite performance metric that captures their relative importance in practice: …
Heterogeneous servers are becoming prevalent in many high-performance computing environments, including clusters and datacenters. In this paper, we consider multi-objective scheduling for heterogeneous server systems to simultaneously optimize application performance, energy consumption, and thermal imbalance. First, a greedy online framework is …
To serve requests with high quality, interactive services such as web search, on-demand video, and online gaming keep average server utilization low. As servers become busy, queuing delays increase and requests miss their deadlines, resulting in degraded quality of service, poor user experience, and potential revenue loss. In this paper, we propose …
As multi-core processors proliferate, it has become more important than ever to ensure efficient execution of parallel jobs on multiprocessor systems. In this paper, we study the problem of scheduling parallel jobs with arbitrary release times on multiprocessors while minimizing the jobs' mean response time. We focus on non-clairvoyant scheduling schemes …
In this article, we combine the traditional checkpointing and rollback recovery strategies with verification mechanisms to cope with both fail-stop and silent errors. The objective is to minimize makespan and/or energy consumption. For divisible load applications, we use first-order approximations to find the optimal checkpointing period to minimize …
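A minimal sketch of the kind of first-order reasoning mentioned above, under a simple model that is assumed here and not taken from the paper: each period of W units of work is followed by a verification of cost V and a checkpoint of cost C; fail-stop errors strike at rate λ_f and lose half a period on average, while silent errors strike at rate λ_s and are caught only by the verification, losing the whole period. To first order the relative overhead is (V + C)/W + (λ_f/2 + λ_s)W, minimized at W* = sqrt((V + C)/(λ_f/2 + λ_s)); with no silent errors and V = 0 this reduces to the classic Young/Daly period sqrt(2C/λ_f). The function names and platform numbers are hypothetical, only the makespan (time) objective is modeled, and the paper's exact formulas may differ.

```python
import math


def first_order_overhead(period, checkpoint_cost, verif_cost, rate_failstop, rate_silent):
    """Approximate relative overhead of a verified checkpointing period.

    Assumed first-order model (illustrative, not the paper's):
      (V + C) / W            -> resilience cost paid once per period of work W
      (rate_failstop / 2) W  -> a fail-stop error loses half a period on average
      rate_silent * W        -> a silent error is detected only at the end, losing the whole period
    """
    return ((verif_cost + checkpoint_cost) / period
            + (rate_failstop / 2 + rate_silent) * period)


def optimal_period(checkpoint_cost, verif_cost, rate_failstop, rate_silent):
    """Period W* minimizing first_order_overhead (derivative set to zero)."""
    return math.sqrt((verif_cost + checkpoint_cost) / (rate_failstop / 2 + rate_silent))


if __name__ == "__main__":
    # Hypothetical platform: C = 60 s, V = 20 s, one fail-stop error per day,
    # one silent error every two days.
    C, V = 60.0, 20.0
    lam_f, lam_s = 1 / 86400, 1 / 172800
    w = optimal_period(C, V, lam_f, lam_s)
    print(f"W* = {w:.0f} s, overhead = {first_order_overhead(w, C, V, lam_f, lam_s):.4f}")
```

At W* the two overhead terms are equal, so the minimum overhead is 2·sqrt((V + C)(λ_f/2 + λ_s)); a longer period wastes more work per error, a shorter one pays V + C too often.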
Energy consumption and heat dissipation have become key considerations for modern high-performance computer systems. In this paper, we focus on non-clairvoyant speed scaling to minimize flow time plus energy for batched parallel jobs on multiprocessors. We consider a common scenario where the total power consumption cannot exceed a given budget and the …
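To make the "flow time plus energy" objective concrete, the sketch below only evaluates it for a given schedule; it does not implement the paper's non-clairvoyant multiprocessor algorithm. It assumes a single processor running batched jobs (all released at time 0) one after another, each at a constant speed, and the commonly used power model P(s) = s^α with α = 3 as a hypothetical default. The function name and job sizes are illustrative.

```python
def flow_time_plus_energy(works, speeds, alpha=3.0):
    """Total flow time plus energy for batched jobs on one processor.

    Assumptions (illustrative, not the paper's model): all jobs are released
    at time 0, run one after another in the given order, job i runs at
    constant speed speeds[i], and power consumption is speed ** alpha.
    """
    if len(works) != len(speeds):
        raise ValueError("works and speeds must have the same length")
    clock = 0.0
    total_flow = 0.0
    total_energy = 0.0
    for work, speed in zip(works, speeds):
        if speed <= 0:
            raise ValueError("speeds must be positive")
        duration = work / speed
        clock += duration
        total_flow += clock                          # release time 0, so flow time = completion time
        total_energy += speed ** alpha * duration    # energy = power * time
    return total_flow + total_energy


if __name__ == "__main__":
    jobs = [1.0, 2.0, 3.0]  # hypothetical work amounts
    for s in (0.5, 1.0, 1.5):
        print(s, flow_time_plus_energy(jobs, [s] * len(jobs)))
```

Running faster shortens every completion time but raises energy as s^α, which is the tension a speed-scaling scheme has to balance; a power budget, as in the scenario above, would additionally cap the speeds that may be chosen.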
This work focuses on resilience techniques at extreme scale. Many papers deal with fail-stop errors, and many others deal with silent errors (silent data corruptions), but very few deal with both simultaneously. Yet HPC applications will clearly have to cope with both error sources. This paper presents a unified …