Learn More
Large scale distributed systems like Grids are difficult to study only from theoretical models and simulators. Most Grids deployed at large scale are production platforms that are inappropriate research tools because of their limited re-configuration, control and monitoring capabilities. In this paper, we present Grid'5000, a 5000 CPUs nationwide(More)
Large scale distributed systems like Grid gather several characteristics making them difficult to study only from theoretical models and simulators. Most of Grid deployed at large scale are production platforms making them inappropriate research tools because of their limited reconfig-uration, control and monitoring capabilities. In this paper , we present(More)
We tackle the problem of scheduling task graphs onto a heterogeneous set of machines, where each processor has a probability of failure governed by an exponential law. The goal is to design algorithms that optimize both makespan and reliability. First, we provide an optimal scheduling algorithm for independent unitary tasks where the objective is to(More)
Scheduling stochastic workloads is a difficult task. In order to design efficient scheduling algorithms for such workloads, it is required to have a good in-depth knowledge of basic random scheduling strategies. This paper analyzes the distribution of sequential jobs and the system behavior in heterogeneous computational grid environments where the(More)
MPI process placement can play a deterministic role concerning the application performance. This is especially true with nowadays architecture (heterogenous, multicore with different level of caches, etc.). In this paper, we will describe a novel algorithm called TreeMatch that maps processes to resources in order to reduce the communication cost of the(More)
Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea to improve the performance of parallel applications is to decrease their communication costs(More)
The GridRPC model [17] is an emerging standard promoted by the Global Grid Forum (GGF) † that defines how to perform remote client-server computations on a distributed architecture. In this model data are sent back to the client at the end of every computation. This implies unnecessary communications when computed data are needed by an other server in(More)
The paper addresses the problem of matching and scheduling of DAG-structured application to both minimize the makespan and maximize the robustness in a heterogeneous computing system. Due to the conflict of the two objectives, it is usually impossible to achieve both goals at the same time. We give two definitions of robustness of a schedule based on(More)