Scheduling bags of tasks and gangs in a distributed system
In this paper we study the performance of a distributed system that is subject to hardware failures and subsequent repairs. We consider a special type of scheduling called gang scheduling, under which a job consists of a number of interacting tasks that are scheduled to run simultaneously on distinct processors. System performance is examined and compared for different distributions of the number of parallel tasks per job (the gang size). We examine cases where the gang size follows a specific distribution, as well as a case where the gang size distribution varies with time.
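
As an informal illustration of the gang constraint described above, the following minimal Python sketch starts a gang only when every one of its tasks can be given its own operational, idle processor. It is not the model studied in the paper: the first-come-first-served queue order, the uniform gang size distribution, and the names `sample_gang_size`, `can_start`, `schedule_fcfs`, and `failed` are all assumptions introduced here for illustration.

```python
import random


def sample_gang_size(num_processors, rng):
    """Draw a gang size uniformly from 1..num_processors.

    The uniform distribution is an assumption made for this sketch.
    """
    return rng.randint(1, num_processors)


def can_start(gang_size, idle_processors):
    """A gang may start only if each task gets a distinct idle processor."""
    return gang_size <= idle_processors


def schedule_fcfs(queue, num_processors, failed=0):
    """Start gangs in arrival order until the head gang no longer fits.

    queue: gang sizes in arrival order.
    failed: number of processors currently down (hypothetical parameter).
    Returns (started, waiting) as two lists of gang sizes.
    """
    idle = num_processors - failed
    started = []
    i = 0
    while i < len(queue) and can_start(queue[i], idle):
        idle -= queue[i]          # all tasks of the gang start together
        started.append(queue[i])
        i += 1
    return started, queue[i:]


if __name__ == "__main__":
    rng = random.Random(0)
    queue = [sample_gang_size(8, rng) for _ in range(5)]
    print(schedule_fcfs(queue, num_processors=8))
    print(schedule_fcfs(queue, num_processors=8, failed=4))
```

In this sketch a processor failure simply reduces the number of processors available to waiting gangs, so a large gang at the head of the queue can block smaller gangs behind it until repairs complete.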