Exploiting process lifetime distributions for dynamic load balancing

@article{HarcholBalter1997ExploitingPL,
  title={Exploiting process lifetime distributions for dynamic load balancing},
  author={Mor Harchol-Balter and Allen B. Downey},
  journal={ACM Trans. Comput. Syst.},
  year={1997},
  volume={15},
  pages={253--285}
}
We consider policies for CPU load balancing in networks of workstations. We address the question of whether preemptive migration (migrating active processes) is necessary, or whether remote execution (migrating processes only at the time of birth) is sufficient for load balancing. We show that resolving this issue is strongly tied to understanding the process lifetime distribution. Our measurements indicate that the distribution of lifetimes for a UNIX process is Pareto (heavy-tailed), with a… 
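The link between a heavy-tailed lifetime distribution and the value of migrating older processes can be illustrated with a small simulation. The sketch below is not the authors' code: the function names are hypothetical, and the shape parameter of 1 is an illustrative assumption, since the abstract is cut off before the fitted parameters are stated. With that shape, a process that has already survived to age `a` reaches age `2a` with probability about one half, whatever `a` is, so older processes are the ones expected to keep running longest.

```python
import random


def sample_pareto_lifetime(rng, shape=1.0, scale=1.0):
    """Draw one lifetime from Pareto(shape, scale) by inverse-transform sampling.

    shape=1.0 is an assumed value chosen for illustration only.
    """
    u = 1.0 - rng.random()  # u lies in (0, 1], so the division below is safe
    return scale / u ** (1.0 / shape)


def conditional_survival(lifetimes, age):
    """Estimate P(L > 2*age | L > age) from a list of sampled lifetimes."""
    alive = [t for t in lifetimes if t > age]
    if not alive:
        return float("nan")
    return sum(1 for t in alive if t > 2 * age) / len(alive)


rng = random.Random(0)
lifetimes = [sample_pareto_lifetime(rng) for _ in range(200_000)]
for age in (1, 4, 16):
    # Under the assumed shape of 1, each estimate should be close to 0.5.
    print(age, round(conditional_survival(lifetimes, age), 3))
```

Under an exponential (memoryless) distribution the same estimate would shrink as `age` grows, which is why the choice of lifetime model matters when deciding whether a running process is worth migrating.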
On the Importance of Migration for Fairness in Online Grid Markets (Short Paper)
TLDR
This paper presents fairness and quality of service properties for economic online scheduling algorithms and shows that it is impossible to achieve these properties without the use of migration.
Generalized Load Sharing for Homogeneous Networks of Distributed Environment
TLDR
This work proposes a method for job migration policies by considering effective usage of global memory in addition to CPU load sharing in distributed systems to reduce the number of page faults caused by unbalanced memory allocations for jobs among distributed nodes.
Effective task assignment strategies for distributed systems under highly variable workloads
TLDR
This paper presents a model for distributing heavy-tailed workload distributions, where a small number of very large tasks make up a large proportion of the workload, making the load very hard to manage.
Cluster scheduling for explicitly-speculative tasks
TLDR
This work promotes a way of working that exploits the inherent speculation in application-level search made more common by the cost-effectiveness of grid and cluster computing, and shows how batchactive schedulers reduce user-observed response times relative to conventional models.
Dynamically forecasting network performance using the Network Weather Service
  • R. Wolski
  • Computer Science
    Cluster Computing
  • 2004
TLDR
The Network Weather Service is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments and its design and predictive performance are outlined.
Incorporating job migration and network RAM to share cluster memory resources
TLDR
An improved load-sharing scheme is proposed by combining job migrations with network RAM for cluster computing that uses remote execution to initially allocate a job to the most lightly loaded workstation and, if necessary, network RAM to provide a larger memory space for the job than would be available otherwise.
Improving distributed workload performance by sharing both CPU and memory resources
TLDR
This work develops and examines job migration policies by considering effective usage of global memory in addition to CPU load sharing in distributed systems and shows that their load sharing policies not only improve performance of memory bound jobs, but also maintain the same load sharing quality as the CPU-based policies for CPU-bound jobs.
Quantifying the Performance Improvement of Migration in Load Sharing Systems
TLDR
The various measures carried out show that the additional improvement is between 15% and 30%, and rarely attains 35%, and the placement algorithm is more adaptive to the applications and environments configuration than the migration algorithm.
Periodic load balancing
TLDR
A heavy-traffic limit theorem shows that one-dimensional reflected Brownian motion can be used to approximately describe system performance, even with general arrival and service processes.

References

The MOSIX Distributed Operating System: Load Balancing for UNIX
TLDR
The MOSIX linker acts as a linker between the UNIX file system and the distributed systems, allowing for scalable and distributed file systems.
The MOSIX Distributed Operating System
A comparison of preemptive and non-preemptive load distributing
  • P. Krueger, M. Livny
  • Business
    [1988] Proceedings. The 8th International Conference on Distributed
  • 1988
TLDR
It is found that while placement alone is capable of large improvement in performance, the addition of migration can achieve considerable additional improvement.
Utopia: A load sharing facility for large, heterogeneous distributed computer systems
TLDR
The design and implementation issues in Utopia, a load sharing facility specifically built for large and heterogeneous systems, are discussed, which has no restriction on the types of tasks that can be remotely executed, involves few application changes and no operating system change, and incurs low overhead.
Transparent process migration: Design alternatives and the sprite implementation
TLDR
The Sprite operating system is used to offload work onto idle machines, and also to evict migrated processes when idle workstations are reclaimed by their owners, providing a high degree of transparency both for migrated processes and for users.
Mach and Load Distribution
TLDR
The choice of the Mach µkernel as an underlying environment for the LD scheme is of particular importance because it is a widely used µkernel in both research and commercial communities and supports the sophisticated enhancements that are significant for the LD design and implementation.
The probability of load balancing success in a homogeneous network
TLDR
A general formula is determined that can be used to define the likelihood of load balancing success in a distributed operating system and gives insight into the utilization of the system and is an aid in determining a measure of effectiveness of the system.
A note on The limited performance benefits of migrating active processes for load
TLDR
The general result of [ELZ88] that there are likely no conditions under which migration could yield major performance improvements beyond those offered by non-migratory load sharing does not apply to current systems.
Performance Studies of Dynamic Load Balancing in Distributed Systems
TLDR
Load balancing is found to reduce significantly the mean and standard deviation of job response times, especially under heavy and/or unbalanced workload, which is strongly dependent upon the load index.