Shrinker: Improving Live Migration of Virtual Clusters over WANs with Distributed Data Deduplication and Content-Based Addressing
Cluster platforms deployed in data centers worldwide are the backbone of popular cloud computing services. For scalability, manageability and resource utilization, one physical machine in a cloud platform can be virtualized into multiple virtual machines (VMs), each of which works as an independent node. To solve a time-consuming task, several VMs are grouped into a virtual cluster and collaborate on the task.
It is essential for a cloud computing platform to support live migration of virtual clusters. First, virtual cluster migration enables load management in data centers [1, 5, 3]. Second, live migration of virtual clusters provides transparent infrastructure maintenance. Third, virtual cluster migration can be used to support enterprise IT consolidation. Finally, flexible deployment of virtual clusters across data centers is a key enabler of federated clouds.
Co-migration of a group of VMs and migration of virtual clusters have attracted considerable interest for data center management [1, 5, 3] and HPC cluster computing. VCT focuses on devising mechanisms to manage a tightly coupled HPC virtual cluster as a single entity, making cluster-level operations such as suspending, migrating or resuming a synchronous process across all nodes. However, VCT requires the cluster to be offline for as long as tens of minutes, and many time-sensitive applications and services cannot afford this extended downtime. VMFlock, CloudNet and Shrinker [1, 5, 3] employ the same technique, data deduplication, to reduce network traffic during migration. Besides eliminating redundant data, VMFlock also accelerates instantiation of the applications at the target data center by transferring the essential set of data blocks first. CloudNet employs dynamic VPN connectivity to migrate networks and a “smart stop and copy” technique that intelligently picks when to halt the iterative transfer of dirty pages, decreasing downtime and latency.
VMFlock, CloudNet and Shrinker have succeeded in eliminating the transfer of redundant data blocks, thereby reducing network traffic and total migration time. However, VMFlock adopts an offline migration mode. CloudNet and Shrinker, although they support live migration, both fail to consider that the VMs in a cluster still need to collaborate on tasks while the live migration is in progress.
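The data deduplication idea shared by these systems can be illustrated with a minimal sketch. It is not the implementation of VMFlock, CloudNet or Shrinker; the 4 KiB page size, the choice of SHA-256, and the names `dedup_transfer` and `bytes_on_wire` are assumptions made only for illustration. Each page is identified by a digest of its content, and a page whose digest the destination already holds is replaced by a short reference instead of being sent again.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


def dedup_transfer(pages, seen_digests):
    """Build the list of messages needed to migrate `pages`.

    `seen_digests` is the set of content digests the destination is
    assumed to hold already; it is updated in place so that it can be
    shared across all VMs of a virtual cluster.
    """
    messages = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        if digest in seen_digests:
            # Content already present at the destination: send only the digest.
            messages.append(("ref", digest))
        else:
            # First occurrence of this content: send digest and full page.
            seen_digests.add(digest)
            messages.append(("data", digest, page))
    return messages


def bytes_on_wire(messages):
    """Count payload bytes, ignoring any protocol framing overhead."""
    total = 0
    for msg in messages:
        total += len(msg[1])          # the digest is always sent
        if msg[0] == "data":
            total += len(msg[2])      # full page content only when new
    return total


if __name__ == "__main__":
    zero_page = b"\x00" * PAGE_SIZE
    other_page = b"\x01" * PAGE_SIZE
    seen = set()
    first_vm = dedup_transfer([zero_page, other_page, zero_page], seen)
    second_vm = dedup_transfer([zero_page], seen)
    print([m[0] for m in first_vm], [m[0] for m in second_vm])
    print(bytes_on_wire(first_vm), "bytes instead of", 3 * PAGE_SIZE)
```

Sharing `seen_digests` across the VMs of one cluster is what lets a page that appears in several VMs cross the WAN only once. The sketch ignores hash collisions and dirty-page retransmission, both of which a real live migration would have to handle.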