Disruption-Aware Dynamic Component Deployment and Composition in Ultra-Large-Scale (ULS) Systems

Abstract

ULS systems [1] are characterized by hundreds of thousands of hardware platforms and software systems connected through hierarchies of heterogeneous wireline and wireless networks. Developing and maintaining ULS systems is extremely hard due to the decentralization, dynamics, and heterogeneity of the computation and communication infrastructures that support them. Because ULS systems are critical, they require high assurance and resilience against a wide spectrum of disturbances, including failures of system parts as well as physical and cyber attacks. Component-based software development focuses on building large software systems by integrating previously existing software components [2], [3]. At the foundation of this approach is the assumption that certain parts of software systems reappear with sufficient regularity that these common parts (i.e., the components) can be reused as the basis for assembling a ULS system. In theory, the flexibility and maintainability of component-based software can help reduce software development costs, enable fast system assembly, and reduce the maintenance burden for ULS systems. In practice, however, composing a ULS system from reusable components is problematic due to the following unresolved research challenges:

1) The highly dynamic and unpredictable behavior of ULS systems prevents the application of static reliability analysis. Existing research [4] on reliable component deployment assumes a static network setting in which the network topology and the node and link reliabilities are fixed and known a priori. Since these assumptions are unrealistic for ULS systems, new reliability and availability analytical frameworks are needed that capture both the traditional concept of instantaneous robustness and the time-sequenced concept of robustness that arises in dynamic ULS systems.
2) ULS systems require decentralized component deployment and recovery algorithms that can scale up to hundreds of thousands of hardware platforms and software components. Existing algorithms either rely on centralized assumptions or require precise and/or global system information [5], [6], [7] to make decisions, which limits their scalability. New algorithms that operate on partial, incomplete, and imprecise information are therefore needed to guide component deployment and recovery decisions in ULS systems.

3) ULS systems involve heterogeneous applications and users with different subjective needs with respect to system quality of service (QoS), such as reliability and availability. Existing research primarily focuses on low-level system reliability metrics, such as the normalized reliability of a composed service graph [4], and neglects higher-level, user-perceived QoS. New component deployment and recovery strategies are therefore needed to support these different groups of users and to reflect subjective human elements.
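To make the static reliability analysis criticized in challenge 1 more concrete, the following Python sketch computes the reliability of a service composed as a chain of components, assuming fixed, known, and independent node and link reliabilities. The chain-shaped composition, the single-path link model, and the names `static_chain_reliability`, `placement`, `node_rel`, and `link_rel` are illustrative assumptions. This is a simplified model, not the authors' framework and not the normalized reliability metric of [4]. Evaluating the same static formula on two hypothetical snapshots shows how a single precomputed value can become stale once reliabilities change at run time.

```python
from math import prod


def static_chain_reliability(placement, node_rel, link_rel):
    """Reliability of a chain of components under a static, independent-failure model.

    placement: list of node ids; placement[i] hosts component i (hypothetical).
    node_rel:  dict node -> probability that the node is up.
    link_rel:  dict (u, v) -> probability that the link between u and v is up.
    Assumes independent failures and one fixed link between consecutive hosts.
    """
    # A node hosting several components fails them all at once, so count it once.
    r = prod(node_rel[n] for n in set(placement))
    for u, v in zip(placement, placement[1:]):
        if u != v:  # co-located components need no network link
            # A missing link is treated as disconnected (reliability 0).
            r *= link_rel.get((u, v), link_rel.get((v, u), 0.0))
    return r


# Hypothetical deployment: four components placed on three nodes.
placement = ["A", "B", "B", "C"]
link_rel = {("A", "B"): 0.90, ("B", "C"): 0.97}

# Two hypothetical snapshots of node reliability; in the second, node B has degraded.
snapshots = [
    {"A": 0.99, "B": 0.95, "C": 0.98},
    {"A": 0.99, "B": 0.60, "C": 0.98},
]

for t, node_rel in enumerate(snapshots):
    print(t, round(static_chain_reliability(placement, node_rel, link_rel), 4))
```

A static analysis would compute the first value once and rely on it; the second snapshot illustrates why a dynamic ULS setting calls for analysis that tracks reliability over time.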

Cite this paper

@inproceedings{Xue2006DisruptionAwareDC, title={Disruption-Aware Dynamic Component Deployment and Composition in Ultra-Large-Scale (ULS) Systems}, author={Yuan Xue and Shanshan Jiang}, year={2006} }