On optimal monitor placement for localizing node failures via network tomography


We investigate the problem of placing monitors to localize node failures in a communication network from binary states (normal/failed) of end-to-end paths, under the assumption that a path is in normal state if and only if it contains no failed nodes. To uniquely localize failed nodes, the measurement paths must show different symptoms (path states) under different failure events. Our goal is to deploy the minimum set of monitors to satisfy this condition for a given probing mechanism. We consider three families of probing mechanisms, according to whether measurement paths are (i) arbitrarily controllable, (ii) controllable but cycle-free, or (iii) uncontrollable (i.e., determined by the default routing protocol). We first establish theoretical conditions that characterize network-wide failure identifiability through a per-node identifiability measure that can be efficiently evaluated for the above three probing mechanisms. Leveraging these results, we develop a generic monitor placement algorithm, applicable under any probing mechanism, that incrementally selects monitors to optimize the per-node measure. The proposed algorithm is shown to be optimal for probing mechanism (i), and provides upper and lower bounds on the minimum number of monitors required by the other probing mechanisms. In the special case of single-node failures, we develop an improved monitor placement algorithm that is optimal for probing mechanism (ii) and has linear time complexity. Using these algorithms, we study the impact of the probing mechanism on the number of monitors required for uniquely localizing node failures. Our results based on real network topologies show that although more complicated to implement, probing mechanisms that allow monitors to control measurement paths substantially reduce the required number of monitors.

DOI: 10.1016/j.peva.2015.06.003

10 Figures and Tables

Cite this paper

@article{Ma2015OnOM, title={On optimal monitor placement for localizing node failures via network tomography}, author={Liang Ma and Ting He and Ananthram Swami and Donald F. Towsley and Kin K. Leung}, journal={Perform. Eval.}, year={2015}, volume={91}, pages={16-37} }