Learn More
—We present a statistical approach to distributed detection of local latency shifts in networked systems. For this purpose, response delay measurements are performed between neighbouring nodes via probing. The expected probe response delay on each connection is statistically modelled via parameter estimation. Adaptation to drifting delays is accounted for(More)
While real-time service assurance is critical for emerging telecom cloud services, understanding and predicting performance metrics for such services is hard. In this paper, we pursue an approach based upon statistical learning whereby the behavior of the target system is learned from observations. We use methods that learn from device statistics and(More)
We present a distributed adaptive fault-handling algorithm applied in networked systems. The probabilistic approach that we use makes the proposed method capable of adaptively detect and localize network faults by the use of simple end-to-end test transactions. Our method operates in a fully distributed manner, such that each network element detects faults(More)
Sequences that can be assumed to have been generated by a number of Markov models, whose outputs are randomly interleaved but where the actual sources are hidden, occur in a number of practical situations where data is captured as an unlabeled stream of events. We present a practical method for estimating model parameters on large data sets under the(More)
—We present a statistical probing-approach to distributed fault-detection in networked systems, based on autonomous configuration of algorithm parameters. Statistical modelling is used for detection and localisation of network faults. A detected fault is isolated to a node or link by collaborative fault-localisation. From local measurements obtained through(More)
Predicting the performance of cloud services is intrinsically hard. In this work, we pursue an approach based upon statistical learning, whereby the behaviour of a system is learned from observations. Specifically, our testbed implementation collects device statistics from a server cluster and uses a regression method that accurately predicts, in real-time,(More)