Real-time problem diagnosis in large distributed computer systems and networks is a challenging task that requires fast and accurate inferences from potentially huge data volumes. In this paper, we propose a cost-efficient, adaptive diagnostic technique called active probing. Probes are end-to-end test transactions that collect information about the…
We present an architecture for, and a prototype of, a system for quickly detecting software problem recurrences. Re-discovery of the same problem is very common in many large software products and is a major cost component of product support. At run-time, when a problem occurs, the system collects the problem symptoms, including the program call-stack, and…
In many applications, the data, such as web pages and research papers, contain relation (link) structure among entities in addition to textual content information. Matrix factorization (MF) methods, such as latent semantic indexing (LSI), have been successfully used to map either content information or relation information into a lower-dimensional latent…
Recently, there has been considerable interest in computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different…
We describe algorithms and an architecture for a real-time problem determination system that uses online selection of the most informative measurements, an approach referred to here as active probing. Probes are end-to-end test transactions that gather information about system components. Active probing allows probes to be selected and sent on demand, in response…
We present an architecture and algorithms for performing automated software problem determination using call-stack matching. In an environment where software is used by a large user community, the same problem may recur many times. We show that this can be detected by matching the program call-stack against a historical database of call-stacks, so that…
As distributed systems continue to grow in size and complexity, scalable and cost-efficient techniques are needed for performing tasks such as problem determination and fault diagnosis. In this paper, we address these tasks using probes, or test transactions, which replace traditional “passive” event-correlation techniques with a more active, real-time…
This paper studies the accuracy/efficiency trade-off in probabilistic diagnosis formulated as finding the most-likely explanation (MPE) in a Bayesian network. Our work is motivated by a practical problem of efficient real-time fault diagnosis in computer networks using test transactions, or probes, sent through…