Capturing, indexing, clustering, and retrieving system history
@inproceedings{Cohen2005CapturingIC, title={Capturing, indexing, clustering, and retrieving system history}, author={Ira Cohen and Steve Zhang and Mois{\'e}s Goldszmidt and Julie Symons and Terence Kelly and Armando Fox}, booktitle={Symposium on Operating Systems Principles}, year={2005} }
We present a method for automatically extracting from a running system an indexable signature that distills the essential characteristics of a system state and that can be subjected to automated clustering and similarity-based retrieval to identify when an observed system state is similar to a previously observed state. This allows operators to identify and quantify the frequency of recurrent problems, to leverage previous diagnostic efforts, and to establish whether problems seen at different…
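The abstract describes the signature, clustering, and retrieval pipeline only at a high level. As a hedged illustration, the sketch below assumes each signature is a fixed-length vector with one entry per monitored metric taking values in {-1, 0, +1}, compares signatures by Euclidean distance for similarity-based retrieval, and uses a simple single-pass leader clustering to count how often a problem recurs. The encoding, the incident labels, and the names `distance`, `retrieve`, and `count_recurrences` are hypothetical and not taken from the paper, whose own signature construction and clustering procedure may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding (an assumption, not necessarily the paper's scheme):
# one entry per monitored metric, in a fixed order shared by all signatures.
#   +1 -> metric judged associated with the observed problem
#   -1 -> metric judged not associated with it
#    0 -> no judgment for this metric
Signature = Tuple[int, ...]


def distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Euclidean distance between two signatures over the same metrics."""
    if len(a) != len(b):
        raise ValueError("signatures must cover the same metrics")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query: Signature, history: Dict[str, Signature],
             k: int = 1) -> List[Tuple[str, float]]:
    """Similarity-based retrieval: the k stored states closest to `query`,
    with ties broken by label so the result is deterministic."""
    ranked = sorted(
        ((label, distance(query, sig)) for label, sig in history.items()),
        key=lambda item: (item[1], item[0]),
    )
    return ranked[:k]


def count_recurrences(signatures: List[Signature], radius: float) -> List[int]:
    """Single-pass leader clustering: each signature joins the first cluster
    whose leader lies within `radius`; otherwise it starts a new cluster.
    Returns the size of each cluster, i.e. how often each problem recurred."""
    leaders: List[Signature] = []
    sizes: List[int] = []
    for sig in signatures:
        for i, leader in enumerate(leaders):
            if distance(sig, leader) <= radius:
                sizes[i] += 1
                break
        else:  # no existing cluster leader was close enough
            leaders.append(sig)
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    # Hypothetical labeled history over four hypothetical metrics.
    history = {
        "incident-A": (1, 1, -1, -1),
        "incident-B": (-1, -1, 1, 1),
    }
    print(retrieve((1, 1, -1, 0), history))  # [('incident-A', 1.0)]
    print(count_recurrences(
        [(1, 1, -1, -1), (1, 1, -1, 0), (-1, -1, 1, 1), (1, 1, -1, -1)],
        radius=1.5,
    ))  # [3, 1]
```

Lowering `radius` splits near-identical states into separate clusters, while raising it merges distinct problems. Leader clustering is swapped in here only because it needs no preset number of clusters and fits in a few lines.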
339 Citations
Capturing, indexing, and retrieving system history
- Computer Science
- 2007
Describes an approach for automatically extracting indexable descriptions, or signatures, that distill the system information most associated with a problem and can be formally manipulated to facilitate automated clustering and similarity-based search.
Fmeter: Extracting Indexable Low-Level System Signatures by Counting Kernel Function Calls
- Computer Science · Middleware
- 2012
Presents a monitoring system that extracts formal, indexable, low-level system signatures using the classical vector space model from information retrieval and text mining, and shows that these signatures are naturally amenable to formal processing with statistical methods such as clustering and supervised machine learning.
Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems
- Computer Science · NSDI
- 2012
Describes DISTALYZER, an automated tool that supports developer investigation of performance issues in distributed systems. It uses machine learning techniques to compare system behaviors extracted from the logs and automatically infers the strongest associations between system components and performance.
Learning, indexing, and diagnosing network faults
- Computer Science · KDD
- 2009
This work introduces a new class of indexable fault signatures that encode temporal evolution of events generated by a network fault as well as topological relationships among the nodes where these events occur and presents an efficient learning algorithm to extract such fault signatures from noisy historical event data.
Identifying Recurrent and Unknown Performance Issues
- Computer Science · 2014 IEEE International Conference on Data Mining
- 2014
This paper formulates the problem of issue identification as an HMRF-based (hidden Markov random field) clustering problem that incorporates both the learning of metric discretization thresholds and the optimization of issue clustering, which can accurately identify both recurrent issues and unknown issues.
Use of Incremental Clustering in Clustering of Message Types in a Server Log
- Computer Science
- 2013
Proposes applying incremental clustering to extract data from the event log according to the characteristics specified by the users of the system.
Supporting System-wide Similarity Queries for networked system management
- Computer Science · 2010 IEEE Network Operations and Management Symposium - NOMS 2010
- 2010
S2Q simplifies many systems management tasks through a simple, intuitive query interface for operators. The paper evaluates two applications: fast diagnosis of repeated failures in enterprise IT systems, and automated application traffic profiling on computer networks.
System Problem Detection by Mining Console Logs
- Computer Science
- 2010
This work proposes a fully automatic methodology for mining console logs using a combination of program analysis, information retrieval, data mining, and machine learning techniques, and extends the methods to perform online analysis on console log streams.
Log4Perf: suggesting and updating logging locations for web-based systems’ performance monitoring
- Computer Science · Empirical Software Engineering
- 2019
Log4Perf is an automated approach that suggests where to insert logging statements with the goal of monitoring web-based systems’ CPU usage. It can build well-fit statistical performance models, indicating that such models can be leveraged to investigate how locations in the source code influence performance.
System State Discovery Via Information Content Clustering of System Logs
- Computer Science · 2011 Sixth International Conference on Availability, Reliability and Security
- 2011
This work explores a natural behaviour of system logs: when log data are partitioned using source and time information, the partitions contain correlated message types. It demonstrates how groups of such partitions can be found by clustering them based on their entropy-based information content.
26 References (first 10 shown)
Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control
- Computer Science · OSDI
- 2004
Experimental results from a testbed show that TAN models involving small subsets of metrics capture patterns of performance behavior in a way that is accurate and yields insights into the causes of observed performance effects.
Ensembles of models for automated diagnosis of system performance problems
- Computer Science · 2005 International Conference on Dependable Systems and Networks (DSN'05)
- 2005
This paper shows that the ensemble of models captures the performance behavior of the system accurately under changing workloads and conditions, with results comparable to those produced by an oracle that continuously changes the model based on advance knowledge of the workload.
Pinpoint: problem determination in large, dynamic Internet services
- Computer Science · Proceedings International Conference on Dependable Systems and Networks
- 2002
This work presents a dynamic analysis methodology that automates problem determination in these environments by coarse-grained tagging of numerous real client requests as they travel through the system and using data mining techniques to correlate the believed failures and successes of these requests to determine which components are most likely to be at fault.
The Art of Computer Systems Performance Analysis
- Computer Science · Int. CMG Conference
- 1990
Performance debugging for distributed systems of black boxes
- Computer Science · SOSP '03
- 2003
The goal is to design tools that enable modestly skilled programmers to isolate performance bottlenecks in distributed systems composed of black-box nodes. The work develops two very different algorithms for inferring the dominant causal paths through a distributed system from passively collected message-level traces.
Detecting application-level failures in component-based Internet services
- Computer Science · IEEE Transactions on Neural Networks
- 2005
Presents Pinpoint, a methodology for automating fault detection in Internet services by observing low-level internal structural behaviors of the service, modeling the majority behavior of the system as correct, and detecting anomalies in these behaviors as possible symptoms of failures.
A survey of fault localization techniques in computer networks
- Engineering · Sci. Comput. Program.
- 2004
A Microrebootable System - Design, Implementation, and Evaluation
- Computer Science · arXiv
- 2004
This work uses separation of process recovery from data recovery to enable frequent use of the microreboot, a fine grain recovery mechanism that restarts only suspected faulty application components without disturbing the rest.
httperf—a tool for measuring web server performance
- Computer Science, Business · PERV
- 1998
This paper describes httperf, a tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of…
Data mining: practical machine learning tools and techniques with Java implementations
- Computer Science · SGMD
- 2002
This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.