Detecting large-scale system problems by mining console logs

@inproceedings{Xu2009DetectingLS,
  title={Detecting large-scale system problems by mining console logs},
  author={W. Xu and Ling Huang and A. Fox and D. Patterson and Michael I. Jordan},
  booktitle={SOSP '09},
  year={2009}
}
Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We first parse console logs by combining source code analysis with information retrieval to create composite features. We then analyze these…
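One simple kind of "composite feature" is a per-identifier vector that counts how often each message type occurs. The sketch below assumes the logs have already been parsed into hypothetical (identifier, template) pairs. It does not reproduce the source-code-analysis parser, and the function name, block IDs, and templates are illustrative only.

```python
from collections import Counter, defaultdict

def message_count_vectors(records, templates):
    """Group parsed messages by identifier and count occurrences of each template.

    records: iterable of (identifier, template) pairs (hypothetical parser output).
    templates: fixed list of message types; defines the vector's column order.
    """
    groups = defaultdict(Counter)
    for ident, template in records:
        groups[ident][template] += 1
    # Use the same column order for every identifier so the vectors are comparable.
    return {ident: [counts[t] for t in templates] for ident, counts in groups.items()}

# Illustrative input: HDFS-style block IDs used as identifiers.
records = [
    ("blk_1", "Receiving block *"),
    ("blk_1", "Received block * of size *"),
    ("blk_2", "Receiving block *"),
    ("blk_1", "Deleting block *"),
]
templates = ["Receiving block *", "Received block * of size *", "Deleting block *"]
vectors = message_count_vectors(records, templates)
print(vectors)  # {'blk_1': [1, 1, 1], 'blk_2': [1, 0, 0]}
```

Keying on an identifier rather than on time windows means that all messages about the same object end up in one vector, regardless of how they are interleaved in the raw log.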
Real-Time Evasion Attacks against Deep Learning-Based Anomaly Detection from Distributed System Logs
TLDR
A real-time attack method called LAM (Log Anomaly Mask) is proposed to perturb streaming logs online with minimal modifications, so that the attacks can evade even state-of-the-art deep learning anomaly detection models.
Anomaly Detection in Distributed Systems via Variational Autoencoders
TLDR
This paper proposes VeLog, an automatic anomaly detection method based on variational autoencoders (VAEs) that detects anomalies by evaluating whether the distance between an input vector and its estimated vector falls within the normal intervals.
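The interval check can be sketched as follows. This is not VeLog's implementation. The sketch assumes the VAE has already produced an estimated (reconstructed) vector for each input, uses Euclidean distance, and derives the normal interval as mean ± k standard deviations of the distances on normal training data. That rule, the default `k=3.0`, and all function names are assumptions.

```python
import math
import statistics

def distance(x, x_hat):
    """Euclidean distance between an input vector and its estimated vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))

def fit_normal_interval(train_pairs, k=3.0):
    """Assumed rule: interval = mean +/- k * stdev of distances on normal data."""
    dists = [distance(x, x_hat) for x, x_hat in train_pairs]
    mu = statistics.mean(dists)
    sigma = statistics.pstdev(dists)
    return mu - k * sigma, mu + k * sigma

def is_anomalous(x, x_hat, interval):
    """Flag the input if its distance falls outside the normal interval."""
    lo, hi = interval
    return not (lo <= distance(x, x_hat) <= hi)

# Hypothetical (input, estimate) pairs from normal data.
train = [([0.0, 0.0], [0.1, 0.0]), ([0.0, 0.0], [0.2, 0.0]), ([0.0, 0.0], [0.3, 0.0])]
interval = fit_normal_interval(train)
print(is_anomalous([0.0, 0.0], [0.25, 0.0], interval))  # False
print(is_anomalous([0.0, 0.0], [3.0, 4.0], interval))   # True
```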
Improving Log-Based Anomaly Detection with Component-Aware Analysis
TLDR
Experimental results show that LogC outperforms three baselines (PCA, IM, and DeepLog) overall in terms of precision, recall, and F-measure.
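For reference, the three metrics can be computed from binary predictions as shown below. These are the standard definitions with anomalies as the positive class. This is not LogC's evaluation code, and the labels are made up.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F-measure for binary labels (True = anomaly)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative labels: 2 true positives, 1 false positive, 1 false negative.
pred = [True, True, False, True, False]
truth = [True, False, False, True, True]
print(precision_recall_f1(pred, truth))
```

With these labels, all three values come out to 2/3.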
Tools and Benchmarks for Automated Log Parsing
  • Jieming Zhu, Shilin He, +4 authors Michael R. Lyu
  • 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
  • 2019
TLDR
This paper presents a comprehensive evaluation study on automated log parsing, evaluating 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software, and reports the results in terms of accuracy, robustness, and efficiency.
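A log parser turns raw messages into templates by separating constant text from variable fields. The sketch below is a naive regex-masking baseline, not one of the 13 benchmarked parsers. The variable patterns are assumptions chosen for HDFS-like messages.

```python
import re

# Assumed heuristics: tokens that fully match one of these patterns are variables.
VARIABLE_PATTERNS = [
    re.compile(r"blk_-?\d+"),                   # HDFS-style block IDs
    re.compile(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?"),  # IPv4 address, optional port
    re.compile(r"0x[0-9a-fA-F]+"),              # hex literals
    re.compile(r"-?\d+(\.\d+)?"),               # integers and decimals
]

def to_template(message):
    """Replace every variable-looking token with the wildcard '<*>'."""
    out = []
    for tok in message.split():
        if any(p.fullmatch(tok) for p in VARIABLE_PATTERNS):
            out.append("<*>")
        else:
            out.append(tok)
    return " ".join(out)

def group_by_template(messages):
    """Group raw messages under the template each one maps to."""
    groups = {}
    for m in messages:
        groups.setdefault(to_template(m), []).append(m)
    return groups

print(to_template("Received block blk_-123 of size 67108864 from 10.0.0.1"))
# Received block <*> of size <*> from <*>
```

Pattern lists like this are brittle across systems. That brittleness is one reason benchmarks such as this one measure robustness across many datasets.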
Identifying impactful service system problems via log analysis
TLDR
This paper proposes Log3C, a novel clustering-based approach that promptly and precisely identifies impactful system problems by utilizing both log sequences (sequences of log events) and system KPIs; its clustering algorithm greatly reduces clustering time while maintaining high accuracy.
An empirical investigation of practical log anomaly detection for online service systems
  • Nengwen Zhao, Honglin Wang, +9 authors Dan Pei
  • ESEC/SIGSOFT FSE
  • 2021
Log data is an essential and valuable resource of online service systems, recording detailed information about system running status and user behavior. Log anomaly detection is vital for service…
Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection
TLDR
This paper proposes OC4Seq, a multi-scale one-class recurrent neural network for detecting anomalies in discrete event sequences. It integrates the anomaly detection objective with recurrent neural networks (RNNs) to embed the discrete event sequences into latent spaces where anomalies can be easily detected.
On the Naturalness and Localness of Software Logs
TLDR
This paper begins with the hypothesis that log files are natural and local, and that these attributes can be exploited to automate log analysis tasks; six research questions on the naturalness and localness of log files guide the study.
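Naturalness is commonly quantified as the cross-entropy of an n-gram language model trained on the text. Repetitive, predictable logs yield low cross-entropy. Below is a minimal sketch using an add-one-smoothed bigram model. The smoothing choice, the extra slot reserved for unseen tokens, and the sample lines are assumptions, and the paper's own models may differ.

```python
import math
from collections import Counter

def train_bigram(lines):
    """Count bigrams and their contexts over whitespace tokens with <s>/</s> markers."""
    contexts, bigrams = Counter(), Counter()
    for line in lines:
        toks = ["<s>"] + line.split() + ["</s>"]
        contexts.update(toks[:-1])            # every token that precedes another
        bigrams.update(zip(toks, toks[1:]))
    vocab = {t for line in lines for t in line.split()} | {"</s>"}
    return contexts, bigrams, vocab

def cross_entropy(line, model):
    """Average bits per predicted token under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    toks = ["<s>"] + line.split() + ["</s>"]
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens (assumption)
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / (len(toks) - 1)

model = train_bigram(["Receiving block <*>", "Receiving block <*>", "Received block <*>"])
print(cross_entropy("Receiving block <*>", model))   # low: a familiar pattern
print(cross_entropy("disk failure on node", model))  # higher: unseen tokens
```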
STEP: Spatial-Temporal Network Security Event Prediction
TLDR
This paper verifies the proposed STEP scheme on two public data sets and shows that the prediction accuracy of security events under STEP is higher than that of benchmark models such as LSTM and ConvLSTM.
Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation
  • Lin Yang, Junjie Chen, +4 authors Wenbin Zhang
  • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)
  • 2021
TLDR
This paper proposes PLELog, a novel and practical log-based anomaly detection approach. PLELog is semi-supervised, which avoids time-consuming manual labeling, and it incorporates knowledge of historical anomalies via probabilistic label estimation to retain the advantages of supervised approaches.

References

Showing references 1-10 of 64.
Profiling internet backbone traffic: behavior models and applications
TLDR
This paper presents a general methodology for building comprehensive behavior profiles of Internet backbone traffic in terms of the communication patterns of end-hosts and services. The methodology identifies common traffic profiles as well as anomalous behavior patterns of interest to network operators and security analysts.
Understanding Customer Problem Troubleshooting from Storage System Logs
TLDR
It is observed that customer problems with attached system logs are invariably resolved much faster than those without logs, and that combining failure messages with multiple log events can improve low-level root cause prediction by a factor of three.
What Supercomputers Say: A Study of Five System Logs
  • A. Oliner, Jon Stearley
  • 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)
  • 2007
TLDR
This paper examines system logs from five supercomputers with the aim of providing useful insight and direction for future research into the use of such logs, and proposes a simpler and more effective filtering algorithm.
Fingerprinting the datacenter: automated classification of performance crises
TLDR
This work proposes and evaluates a methodology for automatic classification and identification of crises, and in particular for detecting whether a given crisis has been seen before, so that a known solution may be immediately applied.
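One simple way to decide whether a crisis "has been seen before" is to summarize each crisis as a vector and match it against stored vectors by distance. The sketch below is a simplified illustration of that idea. It assumes each metric quantile is labeled cold (-1), normal (0), or hot (+1) against given thresholds, and that a crisis vector is the average over its epochs. The feature names, thresholds, crisis labels, and distance cutoff are all hypothetical, and this is not the paper's exact procedure.

```python
import math

def discretize(value, cold, hot):
    """Map a metric quantile to -1 (cold), 0 (normal), or +1 (hot)."""
    if value < cold:
        return -1
    if value > hot:
        return 1
    return 0

def epoch_fingerprint(quantiles, thresholds):
    """quantiles: {feature: value}; thresholds: {feature: (cold, hot)}; sorted feature order."""
    return [discretize(quantiles[f], *thresholds[f]) for f in sorted(thresholds)]

def crisis_fingerprint(epochs, thresholds):
    """Average the per-epoch vectors over all epochs of one crisis."""
    vecs = [epoch_fingerprint(q, thresholds) for q in epochs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def identify(fp, known, max_dist):
    """Label of the nearest known crisis, or None if none lies within max_dist."""
    best, best_d = None, math.inf
    for label, kfp in known.items():
        d = math.dist(fp, kfp)
        if d < best_d:
            best, best_d = label, d
    return best if best_d <= max_dist else None

thresholds = {"cpu_p50": (0.2, 0.8), "latency_p95": (50, 200)}
epochs = [{"cpu_p50": 0.9, "latency_p95": 450}, {"cpu_p50": 0.5, "latency_p95": 300}]
fp = crisis_fingerprint(epochs, thresholds)        # [0.5, 1.0]
known = {"overload": [1.0, 1.0], "db_outage": [-1.0, 1.0]}
print(identify(fp, known, max_dist=1.0))            # overload
```

A `None` result signals a crisis that does not resemble any stored one, which operators would then diagnose manually.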
Introduction to Information Retrieval
  • R. Larson
  • J. Assoc. Inf. Sci. Technol.
  • 2010
Clustering event logs using iterative partitioning
TLDR
This paper presents IPLoM (Iterative Partitioning Log Mining), a novel algorithm for mining clusters from event logs. IPLoM outperforms the other algorithms with statistical significance, achieving an average F-Measure of 78% where the closest competing algorithm achieves 10%.
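The iterative-partitioning idea can be sketched roughly as follows, under a simplified reading of IPLoM. Messages are first partitioned by token count. Each partition is then split on the token position with the fewest distinct values, ignoring constant positions. Finally, each cluster is described by wildcarding positions whose tokens vary. Later IPLoM steps, such as bijection-based partitioning, are omitted, and the split rule here is an assumption rather than the published algorithm.

```python
from collections import defaultdict

def partition_by_size(messages):
    """Step 1: group tokenized messages by their token count."""
    parts = defaultdict(list)
    for m in messages:
        toks = m.split()
        parts[len(toks)].append(toks)
    return list(parts.values())

def partition_by_position(partition):
    """Step 2 (simplified): split on the non-constant position with fewest distinct tokens."""
    n = len(partition[0])
    distinct = [len({toks[i] for toks in partition}) for i in range(n)]
    candidates = [i for i in range(n) if distinct[i] > 1]
    if not candidates:
        return [partition]  # every position is constant: nothing to split
    pos = min(candidates, key=lambda i: distinct[i])
    split = defaultdict(list)
    for toks in partition:
        split[toks[pos]].append(toks)
    return list(split.values())

def describe(cluster):
    """Keep tokens that are constant across the cluster; replace the rest with '*'."""
    return " ".join(col[0] if len(set(col)) == 1 else "*" for col in zip(*cluster))

def iplom_sketch(messages):
    clusters = []
    for part in partition_by_size(messages):
        clusters.extend(partition_by_position(part))
    return sorted(describe(c) for c in clusters)

print(iplom_sketch([
    "Receiving block blk_1 src 10.0.0.1",
    "Receiving block blk_2 src 10.0.0.2",
    "Deleting block blk_3 file /tmp/a",
    "Connection closed 42",
]))
```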
Detecting large-scale system problems by mining console logs
  • SOSP '09
  • 2009
Large-Scale System Problems Detection by Mining Console Logs
TLDR
This work first parses console logs by combining source code analysis with information retrieval to create composite features, and then analyzes these features using machine learning to automatically detect operational problems.
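The TLDR says only "machine learning." As we understand it, the SOSP '09 work applies PCA to message count vectors and flags vectors with a large residual outside the "normal" subspace. The sketch below follows that idea but uses an empirical percentile of the training residuals as the threshold instead of a closed-form statistical test, which is an assumption. The synthetic data and all names are hypothetical.

```python
import numpy as np

def fit_pca_detector(X, k, percentile=99.0):
    """Fit PCA on (mostly normal) vectors X (n_samples x n_features); keep top-k components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    P = vt[:k].T  # n_features x k orthonormal basis of the normal subspace
    # Assumed thresholding rule: a percentile of the training residuals.
    threshold = np.percentile(spe(X, mean, P), percentile)
    return mean, P, threshold

def spe(X, mean, P):
    """Squared prediction error: squared norm of the part outside the normal subspace."""
    Xc = np.atleast_2d(X) - mean
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual ** 2, axis=1)

def detect(X, mean, P, threshold):
    """True where a vector's residual exceeds the threshold."""
    return spe(X, mean, P) > threshold

rng = np.random.default_rng(0)
t = np.arange(1, 51, dtype=float)
# Synthetic normal data: each "receiving" event is matched by a "received" event.
X_train = np.column_stack([t, t, np.zeros_like(t)]) + rng.normal(0, 0.1, size=(50, 3))
mean, P, threshold = fit_pca_detector(X_train, k=1)
X_test = np.array([[20.0, 20.0, 0.0],    # consistent counts
                   [20.0, 0.0, 5.0]])    # "received" missing, extra third event
print(detect(X_test, mean, P, threshold))
```

PCA needs no labels. It learns which count combinations normally occur together and treats large deviations from those correlations as problems.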
Online System Problem Detection by Mining Patterns of Console Logs
TLDR
This work presents a novel application of data mining and statistical learning methods to automatically monitor and detect abnormal execution traces from console logs in an online setting. The evaluation shows that the approach not only achieves highly accurate and fast problem detection but also helps operators better understand execution patterns in their systems.
Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning
TLDR
This paper presents a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. The system correctly identified both short- and long-running queries, informing workload management and capacity planning.