Logan: A Distributed Online Log Parser

@inproceedings{Agrawal2019LoganAD,
  title={Logan: A Distributed Online Log Parser},
  author={Amey Agrawal and Rohit Karlupia and Rajat Gupta},
  booktitle={2019 IEEE 35th International Conference on Data Engineering (ICDE)},
  year={2019},
  pages={1946--1951}
}
Logs serve as a critical tool for debugging and monitoring applications. [...] We implement a distributed online algorithm to accommodate the large volume of data. We also devise a new metric for evaluating parsers when labeled data is unavailable. We show that our method generalizes over diverse datasets without any parameter tuning or domain-specific inputs from the user. When evaluated on the publicly available HDFS dataset, our method performs 13x faster than the previous state-of-the-art.

Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms
TLDR
A privacy-preserving framework that can be employed by Platform as a Service (PaaS) providers to utilize the user logs generated on the platform while protecting the potentially sensitive logged data, and a distributed log parsing algorithm that leverages Locality Sensitive Hashing.
Delog: A High-Performance Privacy Preserving Log Filtering Framework
TLDR
A privacy-preserving framework that can be employed by Platform as a Service (PaaS) providers to utilize the user logs generated on the platform while protecting the potentially sensitive logged data is described.
Prefix-Graph: A Versatile Log Parsing Approach Merging Prefix Tree with Probabilistic Graph
TLDR
Prefix-Graph is a probabilistic graph structure extended from prefix tree that represents log templates as the combination of cut-edges in root-to-leaf paths of the graph and can be easily applied to different log datasets without any additional manual work.
MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures
  • Arthur Vervaet
  • Computer Science
    2021 IEEE 37th International Conference on Data Engineering (ICDE)
  • 2021
TLDR
MoniLog is a distributed approach to detect real-time anomalies within large-scale environments that aims to detect sequential and quantitative anomalies within a multi-source log stream.
Log-based software monitoring: a systematic mapping study
TLDR
This analysis shows that logging is challenging not only in open-source projects but also in industry, and machine learning is a promising approach to enable a contextual analysis of source code for log recommendation but further investigation is required to assess the usability of those tools in practice.
Contemporary software monitoring: a systematic mapping study.
TLDR
This analysis shows that logging is challenging not only in open-source projects but also in industry, and machine learning is a promising approach to enable contextual analysis of source code for log recommendation but further investigation is required to assess the usability of those tools in practice.
Survey on Log Clustering Approaches

References

SHOWING 1-10 OF 14 REFERENCES
Towards Automated Log Parsing for Large-Scale Log Data Analysis
TLDR
A parallel log parser (namely POP) is designed and implemented on top of Spark, a large-scale data processing platform, to address the limitations of existing log parsers when they are applied in practice.
Spell: Streaming Parsing of System Event Logs
  • Min Du, Feifei Li
  • Computer Science
    2016 IEEE 16th International Conference on Data Mining (ICDM)
  • 2016
TLDR
It is shown how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in streaming fashion. Evaluation results demonstrate that, even compared with the offline alternatives, Spell is superior in terms of both efficiency and effectiveness.
A Directed Acyclic Graph Approach to Online Log Parsing
TLDR
An online log parsing method, namely Drain, based on a directed acyclic graph that encodes specially designed rules for parsing. It has the highest accuracy on all 11 datasets and frees developers from the burden of parameter tuning by allowing them to use Drain with no pre-defined parameters.
Detecting large-scale system problems by mining console logs
TLDR
This work first parses console logs by combining source code analysis with information retrieval to create composite features, and then analyzes these features using machine learning to automatically detect operational problems at system runtime.
Incremental Mining of System Log Format
  • M. Mizutani
  • Computer Science
    2013 IEEE International Conference on Services Computing
  • 2013
TLDR
A new method for mining log formats and retrieving log types and parameters from incremental log messages in real time is devised and shown to identify the formats of real system logs without prior knowledge.
Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
TLDR
This paper proposes an unstructured log analysis technique for anomaly detection and a novel algorithm to convert free-form text messages in log files to log keys without heavily relying on application-specific knowledge.
What Supercomputers Say: A Study of Five System Logs
  • A. Oliner, Jon Stearley
  • Computer Science
    37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)
  • 2007
TLDR
This paper examines system logs from five supercomputers with the aim of providing useful insight and direction for future research into the use of such logs, and proposes a simpler and more effective filtering algorithm.
Apache Hadoop YARN: yet another resource negotiator
TLDR
The design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, are summarized. YARN decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
A Lightweight Algorithm for Message Type Extraction in System Application Logs
TLDR
A novel algorithm for message type extraction from event log files, IPLoM (Iterative Partitioning Log Mining), works through a 4-step process and outperforms similar algorithms by a statistically significant margin.
A data clustering algorithm for mining patterns from event logs
  • R. Vaarandi
  • Computer Science
    Proceedings of the 3rd IEEE Workshop on IP Operations & Management (IPOM 2003) (IEEE Cat. No.03EX764)
  • 2003
TLDR
A novel clustering algorithm for log file data sets is presented which helps one to detect frequent patterns from log files, to build log file profiles, and to identify anomalous log file lines.
...
...