On Improving Deep Learning Trace Analysis with System Call Arguments

Quentin Fournier, Daniel Aloise, Seyed Vahid Azhari, François Tetreault
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)

Kernel traces are sequences of low-level events, each comprising a name and multiple arguments, which may include a timestamp, a process id, and a return value, depending on the event. Their analysis helps uncover intrusions, identify bugs, and find latency causes. However, the effectiveness of this analysis is hindered when the event arguments are omitted. To remedy this limitation, we introduce a general approach to learning a representation of the event names along with their arguments using both embedding and encoding. The… 
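
To make the idea of combining "embedding and encoding" concrete, here is a minimal sketch and not the authors' implementation. It assumes that categorical fields (the event name and process id) are mapped to vectors through a lookup table, while numerical fields (the timestamp and return value) are encoded with Transformer-style sine and cosine features. The names `Event`, `EmbeddingTable`, `sinusoidal_encoding`, and `represent` are hypothetical, and the embedding vectors are random here, whereas in practice they would be learned.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Event:
    """One kernel event (field choice is an assumption based on the abstract)."""
    name: str        # event name, e.g. "openat"
    timestamp: int   # assumed to be nanoseconds since the start of the trace
    pid: int         # process id
    ret: int         # return value


def sinusoidal_encoding(value, dim):
    """Encode a number as `dim` alternating sine/cosine features."""
    vec = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        vec.append(math.sin(value * freq))
        vec.append(math.cos(value * freq))
    return vec[:dim]


class EmbeddingTable:
    """Maps each distinct symbol to a fixed vector (random here, learned in practice)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[key]


def represent(event, name_emb, pid_emb, dim=8):
    """Concatenate embedded categorical fields and encoded numerical fields."""
    return (
        name_emb(event.name)
        + pid_emb(event.pid)
        + sinusoidal_encoding(event.timestamp, dim)
        + sinusoidal_encoding(event.ret, dim)
    )


name_emb = EmbeddingTable(dim=8, seed=1)
pid_emb = EmbeddingTable(dim=8, seed=2)
event = Event(name="openat", timestamp=1_250_000, pid=4242, ret=3)
vector = represent(event, name_emb, pid_emb)
print(len(vector))
```

Concatenation is only one way to merge the fields; summing vectors of equal dimension is another common choice, and the abstract does not say which the authors use.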
4 Citations

Anomaly detection in microservice environments using distributed tracing data analysis and NLP

An NLP (natural language processing) based approach that detects performance anomalies in spans during a given trace, locates release-over-release regressions, and speeds up root cause analysis by means of visualization tools implemented in Trace Compass.

Visualization of profiling and tracing in CPU‐GPU programs

As the complexity of the toolchain increases for heterogeneous CPU‐GPU systems, the need for comprehensive tracing and debugging tools also grows. Heterogeneous platforms bring new possibilities but

A Practical Survey on Faster and Lighter Transformers

This survey investigates popular approaches to make the Transformer faster and lighter and provides a comprehensive explanation of the methods' strengths, limitations, and underlying assumptions to meet the desired trade-off between capacity, computation, and memory.

Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data

This paper presents the first survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series, and provides guidelines/best practices for researchers utilizing time-based data from Git repositories.



BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

EXAD: A System for Explainable Anomaly Detection on Big Data Traces

This work presents EXAD, a new prototype system for explainable anomaly detection, in particular for detecting and explaining anomalies in time-series data obtained from traces of Apache Spark jobs.

LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems

A novel ensemble method that blends multiple thresholding classifiers into a single one, making it possible to accumulate 'highly normal' sequences to remedy the issue of high false-alarm rates commonly arising in conventional methods.

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

A Framework for Anomaly Detection in Time-Driven and Event-Driven Processes Using Kernel Traces

An end-to-end framework that comprises auto-encoders and probabilistic models to understand the behavior of system processes and detect deviant behaviors is presented and shows that by creating a fine-grained model that exploits previously unharnessed properties of the system calls, it can create a dynamic anomaly detection framework that evolves as the threats change.

Automatic Cause Detection of Performance Problems in Web Applications

This paper proposes a method of extracting the internal behavior of web requests as well as introducing a pipeline that detects performance issues in web requests and provides insights into their root causes.

Anomaly Detection from System Tracing Data Using Multimodal Deep Learning

This paper uses bimodal distributed tracing data from large cloud infrastructures to detect anomalies in the execution of system components, and proposes an anomaly detection method, which utilizes a single modality of the data with information about the trace structure.

Host Hypervisor Trace Mining for Virtual Machine Workload Characterization

This paper proposes host-level hypervisor tracing as a non-intrusive means to extract useful features that can provide fine-grained characterization of VM behaviour, and adopts a two-stage feature selection approach in addition to a one-shot clustering scheme.