SK-Tree: a systematic malware detection algorithm on streaming trees via the signature kernel

Thomas Cochrane, Peter Foster, Varun Chhabra, Maud Lemercier, Cristopher Salvi, Terry Lyons. 2021 IEEE International Conference on Cyber Security and Resilience (CSR).
The development of machine learning algorithms in the cyber security domain has been impeded by the complex, hierarchical, sequential and multimodal nature of the data involved. In this paper we introduce the notion of a streaming tree as a generic data structure encompassing a large portion of real-world cyber security data. Starting from host-based event logs we represent computer processes as streaming trees that evolve in continuous time. Leveraging the properties of the signature kernel, a… 


ANUBIS: a provenance graph-based framework for advanced persistent threat detection
Its high predictive performance, combined with explainable attack-story reconstruction, makes ANUBIS an effective tool for enterprise cyber defense.
The Signature Kernel Is the Solution of a Goursat PDE
It is shown that, for continuously differentiable paths, the signature kernel solves a hyperbolic PDE. The paper also identifies the connection with a well-known class of differential equations known in the literature as Goursat problems.
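The statement can be written out as follows. This is a sketch of the result as we understand it, and the symbols k, x, y, s, t, α, β are our own notation. The first two lines give the PDE for paths x on [a, b] and y on [c, d]. The last two lines work through the one-dimensional straight-line case x_s = αs and y_t = βt on [0, 1], where the series can be checked term by term.

```latex
k(s,t) = \big\langle S(x)_{[a,s]},\, S(y)_{[c,t]} \big\rangle, \qquad
\frac{\partial^2 k}{\partial s\,\partial t}(s,t) = \langle \dot x_s, \dot y_t \rangle \, k(s,t), \qquad
k(a,\cdot) = k(\cdot,c) = 1.

k(s,t) = \sum_{n \ge 0} \frac{(\alpha\beta\, s t)^n}{(n!)^2}, \qquad
\frac{\partial^2}{\partial s\,\partial t} \frac{(\alpha\beta)^n s^n t^n}{(n!)^2}
= \alpha\beta \, \frac{(\alpha\beta)^{n-1} s^{n-1} t^{n-1}}{((n-1)!)^2}
\;\Rightarrow\; \frac{\partial^2 k}{\partial s\,\partial t} = \alpha\beta\, k .
```

The series satisfies the boundary conditions k(0, ·) = k(·, 0) = 1. At (s, t) = (1, 1) it gives the full signature kernel of the two lines.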
Signature asymptotics, empirical processes, and optimal transport
This article lays down the theoretical foundations for a connection between signature asymptotics, the theory of empirical processes, and Wasserstein distances, opening up the landscape and toolkit of the second and third in the study of the first.
Capturing Graphs with Hypo-Elliptic Diffusions
This work develops a novel tensor-valued graph operator, called the hypo-elliptic graph Laplacian. It also provides theoretical guarantees and efficient low-rank approximation algorithms.


Detecting malware using process tree and process activity data
An anomaly detection method based on combined process-activity and process-tree data. The method was able to detect processes from two of the three malware samples used.
Malware analysis with graph kernels and support vector machines
This paper describes a modeling framework capable of representing relationships among processes belonging to the same session in an integrated way, as well as the information related to the underlying system calls executed, for analyzing the behavior of executed applications and sessions.
Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams
This work presents an online unsupervised deep learning approach to detect anomalous network activity from system logs in real time, and shows the approach's potential to greatly reduce analyst workloads.
Malware Family Fingerprinting Through Behavioral Analysis
Aaron Walker, S. Sengupta. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 2020.
A method of classifying malware by family type through behavioral analysis, in which the frequency of system function calls is used to fingerprint the actions of specific malware families. This allows the paper to demonstrate a machine learning classifier capable of distinguishing malware by family affiliation with high accuracy.
Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection
Recurrent neural network language models augmented with attention for anomaly detection in system logs are presented, creating opportunities for model introspection and analysis without sacrificing state-of-the-art performance.
Multi-Dimensional Anomalous Entity Detection via Poisson Tensor Factorization
This paper establishes a new benchmark for red team event detection on the Los Alamos National Laboratory Unified Host and Network Dataset by applying a tensor factorization model that exploits the multi-dimensional and sparse structure of user authentication logs.
Adaptive Anomaly Detection on Network Data Streams
This work builds upon a previously discovered persistent structure within the Los Alamos National Laboratory network data sources, to develop a regression based streaming anomaly detection mechanism that can adapt to the network behaviour over time.
Comprehensive, Multi-Source Cyber-Security Events Data Set
This data set represents 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network, comprising 1,648,275,307 events.
Computing the full signature kernel as the solution of a Goursat problem
It is shown that the full (i.e. untruncated) signature kernel is the solution of a Goursat problem, which can be efficiently computed by finite difference schemes. A density argument is used to extend the previous analysis to the space of geometric rough paths.
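The following is a minimal pure-Python sketch of computing the signature kernel with a finite difference scheme. It is not the cited authors' implementation, and the function names `increments` and `signature_kernel` and the `refine` parameter are hypothetical.

The sketch makes three assumptions:
- Paths are piecewise linear and given as lists of d-dimensional points.
- Each linear segment is split into 2**refine sub-steps.
- The PDE is integrated cell by cell with a simple second-order explicit rule, which is our own choice of scheme.

```python
from typing import List, Sequence

Path = Sequence[Sequence[float]]  # list of points, each a d-dimensional vector


def increments(path: Path) -> List[List[float]]:
    """Consecutive differences x_{i+1} - x_i of a piecewise-linear path."""
    return [[b - a for a, b in zip(p, q)] for p, q in zip(path[:-1], path[1:])]


def signature_kernel(x: Path, y: Path, refine: int = 6) -> float:
    """Approximate k(x, y) = <S(x), S(y)> by solving the Goursat PDE
        d^2 k / ds dt = <x'(s), y'(t)> k,   k(0, .) = k(., 0) = 1
    on a grid (a sketch; the scheme below is an assumed explicit rule).
    Each linear segment of x and y is split into 2**refine sub-steps."""
    dx, dy = increments(x), increments(y)
    m = 2 ** refine
    n_s, n_t = len(dx) * m, len(dy) * m
    # k[i][j] approximates the kernel of x up to grid point i and y up to point j;
    # row 0 and column 0 hold the boundary condition k = 1.
    k = [[1.0] * (n_t + 1) for _ in range(n_s + 1)]
    for i in range(n_s):
        a = dx[i // m]  # segment of x containing refined step i
        for j in range(n_t):
            b = dy[j // m]  # segment of y containing refined step j
            # <dx, dy> over one refined cell: each increment is divided by m
            inc = sum(u * v for u, v in zip(a, b)) / (m * m)
            # Integrate the PDE over the cell, approximating k at the cell
            # centre by the mean of the two off-diagonal corners.
            k[i + 1][j + 1] = (
                k[i + 1][j] + k[i][j + 1] - k[i][j]
                + inc * 0.5 * (k[i + 1][j] + k[i][j + 1])
            )
    return k[n_s][n_t]


if __name__ == "__main__":
    # Two 1-D straight lines from 0 to 1; the exact kernel is sum_n 1/(n!)^2
    print(signature_kernel([[0.0], [1.0]], [[0.0], [1.0]]))
```

For two one-dimensional straight lines from 0 to 1, the exact kernel is the sum over n of 1/(n!)², which is about 2.2796. At the default refinement, the sketch should agree with this to within roughly 1e-3.

Raising `refine` improves accuracy but multiplies the grid size by 4 per step. Production solvers typically vectorise this recursion rather than looping in Python.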
Advanced threat intelligence: detection and classification of anomalous behavior in system processes
A system capable of explaining anomalous behavior within network-enabled user sessions by describing and interpreting kernel event anomalies detected by their deviation from normal behavior is presented.