Transparent Web Service Auditing via Network Provenance Functions

@inproceedings{Bates2017TransparentWS,
  title={Transparent Web Service Auditing via Network Provenance Functions},
  author={Adam Bates and Wajih Ul Hassan and Kevin R. B. Butler and Alin Dobra and Bradley Reaves and Patrick T. Cable and Thomas Moyer and Nabil Schear},
  booktitle={Proceedings of the 26th International Conference on World Wide Web},
  year={2017}
}
Detecting and explaining the nature of attacks in distributed web services is often difficult: determining the nature of suspicious activity requires following the trail of an attacker through a chain of heterogeneous software components, including load balancers, proxies, worker nodes, and storage services. Unfortunately, existing forensic solutions cannot provide the necessary context to link events across complex workflows, particularly in instances where application-layer semantics (e.g… 
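The problem the abstract describes, linking one suspicious event back through every tier it crossed, can be pictured as walking causal links between per-component log records. The following is a minimal Python sketch that assumes each component records the identifier of the upstream event that caused its own. The `Event` record, its `parent_id` field, and `trace_back` are hypothetical; this is not the paper's Network Provenance Function mechanism, whose details are not given here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    event_id: str
    component: str            # e.g. "load_balancer", "proxy", "worker", "storage"
    action: str
    parent_id: Optional[str]  # hypothetical field: upstream event that caused this one

def trace_back(events: list, suspicious_id: str) -> list:
    """Follow parent links from a suspicious event back to the request that started it."""
    by_id = {e.event_id: e for e in events}
    chain, seen = [], set()
    current = by_id.get(suspicious_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)              # guard against malformed, cyclic logs
        chain.append(current)
        current = by_id.get(current.parent_id)  # a None parent_id simply yields None
    return list(reversed(chain))                # entry point first

# Hypothetical logs from four tiers of a web service, merged by a collector.
log = [
    Event("e1", "load_balancer", "accept POST /upload", None),
    Event("e2", "proxy", "forward to worker-3", "e1"),
    Event("e3", "worker", "run upload handler", "e2"),
    Event("e4", "storage", "write /data/obj-17", "e3"),
    Event("e5", "worker", "run upload handler", None),  # unrelated request
]

for ev in trace_back(log, "e4"):
    print(f"{ev.component:>13}: {ev.action}")
```

The hard part in practice is producing a trustworthy `parent_id` across components that do not share application-layer semantics. The sketch simply assumes the link exists.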
OmegaLog: High-Fidelity Attack Investigation via Transparent Multi-layer Log Analysis
TLDR
OmegaLog is presented, a provenance tracker that bridges the semantic gap between system and application logging contexts and, relative to the state of the art, generates concise provenance graphs with rich semantic information, at an average runtime overhead of 4%.
Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs
TLDR
It is shown that Winnower dramatically reduces storage and network overhead associated with aggregating system audit logs, by as much as 98%, without sacrificing the important information needed for attack investigation, and represents a significant step forward for security monitoring in distributed systems.
Validating the Integrity of Audit Logs Against Execution Repartitioning Attacks
TLDR
This work proposes a new design for execution unit partitioning that leverages additional runtime data to yield verified partitions that resist manipulation, and implements a prototype of the design for Linux, MARSARA, and extensively evaluates it on 14 real-world programs, targeted with expertly crafted exploits.
Pagoda: A Hybrid Approach to Enable Efficient Real-Time Provenance Based Intrusion Detection in Big Data Environments
TLDR
Pagoda is proposed, a hybrid approach that considers the anomaly degree of both individual provenance paths and the whole provenance graph. It can identify an intrusion quickly when a serious compromise is found on a single path, and it further improves the detection rate by considering the behavior representation of the whole provenance graph.
JSgraph: Enabling Reconstruction of Web Attacks via Efficient Tracking of Live In-Browser JavaScript Executions
TLDR
JSgraph’s main goal is to enable a detailed, post-mortem reconstruction of ephemeral JS-based web attacks experienced by real network users, in particular social engineering attacks that result in the download of malicious executable files or browser extensions.
Tactical Provenance Analysis for Endpoint Detection and Response Systems
TLDR
This work is an effort to bring the benefits of data provenance to commercial EDR tools by introducing Tactical Provenance Graphs (TPGs), which, rather than encoding low-level system event dependencies, reason about causal dependencies between EDR-generated threat alerts.
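The shift from event-level to alert-level reasoning can be illustrated by collapsing a low-level provenance graph into edges between alerts. This is a simplified Python sketch under stated assumptions: each alert is attached to one provenance node, and an alert-to-alert edge exists when one alert's node is causally reachable from the other's. The graph, alert names, and the helpers `reachable` and `alert_graph` are hypothetical and are not the paper's TPG construction.

```python
from collections import deque

def reachable(graph: dict, start: str) -> set:
    """All nodes causally downstream of `start` (BFS over cause -> effect edges)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def alert_graph(graph: dict, alerts: dict) -> set:
    """Collapse a low-level provenance graph to alert-to-alert edges.

    `alerts` maps an alert name to the provenance node that triggered it.
    An edge (a, b) is emitted when b's node is causally reachable from a's node.
    """
    edges = set()
    for a, node_a in alerts.items():
        downstream = reachable(graph, node_a)
        for b, node_b in alerts.items():
            if a != b and node_b in downstream:
                edges.add((a, b))
    return edges

# Hypothetical low-level graph: process/file nodes, edges point from cause to effect.
graph = {
    "outlook.exe": ["invoice.doc"],
    "invoice.doc": ["powershell.exe"],
    "powershell.exe": ["beacon.tmp", "net.exe"],
    "chrome.exe": ["cache.dat"],
}
alerts = {
    "A_macro": "invoice.doc",
    "B_powershell": "powershell.exe",
    "C_dropper": "beacon.tmp",
    "D_unrelated": "cache.dat",
}
print(sorted(alert_graph(graph, alerts)))
# [('A_macro', 'B_powershell'), ('A_macro', 'C_dropper'), ('B_powershell', 'C_dropper')]
```

This result keeps transitive edges such as `A_macro` to `C_dropper` and ignores event timestamps. A fuller design would likely prune the former and respect the latter.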
SEAL: Storage-efficient Causality Analysis on Enterprise Logs with Query-friendly Compression
TLDR
Based on information-theoretic observations on system event data, the approach achieves lossless compression, supports near real-time retrieval of historic events, and returns exactly the same query results as the uncompressed data.
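SEAL's own encoding is not described in this summary. As a stand-in, the Python sketch below uses plain dictionary encoding to illustrate the two properties the TLDR claims: a lossless round trip, and queries over compressed data that return exactly what the uncompressed log would. All names (`Event`, `compress`, `decompress`, `query_by_object`) are hypothetical.

```python
from typing import NamedTuple

class Event(NamedTuple):
    subject: str  # e.g. process name
    action: str
    obj: str      # e.g. file path or child process
    ts: int

def compress(events):
    """Dictionary-encode string fields so repeated names and paths are stored once."""
    table = {}
    def code(s):
        return table.setdefault(s, len(table))  # new strings get the next integer id
    encoded = [(code(e.subject), code(e.action), code(e.obj), e.ts) for e in events]
    return table, encoded

def decompress(table, encoded):
    strings = {i: s for s, i in table.items()}
    return [Event(strings[s], strings[a], strings[o], ts) for s, a, o, ts in encoded]

def query_by_object(table, encoded, obj):
    """Filter on the integer code and decode only the matching rows."""
    target = table.get(obj)
    if target is None:
        return []
    strings = {i: s for s, i in table.items()}
    return [Event(strings[s], strings[a], strings[o], ts)
            for s, a, o, ts in encoded if o == target]

log = [
    Event("sshd", "fork", "bash", 1),
    Event("bash", "write", "/tmp/x", 2),
    Event("curl", "write", "/tmp/x", 3),
    Event("bash", "read", "/etc/passwd", 4),
]
table, enc = compress(log)
assert decompress(table, enc) == log  # lossless round trip
assert query_by_object(table, enc, "/tmp/x") == [e for e in log if e.obj == "/tmp/x"]
```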
Visualizing Web Application Execution Logs to Improve Software Security Defect Localization
TLDR
This work explores the web application analysis task through log file fusion, distillation, and visualization, which consists of visualizing the logs of web and database traffic with detailed function execution traces to establish causal links between events and their associated behaviors.
Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization
  • Hui Miao, A. Deshpande
  • 2019 IEEE 35th International Conference on Data Engineering (ICDE)
  • 2019
TLDR
Two high-level graph query operators are proposed to address the verboseness and evolving nature of the provenance graphs produced over the data science lifecycle; the optimal summary problem is shown to be PSPACE-complete, and effective approximation algorithms are developed.
...

References

Showing 10 of 30 references.
Towards secure provenance-based access control in cloud environments
TLDR
This work presents an architecture for secure and distributed management of provenance that enables its use in security-critical applications, and develops a provenance-based access control mechanism for Cumulus cloud storage capable of processing thousands of operations per second on a single deployment.
SPADE: Support for Provenance Auditing in Distributed Environments
TLDR
The system has been designed to decouple the collection, storage, and querying of provenance metadata, with a novel provenance kernel that mediates between the producers and consumers of provenance information and handles the persistent storage of records.
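The decoupling described above can be pictured as a small mediator: producers hand records to a kernel, optional filters run in between, and any number of storage backends persist them, while consumers query through the kernel. This is a hypothetical Python sketch of that shape only. None of the class or method names below are SPADE's actual API.

```python
class InMemoryStorage:
    """One interchangeable storage backend (hypothetical)."""
    def __init__(self):
        self.records = []

    def put(self, record: dict) -> None:
        self.records.append(record)

    def query(self, **match) -> list:
        return [r for r in self.records
                if all(r.get(k) == v for k, v in match.items())]

class ProvenanceKernel:
    """Mediates between producers and consumers: reporters never touch storage."""
    def __init__(self):
        self._filters = []  # callables: record -> record, or None to drop it
        self._stores = []

    def add_filter(self, f):
        self._filters.append(f)

    def add_storage(self, store):
        self._stores.append(store)

    def report(self, record: dict) -> None:
        for f in self._filters:
            record = f(record)
            if record is None:
                return  # filtered out before reaching storage
        for store in self._stores:
            store.put(record)

    def query(self, **match) -> list:
        # Consumers go through the kernel; in this sketch the first backend answers.
        return self._stores[0].query(**match) if self._stores else []

kernel = ProvenanceKernel()
store = InMemoryStorage()
kernel.add_storage(store)
# Hypothetical filter: drop artifacts under /proc to reduce noise.
kernel.add_filter(lambda r: None if r.get("path", "").startswith("/proc") else r)

kernel.report({"type": "Process", "id": "p1", "name": "httpd"})
kernel.report({"type": "Artifact", "id": "f1", "path": "/var/www/index.html"})
kernel.report({"type": "Artifact", "id": "f2", "path": "/proc/self/stat"})
kernel.report({"type": "Used", "from": "p1", "to": "f1"})
print(kernel.query(type="Used"))
```

Because reporters only call `report`, a storage backend can be swapped or added without touching any producer, which is the point of routing everything through the kernel.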
Trustworthy Whole-System Provenance for the Linux Kernel
TLDR
Linux Provenance Modules (LPM) is presented, the first general framework for the development of provenance-aware systems and the first step towards widespread deployment of trustworthy provenance-aware applications.
High Accuracy Attack Provenance via Binary-based Execution Partition
TLDR
The technique, called BEEP, has negligible runtime overhead (< 1.4%) and low space overhead (12.28% on average) and is effective in capturing the minimal causal graph for every attack case the authors have studied, without any dependence explosion.
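The "dependence explosion" that BEEP avoids arises because a long-running process treated as a single node makes every output appear to depend on every earlier input. The Python sketch below assumes unit boundaries (one iteration of an event-handling loop) are already known; that is the part BEEP derives from the binary itself, which this sketch does not attempt. The `SysEvent` record and the `sources_of` helper are hypothetical.

```python
from typing import NamedTuple

class SysEvent(NamedTuple):
    unit: int    # hypothetical: which event-loop iteration emitted the event
    kind: str    # "read" (input) or "write" (output)
    target: str

def sources_of(events, output: str, partitioned: bool) -> set:
    """Inputs a write to `output` is considered to depend on.

    Unpartitioned: the process is one node, so every earlier read counts.
    Partitioned: only earlier reads from the same unit count.
    """
    deps = set()
    for i, ev in enumerate(events):
        if ev.kind == "write" and ev.target == output:
            for prev in events[:i]:
                if prev.kind == "read" and (not partitioned or prev.unit == ev.unit):
                    deps.add(prev.target)
    return deps

# A hypothetical server handling three requests in one long-lived process.
trace = [
    SysEvent(0, "read", "request-A"),
    SysEvent(0, "write", "log-A"),
    SysEvent(1, "read", "request-B (malicious)"),
    SysEvent(1, "write", "/tmp/payload"),
    SysEvent(2, "read", "request-C"),
    SysEvent(2, "write", "log-C"),
]
print(sources_of(trace, "log-C", partitioned=False))  # all three requests
print(sources_of(trace, "log-C", partitioned=True))   # {'request-C'}
```

Without partitioning, the benign `log-C` is falsely linked to the malicious request. With units, a backward trace from `/tmp/payload` lands only on `request-B (malicious)`.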
ProTracer: Towards Practical Provenance Tracing by Alternating Between Logging and Tainting
TLDR
ProTracer is proposed, a lightweight provenance tracing system that alternates between system event logging and unit-level taint propagation. It is built on an on-the-fly system event processing infrastructure featuring a very lightweight kernel module and a sophisticated user-space daemon that performs concurrent and out-of-order event processing.
Secure network provenance
This paper introduces secure network provenance (SNP), a novel technique that enables networked systems to explain to their operators why they are in a certain state, e.g., why a suspicious routing…
Layering in Provenance Systems
TLDR
A provenance collection structure that facilitates the integration of provenance across multiple levels of abstraction is designed, including a workflow engine, a web browser, and an initial runtime Python provenance-tracking wrapper that sits atop provenance-aware network storage built on the Provenance-Aware Storage System (PASS).
Trade-Offs in Automatic Provenance Capture
TLDR
This work begins to explore the trade-offs among representative approaches to automatic provenance capture through evaluation and measurement, basing its evaluation on UnixBench, a widely used benchmark suite in systems research.
Identifying the provenance of correlated anomalies
TLDR
An architecture is presented that allows fine-grained auditing on individual hosts, space-efficient representation of anomalous activity that can be centrally correlated, and tracing anomalies back to individual files and processes in the system.
The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance
TLDR
This paper describes a provenance-aware system prototype that implements provenance tracking of data writes at the application layer, which makes it extremely easy to deploy, and presents empirical results showing that the run-time overhead of recording provenance with confidentiality and integrity guarantees ranges from 1% to 13%.
...