Practical whole-system provenance capture

Thomas Pasquier, Xueyuan Han, Mark Goldstein, Thomas Moyer, D. Eyers, Margo I. Seltzer, and Jean Bacon. Proceedings of the 2017 Symposium on Cloud Computing.
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous… 
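
To make the definition concrete, here is a minimal sketch, assuming a toy model in which each record states that one item was derived from a source by a named transformation. `ProvenanceGraph`, `record`, and `lineage` are hypothetical names chosen for illustration, not CamFlow's interface.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance graph: each item maps to the (source, transformation) pairs it came from."""

    def __init__(self) -> None:
        self._parents: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def record(self, derived: str, source: str, transformation: str) -> None:
        """Record that `derived` was produced from `source` by `transformation`."""
        self._parents[derived].append((source, transformation))

    def lineage(self, item: str) -> tuple[set[str], set[str]]:
        """Return all upstream sources of `item` and all transformations applied along the way."""
        sources: set[str] = set()
        transformations: set[str] = set()
        stack = [item]
        while stack:
            current = stack.pop()
            # .get() avoids creating empty entries in the defaultdict
            for source, how in self._parents.get(current, []):
                transformations.add(how)
                if source not in sources:
                    sources.add(source)
                    stack.append(source)
        return sources, transformations
```

For example, after recording that `report.pdf` was produced from `clean.csv` by `plot.py`, and `clean.csv` from `raw.csv` by `dedupe.py`, `lineage("report.pdf")` returns both upstream files and both transformations.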

Runtime Analysis of Whole-System Provenance

This work presents CamQuery, which performs inline, real-time provenance analysis and is therefore suited to implementing security applications. The authors demonstrate its applicability to a variety of runtime security applications, including data loss prevention, intrusion detection, and regulatory compliance.
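
As an illustration of the kind of inline analysis described for data loss prevention, here is a minimal sketch, assuming a hypothetical event stream with only three event kinds (`read`, `write`, `send`). It propagates taint from sensitive files through processes and files and flags any network send by a tainted process. `Event` and `DlpMonitor` are hypothetical names; the sketch does not reflect CamQuery's actual API or graph model.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str     # "read", "write", or "send" (hypothetical event vocabulary)
    process: str  # identifier of the acting process
    obj: str      # file path or network endpoint


@dataclass
class DlpMonitor:
    sensitive: set[str]                                  # files whose contents must not leave the host
    tainted: set[str] = field(default_factory=set)       # processes and files that carry sensitive data

    def on_event(self, ev: Event) -> bool:
        """Update taint state for one event; return False if the event should be blocked."""
        if ev.kind == "read" and (ev.obj in self.sensitive or ev.obj in self.tainted):
            self.tainted.add(ev.process)  # the reader now holds sensitive data
        elif ev.kind == "write" and ev.process in self.tainted:
            self.tainted.add(ev.obj)      # data flows into the written file
        elif ev.kind == "send" and ev.process in self.tainted:
            return False                  # tainted data is about to leave the host
        return True
```

Because each event is checked as it arrives, the decision can be made before the send completes, which is what makes inline (rather than after-the-fact) analysis useful for prevention.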

Runtime Analysis of Whole-System Provenance (preprint)

CamQuery is a Linux Security Module that supports running analysis applications both in userspace and in the kernel, and it provides inline, real-time provenance analysis, making it suitable for implementing security applications.

A Comprehensive Survey on the State-of-the-art Data Provenance Approaches for Security Enforcement

A comparative study of state-of-the-art provenance approaches that classifies them by framework, deployed technique, and subject of interest, and discusses the emergence and scope of data provenance in IoT networks.

Provenance expressiveness benchmarking on non-deterministic executions

This work proposes an extension to ProvMark, an automated provenance benchmarking tool, so that it can handle non-determinism, with the aim of providing comprehensive automated expressiveness benchmarking for real-world examples.

Observed vs. Possible Provenance (Research Track)

The paper proposes the idea of possible provenance, which relaxes the constraint that provenance must be directly observed, and suggests key next steps for advancing this research.

CLARION: Sound and Clear Provenance Tracking for Microservice Deployments

The results demonstrate the utility of CLARION and show that it outperforms state-of-the-art provenance tracking systems by providing an accurate and concise view of data provenance in container environments.

Improving reproducibility of data science pipelines through transparent provenance capture

The paper presents Ursprung, a transparent provenance collection system designed for data science environments; it captures sufficient provenance for a variety of use cases while adding an overhead of at most 4%.

A comprehensive survey on data provenance: State-of-the-art approaches and their deployments for IoT security enforcement

A comparative study of state-of-the-art provenance approaches that classifies them by framework, deployed technique, and subject of interest, and discusses the emergence and scope of data provenance in IoT networks.

ProvMark: A Provenance Expressiveness Benchmarking System

The paper presents ProvMark, an automated tool that treats an existing provenance system as a black box and reliably identifies the provenance graph structure it records for a given activity, by reducing the problem to subgraph isomorphism problems handled by an external solver.
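
One way subgraph isomorphism can isolate what an activity adds to a recorded graph is sketched below, assuming a toy labeled-graph encoding: embed a graph recorded without the activity (background) into one recorded with it (foreground), and treat the unmatched remainder as the activity's structure. This is a hypothetical illustration; ProvMark hands the problem to an external solver, whereas this sketch uses brute-force search that only scales to tiny graphs, and `find_embedding` and `activity_subgraph` are made-up names.

```python
from __future__ import annotations

from itertools import permutations

# A labeled directed graph: (node id -> node label, {(src, dst, edge label)}).
Graph = tuple[dict[str, str], set[tuple[str, str, str]]]


def find_embedding(pattern: Graph, target: Graph) -> dict[str, str] | None:
    """Brute-force labeled subgraph isomorphism: map each pattern node to a distinct
    target node with the same label so that every pattern edge exists in the target."""
    p_nodes, p_edges = pattern
    t_nodes, t_edges = target
    p_ids = list(p_nodes)
    for image in permutations(t_nodes, len(p_ids)):
        mapping = dict(zip(p_ids, image))
        if any(p_nodes[p] != t_nodes[mapping[p]] for p in p_ids):
            continue  # node labels disagree
        if all((mapping[s], mapping[d], lbl) in t_edges for s, d, lbl in p_edges):
            return mapping
    return None


def activity_subgraph(background: Graph, foreground: Graph) -> Graph | None:
    """Embed the background into the foreground and return the unmatched remainder."""
    mapping = find_embedding(background, foreground)
    if mapping is None:
        return None
    matched = set(mapping.values())
    f_nodes, f_edges = foreground
    mapped_bg_edges = {(mapping[s], mapping[d], lbl) for s, d, lbl in background[1]}
    extra_nodes = {n: lbl for n, lbl in f_nodes.items() if n not in matched}
    return extra_nodes, f_edges - mapped_bg_edges
```

When several embeddings exist, this returns the first one found, so a real tool needs extra care (for instance, comparing repeated runs) to pick a meaningful match.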

Efficient Provenance Management via Clustering and Hybrid Storage in Big Data Environments

The paper proposes a Provenance-Based Label Propagation Algorithm that regularizes and clusters large numbers of irregular provenance records and significantly improves provenance query performance with small run-time overhead.
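
The summary does not describe the Provenance-Based Label Propagation Algorithm itself, so the sketch below swaps in classic label propagation community detection (each node repeatedly adopts the most frequent label among its neighbors) as a stand-in. It assumes the provenance graph has already been converted to a symmetric adjacency map whose keys include every node; ties are broken by taking the smallest label so results are reproducible. `label_propagation` is a hypothetical name.

```python
from collections import Counter


def label_propagation(adj: dict[str, set[str]], max_iters: int = 100) -> dict[str, str]:
    """Classic label propagation on an undirected graph given as a symmetric
    adjacency map. Returns a cluster label for every node."""
    labels = {node: node for node in adj}  # every node starts in its own cluster
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps results reproducible
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in adj[node])
            best = max(counts.values())
            # adopt the most frequent neighbor label; break ties by smallest label
            new_label = min(lbl for lbl, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # no label moved during a full pass: converged
    return labels
```

Nodes that end up sharing a label form a cluster; storing each cluster together is one plausible way to speed up lineage queries, though the paper's hybrid-storage design may differ.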



Trustworthy Whole-System Provenance for the Linux Kernel

This paper presents Linux Provenance Modules (LPM), the first general framework for developing provenance-aware systems and a first step toward widespread deployment of trustworthy provenance-aware applications.

Hi-Fi: collecting high-fidelity whole-system provenance

The paper presents Hi-Fi, a kernel-level provenance system that leverages the Linux Security Modules framework to collect high-fidelity whole-system provenance, and demonstrates that Hi-Fi can record a variety of malicious behavior within a compromised system.

Take Only What You Need: Leveraging Mandatory Access Control Policy to Reduce Provenance Storage Costs

The paper proposes a novel approach to policy-based provenance pruning: leveraging the confinement properties of Mandatory Access Control (MAC) systems to identify subdomains of system activity for which to collect provenance.
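
A minimal sketch of the idea, assuming a simplified policy model in which the MAC policy is reduced to a map of allowed information flows between type labels (loosely SELinux-style; the type names in the usage below are made up). Types reachable from a seed domain form the subdomain of interest, and only records that touch it are kept. `Record`, `reachable_types`, and `prune` are hypothetical names, not the paper's implementation.

```python
from collections import deque
from typing import NamedTuple


class Record(NamedTuple):
    subject_type: str  # security type of the acting subject (illustrative)
    object_type: str   # security type of the accessed object (illustrative)
    operation: str


def reachable_types(flows: dict[str, set[str]], seeds: set[str]) -> set[str]:
    """Breadth-first search over the policy's allowed flows: every type that
    information from `seeds` could reach. This set is the subdomain to monitor."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for nxt in flows.get(current, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def prune(records: list[Record], keep: set[str]) -> list[Record]:
    """Keep only records whose subject or object lies in the selected subdomain."""
    return [r for r in records if r.subject_type in keep or r.object_type in keep]
```

The payoff comes from confinement: if the policy guarantees that activity outside the subdomain cannot influence it, those records can be dropped without losing provenance relevant to the monitored domain.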

A General-Purpose Provenance Library

The paper presents the Core Provenance Library (CPL), a portable, multilingual library that application programmers can easily incorporate into a variety of tools to collect and integrate provenance.

High-throughput Ingest of Provenance Records into Accumulo

This paper investigates using D4M and Accumulo to support high-throughput ingest of whole-system provenance data and finds that the system can ingest 3,970 graph components per second.
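
As a rough illustration of what that rate means, assuming a hypothetical batch of one million graph components and that the reported rate is sustained throughout, the ingest time is:

$$t = \frac{N}{r} = \frac{10^{6}\ \text{components}}{3970\ \text{components/s}} \approx 252\ \text{s} \approx 4.2\ \text{min}$$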

Layering in Provenance Systems

The authors design a provenance collection structure that facilitates integrating provenance across multiple levels of abstraction, including a workflow engine, a web browser, and an initial runtime Python provenance-tracking wrapper that sits atop provenance-aware network storage built on the Provenance-Aware Storage System (PASS).

SPADE: Support for Provenance Auditing in Distributed Environments

The system is designed to decouple the collection, storage, and querying of provenance metadata, using a novel provenance kernel that mediates between producers and consumers of provenance information and handles persistent storage of records.
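
The decoupling described above can be sketched as a small producer/consumer pipeline. This is a hypothetical Python illustration: the class names, the dictionary record format, and the synchronous `flush` are assumptions made for clarity, not SPADE's actual components or API.

```python
from collections import deque
from typing import Protocol


class Storage(Protocol):
    """Anything that can persist a provenance record."""
    def put(self, record: dict) -> None: ...


class ListStorage:
    """In-memory storage backend that also answers simple equality queries."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def put(self, record: dict) -> None:
        self.records.append(record)

    def query(self, **match: str) -> list[dict]:
        return [r for r in self.records if all(r.get(k) == v for k, v in match.items())]


class ProvenanceKernel:
    """Mediates between producers (which call report) and consumers (storages)."""

    def __init__(self) -> None:
        self._buffer: deque[dict] = deque()
        self._storages: list[Storage] = []

    def add_storage(self, storage: Storage) -> None:
        self._storages.append(storage)

    def report(self, record: dict) -> None:
        """Called by any producer; producers never see which storages exist."""
        self._buffer.append(record)

    def flush(self) -> int:
        """Deliver every buffered record to every storage; return how many were delivered."""
        delivered = 0
        while self._buffer:
            record = self._buffer.popleft()
            for storage in self._storages:
                storage.put(record)
            delivered += 1
        return delivered
```

Because producers only call `report` and storages only implement `put`, either side can be swapped without touching the other, which is the point of placing a kernel in the middle.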

Retrofitting Applications with Provenance-Based Security Monitoring

The paper presents DAP, a transparent architecture for capturing detailed data provenance for web service components. It builds on the key insight that minimal knowledge of open protocols suffices to extract precise and efficient provenance information by interposing on the communications between application components.

The Requirements of Using Provenance in e-Science Experiments

This paper presents use cases for a provenance architecture drawn from current experiments in biology, chemistry, physics, and computer science, and analyses them to determine the technical requirements of a generic, technology- and application-independent architecture.

Expressiveness Benchmarking for System-Level Provenance

The paper proposes an expressiveness benchmark consisting of tests intended to capture the provenance of individual system calls, and presents work in progress on benchmark examples for Linux and on how two provenance tools, SPADE and OPUS, handle them.