Validation and Inference of Schema-Level Workflow Data-Dependency Annotations

@article{Bowers2018ValidationAI,
  title={Validation and Inference of Schema-Level Workflow Data-Dependency Annotations},
  author={Shawn Bowers and Timothy M. McPhillips and Bertram Lud{\"a}scher},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.09899}
}
An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer “lineage” relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs… 
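The lineage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the trace format, the step and file names, and the helper functions below are all hypothetical, and the code makes the coarse assumption that every output of a step depends on every input that step consumed.

```python
from collections import defaultdict

# Hypothetical execution trace: (step name, items consumed, items produced).
trace = [
    ("load",    ["raw.csv"],                 ["table"]),
    ("clean",   ["table"],                   ["clean_table"]),
    ("params",  ["config.yaml"],             ["settings"]),
    ("plot",    ["clean_table", "settings"], ["figure.png"]),
    ("summary", ["clean_table"],             ["stats.txt"]),
]

def build_dependencies(trace):
    """Map each produced item to the set of items its producing step consumed.

    Assumption: each output depends on all inputs of the same step.
    """
    depends_on = defaultdict(set)
    for _step, inputs, outputs in trace:
        for out in outputs:
            depends_on[out].update(inputs)
    return depends_on

def lineage(item, depends_on):
    """Return every item that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in depends_on.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def workflow_inputs(item, depends_on):
    """Lineage items that no step produced, i.e. external workflow inputs."""
    return {d for d in lineage(item, depends_on) if d not in depends_on}

deps = build_dependencies(trace)
print(sorted(workflow_inputs("figure.png", deps)))  # ['config.yaml', 'raw.csv']
print(sorted(workflow_inputs("stats.txt", deps)))   # ['raw.csv']
```

Under the all-inputs assumption, a query such as "which workflow inputs contributed to figure.png?" reduces to a graph traversal over the recorded trace. Finer-grained dependency information would narrow these answers.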

References

Declarative Rules for Inferring Fine-Grained Data Provenance from Scientific Workflow Execution Traces

TLDR
This work presents a high-level declarative language for expressing explicit dependency rules that can be applied (at any time) to workflow trace events to generate fine-grained dependency information and presents an alternative approach that decouples dependency inference from workflow systems and underlying execution traces.

LabelFlow Framework for Annotating Workflow Provenance

TLDR
This paper investigates whether provenance can be exploited to support reporting and describes LabelFlow, a framework comprising four Labelling Operators for decorating provenance with domain-specific Labels, and a tool that takes a workflow as input and produces as output a Labelling Pipeline for that workflow, composed of Labelled Operators.

Provenance as dependency analysis

TLDR
It is argued that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input.

Linking Prospective and Retrospective Provenance in Scripts

TLDR
This work shows how the provenance traces recorded by noWorkflow can be mapped to the workflow specifications generated by YesWorkflow from scripts based on user annotations, and presents competency queries illustrating how a workflow view generated from the script can be used to explore the provenance recorded during script execution.

YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

TLDR
YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts, and represents the scripts in terms of entities based on the typical scientific workflow model.

Provenance and scientific workflows: challenges and opportunities

TLDR
This tutorial provides an overview of research issues in provenance for scientific workflows, with a focus on recent literature and technology in this area, aimed at a general database research audience and at people who work with scientific data and workflows.

Lineage tracing for general data warehouse transformations

TLDR
This work formally defines the lineage tracing problem in the presence of general data warehouse transformations, and presents algorithms for lineage tracing in this environment, and can be used as the basis for a lineage tracing tool in a general warehousing setting.

Yin & Yang: Demonstrating Complementary Provenance from noWorkflow & YesWorkflow

TLDR
This work demonstrates how combining complementary information gathered by noWorkflow and YesWorkflow enables provenance queries and data lineage visualizations neither tool can provide on its own.

ProvenanceCurious: a tool to infer data provenance from scripts

TLDR
A tool that infers fine-grained data provenance from a given script; it is demonstrated using a hydrological model and was tested successfully on other scripts in different contexts.

PROV-O: The PROV Ontology

The PROV Ontology (PROV-O) expresses the PROV Data Model using the OWL2 Web Ontology Language. It provides a set of classes, properties, and restrictions that can be used to represent and interchange