Corpus ID: 15201370

A survey of data provenance techniques

Yogesh L. Simmhan, Beth Plale, Dennis Gannon
Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by… 

A survey of data provenance in e-science

The main aspect of the taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it.

Provenance Representation and Storage Techniques in Linked Data: A State-of-the-Art Survey

This paper appraises different techniques in this field, mostly in terms of the representation, storage, and generation of provenance information of Linked Data in relation to the linked datasets.

Data Provenance: A Categorization of Existing Approaches

A survey of data provenance models and prototypes is given, and a general categorization scheme for provenance models is presented. This categorization enables us to distinguish between different kinds of provenance information and could lead to a better understanding of provenance in general.

Variable provenance in software systems

It is argued that determining the origin(s) of the data held by a variable and the history of modifications of the variable can provide critical information along many dimensions about what happens in the source code.

A Framework for Collecting Provenance in Data-Centric Scientific Workflows

A framework, based on a loosely-coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection while a performance evaluation of a prototype finds a minimal performance overhead.

Data Provenance in Economical Database Design

  • Ding Hua, Pan Yun-wen, Xu Xiaolei
  • Computer Science
    2011 International Conference on Computer Distributed Control and Intelligent Environmental Monitoring
  • 2011
This paper focuses on describing how data is generated and evolves over time, and gives an example of data provenance in economical database design to illustrate the application of data provenance in economics.

Knowledge driven decision support system for provenance models in relational database

  • A. Rani, S. Thalia
  • Computer Science
    2014 International Conference on Data Science & Engineering (ICDSE)
  • 2014
A knowledge-driven decision support system based on the Analytic Hierarchy Process (AHP) is suggested to evaluate the performance of existing provenance models in relational databases for capturing and querying provenance information.

A Data Provenance based Architecture to Enhance the Reliability of Data Analysis for Industry 4.0

  • Peng Li, O. Niggemann
  • Computer Science
    2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA)
  • 2018
This paper extends the concept of “data provenance” to the manufacturing domain to acquire information about data origin and data changes, and proposes an architecture to manage the provenance of process data, in which data provenance is treated as an annotation of process data.

Capturing Data Provenance With A User-Driven Feedback Approach

The core PROV model may be used to represent the provenance of user feedback information and a system architecture to gather and manage feedback from end-users is proposed.

Semantic Provenance for Science Data Products: Application to Image Data Processing

This paper describes work on a federated set of data services in the area of solar coronal physics, describes the use of semantic technologies for encoding provenance and domain knowledge, and shows how provenance and domain ontologies can be used together to satisfy complex use cases.

Why and Where: A Characterization of Data Provenance

An approach to computing provenance when the data of interest has been created by a database query is described, adopting a syntactic approach and presenting results for a general data model that applies to relational databases as well as to hierarchical data such as XML.

Recording and Reasoning over Data Provenance in Web and Grid Services

This work proposes infrastructure-level support for a provenance recording capability for service-oriented architectures such as the Grid and Web Services, and provides a mechanism by which provenance is used to determine whether previously computed results are still up to date.

DBNotes: a post-it system for relational databases based on provenance

DBNotes, a Post-It note system for relational databases where every piece of data may be associated with zero or more notes (or annotations), is demonstrated, which can easily determine the provenance of data through a sequence of transformation steps simply by examining the annotations.

The requirements of recording and using provenance in e-Science experiments

This paper presents use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyses the use cases to determine the technical requirements of a generic, application-independent architecture.

Recording and using provenance in a protein compressibility experiment

It is demonstrated that provenance recording overhead of the prototype system remains under 10% of execution time, and it is shown that the recorded information successfully supports use cases in a performant manner.

Semantically Linking and Browsing Provenance Logs for E-science

This paper describes how to assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them.

Multi-scale Science: Supporting Emerging Practice with Semantically Derived Provenance

This work is developing a general-purpose informatics-based approach that emphasizes “on-demand” metadata creation, configurable data translations, and semantic mapping to support the rapidly increasing and continually evolving requirements for managing data, metadata, and data relationships in multi-scale science projects.

Using Semantic Web Technologies for Representing E-science Provenance

This work explores the use of Semantic Web technologies such as RDF and ontologies to support provenance representation, and uses existing initiatives such as Jena and LSID to generate and store such material.

Data Provenance: Some Basic Issues

The term data provenance is used to refer to the process of tracing and recording the origins of data and its movement between databases.

Data annotations, provenance, and archiving

This dissertation examines the problem of data provenance and two main issues related to provenance, annotation and archiving, and develops a technique for specifying key constraints for hierarchical data that generalizes the way keys are specified in relational databases.