A survey of data provenance techniques
@inproceedings{Simmhan2005ASO, title={A survey of data provenance techniques}, author={Yogesh L. Simmhan and Beth Plale and Dennis Gannon}, year={2005} }
Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by…
216 Citations
A survey of data provenance in e-science
- Computer Science
- SGMD
- 2005
The main aspect of the taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it.
Provenance Representation and Storage Techniques in Linked Data: A State-of-the-Art Survey
- Computer Science
- 2012
This paper appraises different techniques in this field, mostly in terms of the representation, storage, and generation of provenance information for Linked Data in relation to the linked datasets.
Data Provenance: A Categorization of Existing Approaches
- Computer Science
- BTW
- 2007
A survey of data provenance models and prototypes is given and a general categorization scheme for provenance models is presented. This categorization makes it possible to distinguish between different kinds of provenance information and could lead to a better understanding of provenance in general.
Variable provenance in software systems
- Computer Science
- RSSE 2014
- 2014
It is argued that determining the origin(s) of the data held by a variable and the history of modifications of the variable can provide critical information along many dimensions about what happens in the source code.
A Framework for Collecting Provenance in Data-Centric Scientific Workflows
- Computer Science
- 2006 IEEE International Conference on Web Services (ICWS'06)
- 2006
A framework, based on a loosely-coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection while a performance evaluation of a prototype finds a minimal performance overhead.
Data Provenance in Economical Database Design
- Computer Science
- 2011 International Conference on Computer Distributed Control and Intelligent Environmental Monitoring
- 2011
This paper focuses on describing how data is generated and evolves over time, and gives an example of data provenance in economical database design to illustrate the application of data provenance in economics.
Knowledge driven decision support system for provenance models in relational database
- Computer Science
- 2014 International Conference on Data Science & Engineering (ICDSE)
- 2014
A knowledge-driven decision support system based on the Analytic Hierarchy Process (AHP) is suggested to evaluate the performance of existing provenance models in relational databases for capturing and querying provenance information.
A Data Provenance based Architecture to Enhance the Reliability of Data Analysis for Industry 4.0
- Computer Science
- 2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA)
- 2018
This paper extends the concept of “data provenance” to the manufacturing domain to acquire information about data origin and data changes, and proposes an architecture to manage the provenance of process data, in which data provenance is treated as an annotation of process data.
Capturing Data Provenance With A User-Driven Feedback Approach
- Computer Science
- 2015
The core PROV model may be used to represent the provenance of user feedback information and a system architecture to gather and manage feedback from end-users is proposed.
Semantic Provenance for Science Data Products: Application to Image Data Processing
- Computer Science
- SWPM
- 2009
This paper describes work on a federated set of data services in the area of solar coronal physics and the use of semantic technologies for encoding provenance and domain knowledge, and shows how provenance and domain ontologies can be used together to satisfy complex use cases.
98 References
Why and Where: A Characterization of Data Provenance
- Computer Science
- ICDT
- 2001
An approach to computing provenance when the data of interest has been created by a database query is described. The approach is syntactic, and results are presented for a general data model that applies to relational databases as well as to hierarchical data such as XML.
Recording and Reasoning over Data Provenance in Web and Grid Services
- Computer Science
- OTM
- 2003
This work proposes infrastructure-level support for a provenance recording capability in service-oriented architectures such as the Grid and Web Services, and provides a mechanism by which provenance is used to determine whether previously computed results are still up to date.
DBNotes: a post-it system for relational databases based on provenance
- Computer Science
- SIGMOD '05
- 2005
DBNotes, a Post-It note system for relational databases in which every piece of data may be associated with zero or more notes (or annotations), is demonstrated. The provenance of data through a sequence of transformation steps can be determined simply by examining the annotations.
The requirements of recording and using provenance in e-Science experiments
- Computer Science
- 2005
This paper presents use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyses the use cases to determine the technical requirements of a generic, application-independent architecture.
Recording and using provenance in a protein compressibility experiment
- Computer Science
- HPDC-14. Proceedings. 14th IEEE International Symposium on High Performance Distributed Computing, 2005.
- 2005
It is demonstrated that provenance recording overhead of the prototype system remains under 10% of execution time, and it is shown that the recorded information successfully supports use cases in a performant manner.
Semantically Linking and Browsing Provenance Logs for E-science
- Computer Science
- ICSNW
- 2004
This paper describes how to assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them.
Multi-scale Science: Supporting Emerging Practice with Semantically Derived Provenance
- Computer Science
- 2003
This work is developing a general-purpose informatics-based approach that emphasizes "on-demand" metadata creation, configurable data translations, and semantic mapping to support the rapidly increasing and continually evolving requirements for managing data, metadata, and data relationships in multi-scale science projects.
Using Semantic Web Technologies for Representing E-science Provenance
- Computer Science
- SEMWEB
- 2004
This work explores the use of Semantic Web technologies, such as RDF and ontologies, to support the representation of provenance, and uses existing initiatives such as Jena and LSID to generate and store such material.
Data Provenance: Some Basic Issues
- Computer Science, Geology
- FSTTCS
- 2000
The term data provenance is used to refer to the process of tracing and recording the origins of data and its movement between databases.
Data annotations, provenance, and archiving
- Computer Science
- 2002
This dissertation examines the problem of data provenance and two main issues related to it, annotation and archiving, and develops a technique for specifying key constraints for hierarchical data that generalizes the way keys are specified in relational databases.