Metadata and provenance management

Authors: Ewa Deelman, G. Bruce Berriman, Ann L. Chervenak, Óscar Corcho, Paul T. Groth, Luc Moreau
Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared among collaborators and further processed and analyzed. To facilitate sharing and interpretation, the data need to carry with them metadata describing how they were collected or generated, and provenance information describing how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of the approaches to… 
Provenance: On and Behind the Screens
The second part of this tutorial focuses on helping users leverage provenance through adapted visualizations. It presents some fundamental concepts of visualization before discussing possible visualizations for provenance.
A survey on provenance: What for? What form? What from?
This survey provides an overview of the research field of provenance. It focuses on three questions: what provenance is used for, what types of provenance have been defined and captured for different applications, and which resources and system requirements influence the choice of a particular provenance solution.
Workflow Provenance Metadata to Enhance Reuse of South America Drainage Datasets
The paper presents an Open Data approach to improving the release of South America drainage datasets. The goal is for governmental institutions and private organizations coping with emergencies to be able to exploit these datasets quickly.
Curated Reasoning by Formal Modeling of Provenance
A dissertation made available through ScholarWorks@UNO.
Generic and adaptive metadata management framework for scientific data repositories
This work describes a modular architecture for a scientific data repository. The architecture helps research communities orchestrate their data and metadata in a targeted way across their respective lifecycles.
An Upper-Bound Control Approach for Cost-Effective Privacy Protection of Intermediate Dataset Storage in Cloud
This paper proposes an approach for identifying which stored intermediate datasets need to be encrypted and which do not. It sets an upper bound on a privacy measure. A subset of stored datasets can remain unencrypted as long as the overall mixed information they reveal does not exceed that bound.


A survey of data provenance in e-science
The survey's taxonomy categorizes provenance systems by four criteria: why they record provenance, what they describe, how they represent and store provenance, and how they disseminate it.
Tackling the Provenance Challenge one layer at a time
The paper describes how VisTrails organizes its provenance data in layers. It also presents a first approach to querying these data, developed to answer the Provenance Challenge queries.
Automatic capture and efficient storage of e‐Science experiment provenance
The paper presents a layered model for representing workflow provenance. The model allows navigation from an abstract model of the experiment to the instance data collected during a specific run. The paper also gives an approach for storing this provenance data in a relational database.
Why and Where: A Characterization of Data Provenance
The paper describes how to compute provenance when the data of interest was created by a database query. The authors adopt a syntactic approach and present results for a general data model. The model applies to relational databases as well as to hierarchical data such as XML.
Provenance Tracking in an Earth Science Data Processing System
Science data processing systems should capture, archive, and distribute provenance information for all externally received data and algorithms. They should also describe all internal processes used to transform the data.
The First Provenance Challenge
The challenge defined a functional magnetic resonance imaging (fMRI) workflow, which participants had to either simulate or run to produce a provenance representation. Participants then had to implement and execute a set of specified queries over that representation.
Managing Rapidly-Evolving Scientific Workflows
The paper gives an overview of VisTrails, a system that provides infrastructure for systematically capturing detailed provenance and streamlining data exploration. It lets scientists easily navigate the space of workflows and parameter settings for an exploration task.
Report on the International Provenance and Annotation Workshop (IPAW'06), 3-5 May 2006, Chicago
The provenance of a data item refers to its origins and processing history, while annotation refers to the process of adding notes or data to an existing structure. Because these terms…
Automatic capture and reconstruction of computational provenance
The Earth System Science Server (ES3) project is developing a local infrastructure for managing Earth science data products derived from satellite remote sensing, intended to be flexible enough to manage the idiosyncratic computing ensembles that typify scientific research.
Connecting Scientific Data to Scientific Experiments with Provenance
The paper argues that scientists should have access to the full provenance of their data. This includes not only parameters, inputs, and intermediate data, but also the abstract experiment that the "workflow compiler" refines into a concrete execution.