Open Data Integration

@article{Miller2018OpenDI,
  title={Open Data Integration},
  author={Ren{\'e}e J. Miller},
  journal={Proc. VLDB Endow.},
  year={2018},
  volume={11},
  pages={2130--2139}
}
Open data plays a major role in supporting both governmental and organizational transparency. Many organizations are adopting Open Data Principles promising to make their open data complete, primary, and timely. These properties make this data tremendously valuable to data scientists. However, scientists generally do not have a priori knowledge about what data is available (its schema or content). Nevertheless, they want to be able to use open data and integrate it with other public or… 

Open Data Analytic Querying using a Relation-Free API
TLDR
A solution that eases query development for tabular Open Data analytics through an API, using a simplified query representation, called Relation-Free Query, in which the data relations, and consequently the joins over them, cannot be specified.
Voyager: Data Discovery and Integration for Data Science
TLDR
The design decisions made in the development of a system to support data discovery and integration are reported, and an evaluation that investigates both usability and evaluation is reported on.
Data lake concept and systems: a survey
TLDR
This survey reviews the development, definition, and architectures of data lakes and classifies the existing data lake systems based on the functions they provide, which makes it a useful technical reference for designing, implementing, and applying data lakes.
Model-Driven Development of Web APIs to Access Integrated Tabular Open Data
TLDR
This paper proposes a model-driven approach to automatically generate Web APIs that homogeneously access multiple integrated tabular open datasets that can be integrated by means of join and union operations.
Loch Prospector: Metadata Visualization for Lakes of Open Data
TLDR
Loch Prospector is proposed, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data — in particular, metadata attributes — and the associated task abstraction for their work.
A Semantic Data Lake Model for Analytic Query-Driven Discovery
TLDR
A semantic model for a Data Lake aimed to support data discovery and integration in data analytics scenarios is introduced, suited for identifying the sources and the required transformation steps according to the analytical request.
Processing Analytical Queries over Polystore System for a Large Astronomy Data Repository
TLDR
This study examines models of data integration, analyzes them, and incorporates them into a system to manage linked open data provided by the astronomical domain, and proposes a web-based query system built around the Polystore database architecture.
BareTQL: An Interactive System for Searching and Extraction of Open Data Tables
TLDR
BareTQL is presented, an interactive system for querying open data tables in the presence of the aforementioned challenges, which aims to provide an easy and efficient way of querying incomplete data in tables with little or no schema.
HOTMapper: Historical Open Data Table Mapper
TLDR
This demo will show the creation of the mapping definition and the execution flow of the CLI script for creating a unified data source from scratch and then updating an existing one, to unify real-world data sources containing information about the Brazilian educational system.
Mosaic: A Sample-Based Database System for Open World Query Processing
TLDR
This vision paper proposes Mosaic, a database system that treats samples as first-class citizens and allows users to ask questions over populations represented by these samples by having a unique sample-based data model with extensions to the SQL language.
...
...

References

The Data Civilizer System
TLDR
Initial positive experiences are described that show the preliminary DATA CIVILIZER system shortens the time and effort required to find, prepare, and analyze data.
LabBook: Metadata-driven social collaborative data analysis
TLDR
The key insight is to collect and use more metadata about all elements of the analytic ecosystem by means of an architecture and user experience that reduce the cost of contributing such metadata.
Data Integration for the Relational Web
TLDR
Octopus is a system that combines search, extraction, data cleaning and integration, and enables users to create new data sets from those found on the Web, to offer the user a set of best-effort operators that automate the most labor-intensive tasks.
A framework for semantic link discovery over relational data
TLDR
A framework for discovery of semantic links from relational data based on declarative specification of linkage requirements by a user is presented, which allows data publishers to easily find and publish high-quality links to other data sources, and therefore could significantly enhance the value of the data in the next generation of web.
From databases to dataspaces: a new abstraction for information management
TLDR
This paper proposes dataspaces and their support systems as a new agenda for data management, which encompasses much of the work going on in data management today, while posing additional research objectives.
VizCurator: A Visual Tool for Curating Open Data
TLDR
VizCurator permits the exploration, understanding, and curation of open RDF data, its schema, and how it has been linked to other sources, and can be used to create new binary temporal relations by reifying base facts and linking them to temporal resources.
The iBench Integration Metadata Generator
TLDR
iBench is the first metadata generator that can be used to evaluate a wide range of integration tasks (data exchange, mapping creation, mapping composition, schema evolution, among many others) and is believed to raise the bar for empirical evaluation and comparison of data integration systems.
Using schematically heterogeneous structures
TLDR
This work considers a restricted class of higher order views and shows the power of these views in integrating legacy structures and gives conditions under which a higher order view is usable for answering a query and provides query translation algorithms.
Interactive Navigation of Open Data Linkages
TLDR
The Toronto Open Data Search system offers users a highly interactive experience making unrelated (and unlinked) dynamic collections of datasets appear as a richly connected cloud of data that can be navigated and combined easily in real time.
Schema Mapping and Data Exchange Tools: Time for the Golden Age
TLDR
It is shown how recent results in schema-mapping and data-exchange research may be considered the starting point for a forthcoming golden age, with novel research opportunities and a new generation of systems capable of dealing with a significantly larger class of real-life applications.
...
...