Open Data Integration
@article{Miller2018OpenDI, title={Open Data Integration}, author={Ren{\'e}e J. Miller}, journal={Proc. VLDB Endow.}, year={2018}, volume={11}, pages={2130-2139} }
Open data plays a major role in supporting both governmental and organizational transparency. Many organizations are adopting Open Data Principles promising to make their open data complete, primary, and timely. These properties make this data tremendously valuable to data scientists. However, scientists generally do not have a priori knowledge about what data is available (its schema or content). Nevertheless, they want to be able to use open data and integrate it with other public or…
44 Citations
Open Data Analytic Querying using a Relation-Free API
- Computer Science
- ICEIS
- 2020
A solution that eases query development for tabular Open Data analytics through an API, using a simplified query representation, called Relation-Free Query, in which neither the data relations nor the joins over them are specified.
Voyager: Data Discovery and Integration for Data Science
- Computer Science
- 2022
The design decisions made in developing a system to support data discovery and integration are reported, together with an evaluation that investigates its usability.
Data lake concept and systems: a survey
- Computer Science
- ArXiv
- 2021
This survey reviews the development, definition, and architectures of data lakes and classifies existing data lake systems by the functions they provide, making it a useful technical reference for designing, implementing, and applying data lakes.
Model-Driven Development of Web APIs to Access Integrated Tabular Open Data
- Computer Science
- IEEE Access
- 2020
This paper proposes a model-driven approach to automatically generate Web APIs that provide homogeneous access to multiple tabular open datasets integrated by means of join and union operations.
Loch Prospector: Metadata Visualization for Lakes of Open Data
- Computer Science
- 2020 IEEE Visualization Conference (VIS)
- 2020
Loch Prospector is proposed, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data — in particular, metadata attributes — and the associated task abstraction for their work.
A Semantic Data Lake Model for Analytic Query-Driven Discovery
- Computer Science
- iiWAS
- 2021
A semantic model for a Data Lake is introduced that aims to support data discovery and integration in data analytics scenarios and is suited to identifying the sources and the transformation steps required by an analytical request.
Processing Analytical Queries over Polystore System for a Large Astronomy Data Repository
- Computer Science, Physics
- Applied Sciences
- 2022
This study examines and analyzes models of data integration, incorporates them into a system for managing linked open data from the astronomical domain, and proposes a web-based query system built around the Polystore database architecture.
BareTQL: An Interactive System for Searching and Extraction of Open Data Tables
- Computer Science
- 27th International Conference on Intelligent User Interfaces
- 2022
BareTQL is presented, an interactive system for querying open data tables in the presence of the aforementioned challenges, which aims to provide an easy and efficient way of querying incomplete data in tables with little or no schema.
HOTMapper: Historical Open Data Table Mapper
- Computer Science
- EDBT
- 2019
This demo will show the creation of the mapping definition and the execution flow of the CLI script, first creating a unified data source from scratch and then updating an existing one, in order to unify real-world data sources containing information about the Brazilian educational system.
Mosaic: A Sample-Based Database System for Open World Query Processing
- Computer Science
- CIDR
- 2020
This vision paper proposes Mosaic, a database system that treats samples as first-class citizens and lets users ask questions over the populations these samples represent, using a sample-based data model with extensions to the SQL language.
References
The Data Civilizer System
- Computer Science
- CIDR
- 2017
Initial positive experiences are described that show the preliminary DATA CIVILIZER system shortens the time and effort required to find, prepare, and analyze data.
LabBook: Metadata-driven social collaborative data analysis
- Computer Science
- 2015 IEEE International Conference on Big Data (Big Data)
- 2015
The key insight is to collect and use more metadata about all elements of the analytic ecosystem by means of an architecture and user experience that reduce the cost of contributing such metadata.
Data Integration for the Relational Web
- Computer Science
- Proc. VLDB Endow.
- 2009
Octopus is a system that combines search, extraction, data cleaning, and integration, enabling users to create new data sets from those found on the Web by offering a set of best-effort operators that automate the most labor-intensive tasks.
A framework for semantic link discovery over relational data
- Computer Science
- CIKM
- 2009
A framework is presented for discovering semantic links from relational data based on a user's declarative specification of linkage requirements. It allows data publishers to easily find and publish high-quality links to other data sources, and could therefore significantly enhance the value of the data in the next generation of the web.
From databases to dataspaces: a new abstraction for information management
- Computer Science
- SIGMOD Rec.
- 2005
This paper proposes dataspaces and their support systems as a new agenda for data management, which encompasses much of the work going on in data management today, while posing additional research objectives.
VizCurator: A Visual Tool for Curating Open Data
- Computer Science
- WWW
- 2015
VizCurator permits the exploration, understanding, and curation of open RDF data, its schema, and how it has been linked to other sources, and can be used to create new binary temporal relations by reifying base facts and linking them to temporal resources.
The iBench Integration Metadata Generator
- Computer Science
- Proc. VLDB Endow.
- 2015
iBench is the first metadata generator that can be used to evaluate a wide range of integration tasks (data exchange, mapping creation, mapping composition, schema evolution, among many others) and is believed to raise the bar for empirical evaluation and comparison of data integration systems.
Data exchange: semantics and query answering
- Computer Science
- Theor. Comput. Sci.
- 2005
The notion of "certain answers" from indefinite databases is adopted as the semantics for query answering in data exchange, and the computational complexity of computing the certain answers in this context is investigated.
Using schematically heterogeneous structures
- Computer Science
- SIGMOD '98
- 1998
This work considers a restricted class of higher-order views and shows their power in integrating legacy structures. It gives conditions under which a higher-order view is usable for answering a query and provides query translation algorithms.
Interactive Navigation of Open Data Linkages
- Computer Science
- Proc. VLDB Endow.
- 2017
The Toronto Open Data Search system offers users a highly interactive experience, making unrelated (and unlinked) dynamic collections of datasets appear as a richly connected cloud of data that can be navigated and combined easily in real time.