Lifting Data Portals to the Web of Data

Abstract

Data portals are central hubs for freely available (governmental) datasets. These portals use different software frameworks to publish their data, and the metadata descriptions of these datasets come in different schemas depending on the framework. The present work aims at re-exposing and connecting the metadata descriptions of currently 854k datasets on 261 data portals to the Web of Linked Data by mapping and publishing their homogenized metadata in standard vocabularies such as DCAT and Schema.org. Additionally, we publish existing quality information about the datasets and further enrich their descriptions with automatically generated metadata for CSV resources. To make all this information traceable and trustworthy, we annotate the generated data using the W3C's provenance vocabulary. The dataset descriptions are harvested weekly, and we offer access to the archived data by providing APIs compliant with the Memento framework. All this data, a total of about 120 million triples per weekly snapshot, is queryable at the SPARQL endpoint at data.wu.ac.at/portalwatch/sparql.
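As a rough illustration of how the published metadata might be consumed, the sketch below builds a SPARQL Protocol GET request that lists a few DCAT dataset descriptions, and formats an `Accept-Datetime` header of the kind Memento-compliant APIs use to select an archived snapshot. Several things here are assumptions rather than details from the paper: the `https://` scheme on the endpoint, the query shape (a plain `dcat:Dataset` with a `dct:title`, ignoring any per-snapshot named graphs the service may use), and the helper names `build_request`, `accept_datetime`, and `run_query`. The endpoint may also no longer be reachable.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime

# Endpoint host and path are from the abstract; the https scheme is an assumption.
ENDPOINT = "https://data.wu.ac.at/portalwatch/sparql"

# Illustrative query: a handful of DCAT datasets with their titles.
# The real service may organise snapshots in named graphs (not modelled here).
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 5
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def accept_datetime(moment):
    """Return the Memento Accept-Datetime header for a timezone-aware datetime."""
    as_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def run_query(endpoint, query, timeout=30):
    """Send the query over the network and return the result bindings."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Requires network access and a live endpoint.
    for row in run_query(ENDPOINT, QUERY):
        print(row["dataset"]["value"], row["title"]["value"])
```

The request construction is kept separate from the network call so that the query and headers can be inspected offline. The `Accept-Datetime` header would be sent to a Memento TimeGate URL, which the abstract does not specify.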

Cite this paper

@inproceedings{Neumaier2017LiftingDP, title={Lifting Data Portals to the Web of Data}, author={Sebastian Neumaier and J{\"u}rgen Umbrich and Axel Polleres}, booktitle={LDOW@WWW}, year={2017} }