Literally better: Analyzing and improving the quality of literals
@article{Beek2018LiterallyBA,
  title   = {Literally better: Analyzing and improving the quality of literals},
  author  = {Wouter Beek and Filip Ilievski and Jeremy Debattista and Stefan Schlobach and Jan Wielemaker},
  journal = {Semantic Web},
  year    = {2018},
  volume  = {9},
  pages   = {131--150}
}
Quality is a complicated and multifarious topic in contemporary Linked Data research. The aspect of literal quality in particular has not yet been rigorously studied. Nevertheless, analyzing and improving the quality of literals is important since literals form a substantial (one in seven statements) and crucial part of the Semantic Web. Specifically, literals allow infinite value spaces to be expressed and they provide the linguistic entry point to the LOD Cloud. We present a toolchain that…
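As a concrete illustration of what literal quality means in practice, a typed RDF literal pairs a lexical form with a datatype IRI, and one basic quality check is validating the lexical form against that datatype. The sketch below is a simplified illustration, not the paper's toolchain; the checker functions and the restriction to two XSD datatypes are assumptions, and real XSD lexical spaces are richer than these patterns.

```python
import re
from datetime import date

# Minimal validity checkers for two XSD datatypes (simplified:
# real XSD lexical spaces allow more forms than these patterns).
def valid_integer(lex: str) -> bool:
    return re.fullmatch(r"[+-]?\d+", lex) is not None

def valid_date(lex: str) -> bool:
    try:
        date.fromisoformat(lex)  # accepts YYYY-MM-DD
        return True
    except ValueError:
        return False

CHECKERS = {
    "http://www.w3.org/2001/XMLSchema#integer": valid_integer,
    "http://www.w3.org/2001/XMLSchema#date": valid_date,
}

def check_literal(lexical_form: str, datatype_iri: str) -> bool:
    """Return True if the lexical form is valid for the datatype.
    Unknown datatypes are not judged and pass by default."""
    checker = CHECKERS.get(datatype_iri)
    return True if checker is None else checker(lexical_form)
```

A literal such as `"4.2"^^xsd:integer` would fail this check, while `"42"^^xsd:integer` would pass; scaling such checks to billions of statements is the kind of task the paper's toolchain addresses.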
12 Citations
A Scalable Framework for Quality Assessment of RDF Datasets
- Computer Science, SEMWEB
- 2019
This paper presents DistQualityAssessment, an open source implementation of quality assessment for large RDF datasets that can scale out to a cluster of machines and is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark.
Efficient Distributed In-Memory Processing of RDF Datasets
- Computer Science
- 2020
A novel approach for statistical calculations over large RDF datasets is described; it scales out to clusters of machines and is the first distributed in-memory approach to computing 32 different statistical criteria for RDF datasets using Apache Spark.
LOD-a-lot: A Single-File Enabler for Data Science
- Computer Science, SEMANTiCS
- 2017
There exists a wide collection of Data Science use cases that can be performed over such a LOD-a-lot file, which significantly reduces the cost and complexity of conducting Data Science.
Statistics about Data Shape Use in RDF Data
- Computer Science, SEMWEB
- 2020
Preliminary statistics about the use of SHACL core constraints in data shapes found on GitHub show that class, datatype and cardinality constraints predominate, similar to the dominant use of domain and range in ontologies.
Evaluating the quality of the LOD cloud: An empirical investigation
- Computer Science, Semantic Web
- 2018
In this quantitative empirical survey, 130 datasets are analysed with 27 Linked Data quality metrics, applying Principal Component Analysis (PCA) to identify the key quality indicators that give sufficient information about a dataset's quality.
Scalable Quality Assessment of Linked Data
- Computer Science
- 2017
This thesis looks into the challenges of detecting quality problems in linked datasets, presenting quality results in a standardised, machine-readable and interoperable format that agents can make sense of, helping human consumers identify a dataset's fitness for use.
Web Semantics: Science, Services and Agents on the World Wide Web
- Computer Science
- 2019
The results of two within-group user-centred studies of two online bibliographic systems using a widely deployed OPAC and its counterpart linked-data based system, datos.bne.es, show that users of the system based on linked data required significantly less time and visited fewer pages to complete a typical search and retrieval activity.
LOD-a-lot - A Queryable Dump of the LOD Cloud
- Computer Science, SEMWEB
- 2017
LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from 650K datasets over a single self-indexed file, enabling Web-scale repeatable experimentation and research even by standard laptops.
References
Showing 1-10 of 43 references
Luzzu – A Framework for Linked Data Quality Assessment
- Computer Science, 2016 IEEE Tenth International Conference on Semantic Computing (ICSC)
- 2016
Luzzu is a framework for Linked Data Quality Assessment based on an extensible interface for defining new quality metrics, an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be reused within different semantic frameworks, a scalable stream processor for data dumps and SPARQL endpoints, and a customisable ranking algorithm taking into account user-defined weights.
Test-driven evaluation of linked data quality
- Computer Science, WWW
- 2014
This work presents a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development, and argues that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality.
LOD Laundromat: A Uniform Way of Publishing Other People's Dirty Data
- Computer Science, SEMWEB
- 2014
The LOD Laundromat is presented, which removes stains from data without any human intervention and is able to make very large amounts of LOD more easily available for further processing right now.
LOTUS: Adaptive Text Search for Big Linked Data
- Computer Science, ESWC
- 2016
The ease with which LOTUS enables text-based resource retrieval at an unprecedented scale is demonstrated in concrete and domain-specific scenarios, and an analysis of the scalability of LOTUS with respect to the LOD Laundromat is provided.
Weaving the Pedantic Web
- Computer Science, LDOW
- 2010
This paper discusses common errors in RDF publishing, their consequences for applications, along with possible publisher-oriented approaches to improve the quality of structured, machine-readable and open data on the Web.
Quality assessment for Linked Data: A Survey
- Computer Science, Semantic Web
- 2016
A systematic review of approaches for assessing the quality of Linked Data, which unifies and formalizes commonly used terminologies across papers related to data quality and provides a comprehensive list of 18 quality dimensions and 69 metrics.
Sieve: linked data quality assessment and fusion
- Computer Science, EDBT-ICDT '12
- 2012
Sieve, a framework for flexibly expressing quality assessment and fusion methods, is presented; it is integrated into the Linked Data Integration Framework (LDIF), which handles data access, schema mapping and identity resolution.
What's up LOD Cloud? Observing The State of Linked Open Data Cloud Metadata
- Computer Science, LDQ@ESWC
- 2015
Roomba, a tool for validating, correcting and generating dataset metadata, is developed; the automatic corrections done by Roomba are shown to increase the overall quality of dataset metadata, while highlighting the need for manual effort to correct some important missing information.
ClioPatria: A SWI-Prolog infrastructure for the Semantic Web
- Computer Science, Semantic Web
- 2016
ClioPatria is a comprehensive semantic web development framework based on SWI-Prolog that extends this core with a SPARQL and LOD server, an extensible web frontend to manage the server, browse the data, and query the data using SPARQL and Prolog, and a Git-based plugin manager.
Towards a vocabulary for data quality management in semantic web architectures
- Computer Science, LWDM '11
- 2011
This paper provides a conceptual model that allows the representation of data quality rules and other quality-related knowledge using the Resource Description Framework (RDF) and the Web Ontology Language (OWL).