Detecting Linked Data quality issues via crowdsourcing: A DBpedia study

@article{Acosta2018DetectingLD,
  title={Detecting Linked Data quality issues via crowdsourcing: A DBpedia study},
  author={Maribel Acosta and Amrapali Zaveri and Elena Paslaru Bontas Simperl and Dimitris Kontokostas and Fabian Fl{\"o}ck and Jens Lehmann},
  journal={Semantic Web},
  year={2018},
  volume={9},
  pages={303--335}
}
In this paper we examine the use of crowdsourcing as a means to master Linked Data quality problems that are difficult to solve automatically. We base our approach on the analysis of the most common errors encountered in Linked Data sources, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and compare different crowdsourcing approaches to identify these Linked Data quality issues, employing the DBpedia dataset… 
Linked Data Crowdsourcing Quality Assessment based on Domain Professionalism
TLDR
The paper introduces the concept of a Domain Specialization Test (DST), which uses domain-specific test tasks to evaluate the professionalism of workers, and proposes the MBEM algorithm, which improves the EM algorithm with the idea of Mini-batch Gradient Descent (MBGD) to achieve efficient and accurate evaluation of task results.
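The EM-style aggregation mentioned above can be sketched as follows. This is a minimal one-coin EM loop over invented worker answers, not the paper's MBEM (which adds mini-batch gradient updates and DST-based worker screening); all names, labels, and the initial reliability guess are illustrative assumptions.

```python
def em_aggregate(answers, n_iter=10):
    """EM-style aggregation of crowd answers.

    answers: dict task -> {worker: label}. Worker reliabilities and
    task labels are re-estimated alternately.
    """
    workers = {w for votes in answers.values() for w in votes}
    rel = {w: 0.8 for w in workers}  # initial reliability guess
    labels = {}
    for _ in range(n_iter):
        # E-step: weighted majority vote per task
        for task, votes in answers.items():
            scores = {}
            for w, lab in votes.items():
                scores[lab] = scores.get(lab, 0.0) + rel[w]
            labels[task] = max(scores, key=scores.get)
        # M-step: reliability = agreement with current labels
        for w in workers:
            hits = total = 0
            for task, votes in answers.items():
                if w in votes:
                    total += 1
                    hits += votes[w] == labels[task]
            rel[w] = (hits + 1) / (total + 2)  # Laplace smoothing
    return labels, rel

# Invented crowd answers: carol disagrees with the majority on every task.
answers = {
    "t1": {"alice": "ok", "bob": "ok", "carol": "bad"},
    "t2": {"alice": "bad", "bob": "bad", "carol": "ok"},
    "t3": {"alice": "ok", "bob": "ok", "carol": "bad"},
}
labels, rel = em_aggregate(answers)
```

After a few iterations the majority labels stabilize and carol's estimated reliability drops well below alice's and bob's, which is the effect the DST/MBEM combination exploits at scale.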
A Conceptual Professional Assessment Model Based RDF Data Crowdsourcing
TLDR
The model extracts test task instances similar to crowdsourced tasks from a standard knowledge base, based on the concept hierarchy tree of RDF data crowdsourcing tasks, and uses knowledge representation to automatically build a set of options for each test task instance, thereby generating a concept professionalism test task.
Efficient Knowledge Graph Accuracy Evaluation
TLDR
This paper proposes an efficient sampling and evaluation framework, which aims to provide quality accuracy evaluation with strong statistical guarantee while minimizing human efforts, and proposes the use of cluster sampling to reduce the overall cost.
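The cluster-sampling idea in the summary above can be sketched as follows. This is a minimal illustration, not the paper's framework: triples are grouped by subject entity, whole clusters are drawn at random, and only the sampled clusters are (notionally) annotated, so human effort concentrates on a few entities. The toy knowledge graph and correctness flags are invented.

```python
import random

def kg_accuracy_by_cluster_sampling(clusters, n_clusters, seed=0):
    """Estimate KG accuracy by sampling whole clusters of triples.

    clusters: dict mapping subject entity -> list of bools
    (True = the triple was judged correct by an annotator).
    """
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    checked = [c for entity in picked for c in clusters[entity]]
    return sum(checked) / len(checked)

# Toy KG: triples grouped by subject entity; True marks a correct triple.
kg = {
    "Berlin":  [True, True, False],
    "Paris":   [True, True, True],
    "London":  [True, False, True],
    "Madrid":  [True, True, True],
}
estimate = kg_accuracy_by_cluster_sampling(kg, n_clusters=2)
print(round(estimate, 2))
```

Annotating all four clusters would yield the exact accuracy (10/12 ≈ 0.83); sampling two clusters trades some variance for roughly half the annotation cost, which is the trade-off the paper's statistical guarantees control.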
Semantic Web and Human Computation: The status of an emerging field
TLDR
This editorial paper introduces a special issue that solicited papers at the intersection of Semantic Web and Human Computation research, and uses a methodology based on Systematic Mapping Studies to collect quantitative bibliographic data and analyze the evolution of research in this area.
Services for Connecting and Integrating Big Numbers of Linked Datasets
TLDR
This dissertation analyzes the research work done in the area of Linked Data Integration and proposes indexes and algorithms that can be used at large scale, and proposes techniques that include incremental and parallelized algorithms.
Measuring Accuracy of Triples in Knowledge Graphs
TLDR
This paper introduces an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples from other knowledge graphs by applying different matching methods between the predicates of source triples and target triples.
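The consensus step of a TAA-style validation can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: predicate matching is assumed to have already produced the list of object values from other knowledge graphs, and consensus is reduced to a thresholded majority vote. The triple values are invented.

```python
from collections import Counter

def validate_triple(source_value, matched_values, min_support=0.5):
    """Accept a source triple if the consensus value among matched
    triples from other knowledge graphs agrees with it.

    matched_values: object values found in other KGs for the same
    (subject, predicate) pair after predicate matching.
    Returns True/False, or None when there is no usable evidence.
    """
    if not matched_values:
        return None  # no evidence either way
    value, count = Counter(matched_values).most_common(1)[0]
    if count / len(matched_values) < min_support:
        return None  # no clear consensus among the other KGs
    return value == source_value

# Hypothetical population values for one subject from three other KGs.
evidence = ["3645000", "3645000", "3520000"]
print(validate_triple("3645000", evidence))   # agrees with consensus
print(validate_triple("9999999", evidence))   # contradicts consensus
```

The full approach additionally weights sources and applies several predicate-matching methods; the sketch only shows the final agreement check.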
Can an ontology really support the development of more accurate data quality assessment models?: A survey
TLDR
This study aims to identify existing end-to-end frameworks for data quality assessment and improvement, and finds that most of the work deals with only one aspect rather than a combined approach.
Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis
TLDR
This thesis presents a novel knowledge base quality assessment approach using evolution analysis that uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues.

References

Showing 1-10 of 75 references
Crowdsourcing Linked Data Quality Assessment
TLDR
The results show that the two styles of crowdsourcing are complementary and that crowdsourcing-enabled quality assessment is a promising and affordable way to enhance the quality of Linked Data.
TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data
TLDR
This paper presents a methodology for assessing the quality of Linked Data resources, which comprises a manual and a semi-automatic process, and describes the methodology, quality taxonomy and the tool's system architecture, user perspective and extensibility.
User-driven quality evaluation of DBpedia
TLDR
This study aims to assess the quality of this sample of DBpedia resources and adopt an agile methodology to improve the quality in future versions by regularly providing feedback to the DBpedia maintainers.
Quality assessment for Linked Data: A Survey
TLDR
A systematic review of approaches for assessing the quality of Linked Data, which unifies and formalizes commonly used terminologies across papers related to data quality, and provides a comprehensive list of 18 quality dimensions and 69 metrics.
ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking
TLDR
A probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers is developed and developed to improve the quality of the links while limiting the amount of work performed by the crowd.
WhoKnows? Evaluating linked data heuristics with a quiz that cleans up DBpedia
Purpose – Linking Open Data (LOD) provides a vast amount of well structured semantic information, but many inconsistencies may occur, especially if the data are generated with the help of automated…
Crowdsourcing and the Semantic Web: A Research Manifesto
TLDR
A roadmap to guide the evolution of the new research field that is emerging at the intersection between crowdsourcing and the Semantic Web is defined, and a list of successful or promising scenarios for both perspectives is described.
Test-driven evaluation of linked data quality
TLDR
This work presents a methodology for test-driven quality assessment of Linked Data, which is inspired by test- driven software development, and argues that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality.
Crowdsourcing and the Semantic Web (Dagstuhl Seminar 14282)
TLDR
This document provides a summary of the Dagstuhl Seminar 14282: Crowdsourcing and the Semantic Web, which in July 2014 brought together researchers of the emerging scientific community at the intersection of crowdsourcing andSemantic Web technologies.
Probabilistic Error Detecting in Numerical Linked Data
TLDR
This study proposes a novel probabilistic framework that enables the detection of inconsistencies in numerical attributes of linked data, including not only integer, float or double values but also date values.