Knowledge graph refinement: A survey of approaches and evaluation methods

  • H. Paulheim
  • Published 6 December 2016 in Semantic Web
  • Computer Science
In recent years, different Web knowledge graphs, both free and commercial, have been created. While Google coined the term "Knowledge Graph" in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent. Those graphs are often constructed from semi-structured knowledge sources, such as Wikipedia, or harvested from the Web with a combination of statistical and linguistic methods. The result is large-scale knowledge graphs that… 


FarsBase: The Persian knowledge graph

FarsBase is the first Persian multi-source knowledge graph, which is specifically designed for semantic search engines to support Persian knowledge and adopts a low-cost mechanism for verifying candidate knowledge by human experts.

Knowledge Graphs on the Web - an Overview

This chapter provides an overview and comparison of those publicly available knowledge graphs, and gives insights into their contents, size, coverage, and overlap.

Towards a Definition of Knowledge Graphs

This work discusses and defines the term knowledge graph, considering its history and diversity in interpretations and use, and proposes a definition of knowledge graphs that serves as basis for discussions on this topic and contributes to a common vision.

Towards Building a Knowledge Graph with Open Data - A Roadmap

The approach to knowledge graph development computes a confidence score for every relationship elicited from the underpinning open data, in order to generate logically consistent facts from usually inaccurate and inconsistent open datasets.

Steps to Knowledge Graphs Quality Assessment

This work extends the current state-of-the-art frameworks for quality assessment of data, information, linked data, and KGs by adding various quality dimensions (QDs) and quality metrics (QMs).

Measuring Accuracy of Triples in Knowledge Graphs

This paper introduces an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding a consensus of matched triples from other knowledge graphs, applying different matching methods between the predicates of source triples and target triples.
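The consensus idea can be illustrated with a minimal sketch: a source triple is accepted if enough matched triples from other knowledge graphs agree on the same object value. The entity names, predicate variants, and agreement threshold below are illustrative assumptions, not details taken from the paper.

```python
def validate_triple(source_triple, matched_triples, threshold=0.5):
    """Consensus-based validation of a source triple.

    source_triple: (subject, predicate, object) to validate.
    matched_triples: triples from target KGs whose predicates were
    matched (by some predicate-matching method) to the source predicate.
    Accepts the triple if the fraction of matched triples agreeing on
    the object value reaches the threshold.
    """
    if not matched_triples:
        return False  # no external evidence, cannot confirm
    agreeing = sum(1 for (_, _, obj) in matched_triples
                   if obj == source_triple[2])
    return agreeing / len(matched_triples) >= threshold

# Toy example: two of three target KGs agree with the source value.
source = ("Berlin", "capitalOf", "Germany")
targets = [("Berlin", "isCapitalOf", "Germany"),
           ("Berlin", "capital_of", "Germany"),
           ("Berlin", "capitalOf", "Prussia")]
print(validate_triple(source, targets))  # True (2/3 >= 0.5)
```

A real system would also weight each target KG by its reliability rather than counting all matched triples equally.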

Refining Transitive and Pseudo-Transitive Relations at Web Scale

An efficient web-scale knowledge graph refinement algorithm is introduced that removes the fewest edges needed to make the graph of transitive relations cycle-free, while maintaining better precision in identifying erroneous edges, as measured against a human gold standard.
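The underlying idea — a cycle in a transitive relation (such as subClassOf) signals an erroneous edge, so some edge on each cycle must go — can be sketched with a simple DFS-based heuristic. This is a greedy approximation for illustration only, not the paper's web-scale algorithm, and it does not guarantee a minimum set of removed edges.

```python
def break_cycles(nodes, edges):
    """Remove the back edges found by depth-first search, which
    breaks every cycle in the directed graph (greedy heuristic)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    removed = set()

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:      # back edge: closes a cycle
                removed.add((u, v))
            elif color[v] == WHITE:
                dfs(v)
        color[u] = BLACK

    for n in nodes:
        if color[n] == WHITE:
            dfs(n)
    return [e for e in edges if e not in removed]

# Toy subClassOf-style graph with one erroneous cycle-closing edge.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "A")]
print(break_cycles(nodes, edges))  # [('A', 'B'), ('B', 'C')]
```

Choosing which edge on each cycle to drop is the hard part at web scale; a heuristic like this simply drops whichever edge DFS happens to encounter as a back edge.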

Beyond DBpedia & YAGO - The New Kids on the Knowledge Graph Block

This talk will look at two ongoing projects related to the extraction of knowledge graphs from Wikipedia and other Wikis, and the transfer of the DBpedia approach to a multitude of arbitrary Wikis.

Towards Profiling Knowledge Graphs

Methods for knowledge graph profiling are discussed, crucial differences between the big, well-known knowledge graphs, like DBpedia, YAGO, and Wikidata, are depicted, and a glance is taken at current developments of new, complementary knowledge graphs such as DBkWik and WebIsALOD.

Top K Hypotheses Selection on a Knowledge Graph

This paper introduces an algorithmic framework for efficiently addressing the combinatorial hardness and selecting the top K hypotheses based on powerful algorithmic techniques recently invented in the context of the Weighted Constraint Satisfaction Problem (WCSP).



Knowledge vault: a web-scale approach to probabilistic knowledge fusion

The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories, and computes calibrated probabilities of fact correctness.

Type inference through the analysis of Wikipedia links

Two techniques that exploit wikilinks are presented, one based on induction from machine learning techniques, and the other on abduction, which suggest some new possible directions for entity classification.

Knowledge base completion via search-based question answering

A way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way by learning the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute.

Type-Constrained Representation Learning in Knowledge Graphs

This work integrates prior knowledge in the form of type-constraints into various state-of-the-art latent-variable approaches and shows that prior knowledge on relation types significantly improves these models, by up to 77% in link-prediction tasks.
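The core of the type-constraint idea is simple to illustrate: when predicting the object of a relation, only entities whose type satisfies the relation's range constraint are considered as candidates. The entity names, types, and constraint table below are illustrative assumptions, and a real model would score the filtered candidates with a learned latent-variable function rather than returning them directly.

```python
# Illustrative type assignments and an rdfs:range-style constraint.
entity_types = {"Berlin": "City", "Germany": "Country",
                "Einstein": "Person"}
range_constraint = {"capitalOf": "Country"}

def candidates(relation, entities):
    """Keep only entities whose type satisfies the relation's
    range constraint; a latent-variable model would then score
    just these, instead of every entity in the graph."""
    required = range_constraint[relation]
    return [e for e in entities if entity_types[e] == required]

print(candidates("capitalOf", list(entity_types)))  # ['Germany']
```

Filtering candidates this way both removes type-incompatible predictions and shrinks the search space the model has to rank.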

Yago: a core of semantic knowledge

YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts, including the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE).

Semantic Relation Composition in Large Scale Knowledge Bases

This paper employs classical rule mining techniques to perform relation composition on knowledge graphs to learn first order rules and proposes a technique to automatically discover semantically enriched conjunctive relations in a knowledge base.

Information extraction from Wikipedia: moving down the long tail

Three novel techniques for increasing recall from Wikipedia's long tail of sparse classes are presented: shrinkage over an automatically-learned subsumption taxonomy, a retraining technique for improving the training data, and supplementing results by extracting from the broader Web.

Extending the Coverage of DBpedia Properties using Distant Supervision over Wikipedia

This article uses the distant supervision paradigm to extract the missing information directly from the Wikipedia article, using a Relation Extraction tool trained on the information already present in DBpedia, and evaluates the suitability of the approach on a data set consisting of seven DBpedia properties.

A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources

This paper proposes an approach that iteratively refines type acquisition based on the output of the mapping generator, and is able to produce ever-improving mappings consistently across iterations.

Extending DBpedia with Wikipedia List Pages

It is discussed how a combination of frequent pattern mining and natural language processing (NLP) methods can be leveraged in order to extend both the DBpedia ontology, as well as the instance information in DBpedia.