Refining Automatically Extracted Knowledge Bases Using Crowdsourcing

Authors: Chunhua Li, Pengpeng Zhao, Victor S. Sheng, Xuefeng Xian, Jian Wu, Zhiming Cui
Journal: Computational Intelligence and Neuroscience
Machine-constructed knowledge bases often contain noisy and inaccurate facts. To address this problem, we first introduce the concept of semantic constraints, which can be used to detect potential errors and to perform inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts for crowdsourcing and prune unnecessary questions. Our experiments…
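The idea of using a semantic constraint both to flag potential errors and to prune crowd questions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the example facts, the single "functional relation" constraint type, the `ask_crowd` callback, and the confidence-ordered questioning heuristic. It is not the paper's rank-based or graph-based algorithm.

```python
from collections import defaultdict

# Hypothetical candidate facts: (subject, relation, object) -> extractor confidence.
candidates = {
    ("Obama", "born_in", "Honolulu"): 0.55,
    ("Obama", "born_in", "Chicago"): 0.45,
    ("Paris", "capital_of", "France"): 0.95,
}

# Assumed constraint: these relations are functional, i.e. a subject
# can have at most one true object.
FUNCTIONAL = {"born_in"}


def conflict_groups(facts):
    """Group facts that violate a functional constraint together."""
    groups = defaultdict(list)
    for subject, relation, obj in facts:
        if relation in FUNCTIONAL:
            groups[(subject, relation)].append((subject, relation, obj))
    # Only groups with more than one object contain a potential error.
    return [group for group in groups.values() if len(group) > 1]


def refine(facts, ask_crowd):
    """Ask the crowd about conflicting facts, most confident first.

    Once one fact in a group is confirmed, the constraint implies the
    others are false, so their questions are pruned.
    Returns (labels, number_of_questions_asked).
    """
    labels = {}
    questions = 0
    for group in conflict_groups(facts):
        for fact in sorted(group, key=lambda f: facts[f], reverse=True):
            questions += 1
            labels[fact] = ask_crowd(fact)
            if labels[fact]:
                # Inference: every other fact in the group must be false.
                for other in group:
                    if other != fact:
                        labels[other] = False
                break
    return labels, questions


# Usage with a stand-in crowd oracle (hypothetical ground truth).
labels, asked = refine(candidates, lambda fact: fact[2] == "Honolulu")
print(asked, labels)
```

In this sketch the unconflicted fact about Paris is never sent to the crowd, and confirming one birthplace settles the other without a second question.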

Combining information extraction and human computing for crowdsourced knowledge acquisition

A novel system architecture, called Higgins, is presented, which shows how to effectively integrate an IE engine and a HC engine and demonstrates the effectiveness of Higgins for knowledge acquisition by crowdsourced gathering of relationships between characters in narrative descriptions of movies and books.

Knowledge vault: a web-scale approach to probabilistic knowledge fusion

The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories, and computes calibrated probabilities of fact correctness.

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

KATARA is proposed, a knowledge base and crowd powered data cleaning system that interprets table semantics to align it with the KB, identifies correct and incorrect data, and generates top-k possible repairs for incorrect data.

Human computing games for knowledge acquisition

This work provides a combined approach that tightly integrates automated extraction techniques with human computing for effective gathering of facts in the form of relationships between entities.

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.

Learning to Refine an Automatically Extracted Knowledge Base Using Markov Logic

This paper presents a Markov logic-based system for cleaning an extracted knowledge base that allows a scalable system such as NELL to take advantage of joint probabilistic inference, or, conversely, allows Markov logic to be applied to a web-scale problem.

CrowdMap: Crowdsourcing Ontology Alignment with Microtasks

CrowdMap is introduced, a model to acquire human contributions via microtask crowdsourcing to improve the accuracy of existing ontology alignment solutions in a fast, scalable, and cost-effective manner.

CrowdER: Crowdsourcing Entity Resolution

This work proposes a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs, and develops a novel two-tiered heuristic approach for creating batched tasks.

A hybrid machine-crowdsourcing system for matching web tables

This paper proposes a concept-based approach that maps each column of a web table to the best concept, in a well-developed knowledge base, that represents it and develops a hybrid machine-crowdsourcing framework that leverages human intelligence to discern the concepts for “difficult” columns.

Knowledge Graph Identification

This paper shows how uncertain extractions about entities and their relations can be transformed into a knowledge graph and shows that compared to existing methods, the proposed approach is able to achieve improved AUC and F1 with significantly lower running time.