Refining Automatically Extracted Knowledge Bases Using Crowdsourcing

@article{Li2017RefiningAE,
  title={Refining Automatically Extracted Knowledge Bases Using Crowdsourcing},
  author={Chunhua Li and Pengpeng Zhao and Victor S. Sheng and Xuefeng Xian and Jian Wu and Zhiming Cui},
  journal={Computational Intelligence and Neuroscience},
  year={2017},
  volume={2017}
}
Machine-constructed knowledge bases often contain noisy and inaccurate facts. To address this problem, we first introduce semantic constraints that can be used to detect potential errors and to perform inference among candidate facts. Then, based on these semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to send to the crowd and prune unnecessary questions. Our experiments…
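
The sketch below illustrates the general idea in the abstract, not the authors' actual implementation: a functional semantic constraint (assumed here to be "an entity has only one value for a relation such as bornIn") flags conflicting candidate facts, a rank-based selector picks the most ambiguous conflicts to ask the crowd about, and the crowd's answer lets the remaining candidates be inferred rather than asked. All names (`Fact`, `functional_conflicts`, `rank_based_select`) and the example data are illustrative assumptions.

```python
# Minimal, hypothetical sketch: a functional semantic constraint flags
# conflicting candidate facts, a rank-based selector sends only the most
# ambiguous conflicts to the crowd, and the crowd's answer lets us infer
# labels for the competing facts without asking about each one.
# Names and example data are illustrative, not from the paper.
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    confidence: float  # extractor confidence in [0, 1]

def functional_conflicts(facts: Iterable[Fact], functional_relations: set) -> list:
    """Group candidates that violate a one-value ("functional") constraint,
    e.g. an entity can have only one place of birth."""
    groups = defaultdict(list)
    for f in facts:
        if f.relation in functional_relations:
            groups[(f.subject, f.relation)].append(f)
    return [g for g in groups.values() if len(g) > 1]

def rank_based_select(conflict_groups: list, budget: int) -> list:
    """Pick the `budget` conflict groups whose top two candidates are closest
    in confidence -- the cases where crowd input is most beneficial."""
    def gap(group):
        top = sorted((f.confidence for f in group), reverse=True)
        return top[0] - top[1]
    return sorted(conflict_groups, key=gap)[:budget]

def apply_crowd_answer(group: list, confirmed: Fact) -> dict:
    """Once the crowd confirms one fact, the constraint lets us infer that the
    competing candidates are wrong, pruning those questions."""
    return {f: (f == confirmed) for f in group}

if __name__ == "__main__":
    candidates = [
        Fact("Obama", "bornIn", "Honolulu", 0.62),
        Fact("Obama", "bornIn", "Chicago", 0.58),
        Fact("Einstein", "bornIn", "Ulm", 0.95),
        Fact("Einstein", "bornIn", "Bern", 0.20),
    ]
    conflicts = functional_conflicts(candidates, {"bornIn"})
    to_ask = rank_based_select(conflicts, budget=1)   # the Obama group (gap 0.04)
    labels = apply_crowd_answer(to_ask[0], confirmed=to_ask[0][0])
    print(labels)  # Honolulu -> True, Chicago -> False; Einstein group never asked
```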


References

Showing 1-10 of 30 references

Combining information extraction and human computing for crowdsourced knowledge acquisition

TLDR
A novel system architecture, called Higgins, is presented, which shows how to effectively integrate an IE engine and an HC engine and demonstrates the effectiveness of Higgins for knowledge acquisition by crowdsourced gathering of relationships between characters in narrative descriptions of movies and books.

Knowledge vault: a web-scale approach to probabilistic knowledge fusion

TLDR
The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories, and computes calibrated probabilities of fact correctness.

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

TLDR
KATARA is proposed, a knowledge base and crowd powered data cleaning system that interprets table semantics to align the table with the KB, identifies correct and incorrect data, and generates top-k possible repairs for incorrect data.

Human computing games for knowledge acquisition

TLDR
This work provides a combined approach that tightly integrates automated extraction techniques with human computing for effective gathering of facts in the form of relationships between entities.

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

TLDR
A probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers is developed to improve the quality of the links while limiting the amount of work performed by the crowd.

Learning to Refine an Automatically Extracted Knowledge Base Using Markov Logic

TLDR
This paper presents a Markov logic-based system for cleaning an extracted knowledge base that allows a scalable system such as NELL to take advantage of joint probabilistic inference, or, conversely, allows Markov logic to be applied to a web-scale problem.

CrowdMap: Crowdsourcing Ontology Alignment with Microtasks

TLDR
CrowdMap is introduced, a model to acquire human contributions via microtask crowdsourcing to improve the accuracy of existing ontology alignment solutions in a fast, scalable, and cost-effective manner.

CrowdER: Crowdsourcing Entity Resolution

TLDR
This work proposes a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs, and develops a novel two-tiered heuristic approach for creating batched tasks.

A hybrid machine-crowdsourcing system for matching web tables

TLDR
This paper proposes a concept-based approach that maps each column of a web table to the concept, in a well-developed knowledge base, that best represents it, and develops a hybrid machine-crowdsourcing framework that leverages human intelligence to discern the concepts for "difficult" columns.

Knowledge Graph Identification

TLDR
This paper shows how uncertain extractions about entities and their relations can be transformed into a knowledge graph and shows that compared to existing methods, the proposed approach is able to achieve improved AUC and F1 with significantly lower running time.