Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases

@inproceedings{Zhuang2017HikeAH,
  title={Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases},
  author={Yan Zhuang and Guoliang Li and Zhuojian Zhong and Jianhua Feng},
  booktitle={Proceedings of the 2017 ACM on Conference on Information and Knowledge Management},
  year={2017}
}
  • Yan Zhuang, Guoliang Li, Zhuojian Zhong, Jianhua Feng
  • Published 6 November 2017
  • Computer Science
  • Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
With the vigorous development of the World Wide Web, many large-scale knowledge bases (KBs) have been generated. […] We then construct a partial order on these partitions and develop an inference model that crowdsources a set of tasks and infers the answers to other tasks from the crowdsourced answers. Next, we formulate the question selection problem: given a monetary budget B, select B tasks to crowdsource so as to maximize the number of inferred tasks. We prove that this problem…
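The inference and question-selection steps in the abstract can be illustrated with a minimal Python sketch. It makes three hypothetical assumptions. First, each task is summarized by a vector of attribute similarities. Second, the partial order is componentwise dominance. Third, answers are monotone along that order: a match answer carries over to every task that dominates it, and a non-match answer carries over to every task it dominates. The names `dominates`, `comparable`, `infer`, and `greedy_select` are illustrative, not taken from the paper. The number of inferred tasks depends on answers that are not yet known, so `greedy_select` uses a simple greedy coverage proxy. It is not the paper's selection algorithm.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical task representation (an assumption, not the paper's): each task,
# e.g. a candidate entity pair or a partition of pairs, is summarized by a
# vector of per-attribute similarity scores.
Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Partial order: a >= b iff a is at least as similar as b on every attribute."""
    return all(x >= y for x, y in zip(a, b))


def comparable(a: Vector, b: Vector) -> bool:
    """True if a and b are ordered either way under the partial order."""
    return dominates(a, b) or dominates(b, a)


def infer(tasks: Dict[str, Vector], answers: Dict[str, bool]) -> Dict[str, bool]:
    """Propagate crowd answers along the partial order (assumed monotonicity).

    A 'match' (True) answer on t implies 'match' for every task dominating t;
    a 'non-match' (False) answer on t implies 'non-match' for every task t
    dominates. Dominance is transitive, so one pass over the answered tasks
    suffices. Conflicting answers are not resolved: the first label wins.
    """
    labels = dict(answers)
    for t, is_match in answers.items():
        for u, vec in tasks.items():
            if u in labels:
                continue
            if is_match and dominates(vec, tasks[t]):
                labels[u] = True
            elif not is_match and dominates(tasks[t], vec):
                labels[u] = False
    return labels


def greedy_select(tasks: Dict[str, Vector], budget: int) -> List[str]:
    """Choose up to `budget` tasks to crowdsource.

    Greedy coverage proxy (not the paper's algorithm): each round picks the
    task comparable to the most not-yet-covered tasks; ties go to the task
    listed first.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(min(budget, len(tasks))):
        best: Optional[str] = None
        best_gain = -1
        for t, vec in tasks.items():
            if t in chosen:
                continue
            gain = sum(1 for u, w in tasks.items()
                       if u not in covered and comparable(vec, w))
            if gain > best_gain:
                best, best_gain = t, gain
        assert best is not None  # an unchosen task always remains here
        chosen.append(best)
        covered |= {u for u, w in tasks.items() if comparable(tasks[best], w)}
    return chosen


if __name__ == "__main__":
    # Toy data: two similarity attributes per task (made-up values).
    tasks = {"p1": (0.9, 0.8), "p2": (0.7, 0.6), "p3": (0.3, 0.2), "p4": (0.8, 0.1)}
    print(greedy_select(tasks, 2))         # ['p1', 'p2']
    print(infer(tasks, {"p2": True}))      # {'p2': True, 'p1': True}
    print(infer(tasks, {"p2": False}))     # {'p2': False, 'p3': False}
```

In the toy run, `p1` sits at the top of the order and is comparable to every other task, so the proxy picks it first. A single answer on `p1`, however, propagates in only one direction: a non-match labels everything below it, and a match labels nothing else. This is why coverage is only a rough stand-in for the number of inferred tasks.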


Crowdsourced Collective Entity Resolution with Relational Match Propagation
TLDR
This paper proposes a novel approach called crowdsourced collective ER, which iteratively asks human workers to label selected entity pairs and propagates the labeling information to their distant neighbors, achieving superior accuracy with much less labeling.
Entity alignment via knowledge embedding and type matching constraints for knowledge graph inference
TLDR
This work proposes a new EA framework based on knowledge embeddings (KEs) and type matching constraints that significantly improves the accuracy of EA compared with state-of-the-art methods.
LargeEA: Aligning Entities for Large-scale Knowledge Graphs
TLDR
This work proposes LargeEA, a general tool that can adopt any existing EA approach to learn entities’ structural features within each mini-batch independently, and develops a large-scale EA benchmark called DBP1M extracted from real-world KGs.
Modeling Topic-Based Human Expertise for Crowd Entity Resolution
TLDR
A probabilistic graphical model is proposed that computes ER task similarity, estimates human expertise, and infers the task truths in a unified framework; it achieves higher accuracy in task truth inference and is more consistent with the workers' real expertise.
A Rating-Ranking Method for Crowdsourced Top-k Computation
TLDR
A unified model is proposed that models both rating and ranking questions and seamlessly combines them to compute the top-k results, significantly outperforming existing approaches.
Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies
TLDR
A comprehensive analysis of various existing EA methods is presented, elaborating their applications and limitations and distinguishing the methods by their underlying algorithms and the information they incorporate to learn entity representations.
i-HUMO: An Interactive Human and Machine Cooperation Framework for Entity Resolution with Quality Guarantees
TLDR
i-HUMO is a major improvement over HUMO in that it is interactive: its process of human workload selection is optimized based on real-time risk analysis on human-labeled results as well as pre-specified machine metrics.
Make It Easy: An Effective End-to-End Entity Alignment Framework
TLDR
This paper proposes EASY, an effective end-to-end EA framework that removes the labor-intensive pre-processing by fully exploiting the name information provided by the entities themselves, and jointly fuses the features captured by entity names with the structural information of the graph to improve EA results.
r-HUMO: A Risk-Aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees
TLDR
The r-HUMO framework is introduced; compared with state-of-the-art alternatives, it achieves the desired quality control with reduced human cost, and it is the first solution that optimizes the process of human workload selection from a risk perspective.
OAG: Toward Linking Large-scale Heterogeneous Entity Graphs
TLDR
LinKG is coupled with three linking modules, each of which addresses one category of entities, and achieves an F1-score of 0.9510 for linking, significantly outperforming the state of the art.

References

Showing 1-10 of 49 references
PBA: Partition and Blocking Based Alignment for Large Knowledge Bases
TLDR
A scalable partition-and-blocking based alignment framework, named PBA, efficiently and automatically aligns knowledge bases with tens of millions of instances; it significantly outperforms state-of-the-art approaches in efficiency, in some cases by an order of magnitude, while keeping high alignment quality.
Crowdsourcing Algorithms for Entity Resolution
TLDR
This paper considers the problem of designing optimal strategies for asking humans questions so as to minimize the expected number of questions asked, and analyzes several strategies that a recent work claimed to be "optimal" for this problem but that can perform arbitrarily badly in theory.
CrowdER: Crowdsourcing Entity Resolution
TLDR
This work proposes a hybrid human-machine approach in which machines do an initial, coarse pass over all the data and people verify only the most likely matching pairs, and develops a novel two-tiered heuristic approach for creating batched tasks.
CrowdMap: Crowdsourcing Ontology Alignment with Microtasks
TLDR
CrowdMap is introduced, a model to acquire human contributions via microtask crowdsourcing to improve the accuracy of existing ontology alignment solutions in a fast, scalable, and cost-effective manner.
SiGMa: simple greedy matching for aligning large knowledge bases
TLDR
Simple Greedy Matching (SiGMa) is a simple algorithm for aligning knowledge bases with millions of entities and facts. It is an iterative propagation algorithm that leverages both the structural information from the relationship graph and flexible similarity measures between entity properties in a greedy local search, which makes it scalable.
Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach
TLDR
This paper proposes a cost-effective crowdsourced entity resolution framework that significantly reduces the monetary cost while keeping high quality, and develops error-tolerant techniques to tolerate the errors introduced by the partial order and the crowd.
Question Selection for Crowd Entity Resolution
TLDR
A probabilistic framework for ER is proposed that estimates how much ER accuracy can be gained by asking each question and selects the question with the highest expected accuracy.
A hybrid machine-crowdsourcing system for matching web tables
TLDR
This paper proposes a concept-based approach that maps each column of a web table to the best concept, in a well-developed knowledge base, that represents it and develops a hybrid machine-crowdsourcing framework that leverages human intelligence to discern the concepts for “difficult” columns.
QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications
TLDR
This paper investigates the online task assignment problem (given a pool of n questions, which k of them should be assigned to a worker?) and proposes a system called the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA) on top of AMT.
Leveraging transitive relations for crowdsourced joins
TLDR
This paper proposes a hybrid transitive-relations and crowdsourcing labeling framework that aims to crowdsource the minimum number of pairs needed to label all candidate pairs, proves the optimal labeling order, and devises a parallel labeling algorithm to efficiently crowdsource the pairs following that order.