A Latent Dirichlet Model for Unsupervised Entity Resolution

Abstract

Entity resolution has received considerable attention in recent years. Given many references to underlying entities, the goal is to predict which references correspond to the same entity. We show how to extend the Latent Dirichlet Allocation model for this task and propose a probabilistic model for collective entity resolution in relational domains, where references are connected to each other. Our approach differs from other recently proposed entity resolution approaches in that it a) is generative, b) does not make pair-wise decisions, and c) captures relations between entities through a hidden group variable. We propose a novel sampling algorithm for collective entity resolution that is unsupervised and also takes entity relations into account. Additionally, we do not assume the domain of entities to be known, and we show how to infer the number of entities from the data. We demonstrate the utility and practicality of our relational entity resolution approach for author resolution in two real-world bibliographic datasets. In addition, we present preliminary results on characterizing the conditions under which relational information is useful.

DOI: 10.1137/1.9781611972764.5

Semantic Scholar estimates that this publication has 259 citations based on the available data.

Cite this paper

@inproceedings{Bhattacharya2006ALD,
  title={A Latent Dirichlet Model for Unsupervised Entity Resolution},
  author={Indrajit Bhattacharya and Lise Getoor},
  booktitle={SDM},
  year={2006}
}