StatSnowball: a statistical approach to extracting entity relationships

Abstract

Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples required, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limits their ability to generalize and thus yields low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specification. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), a bootstrapping system that can perform both traditional relation extraction and Open IE. StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. An MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation problem, which enjoys well-founded theory and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that, with a small number of seeds, StatSnowball achieves significantly higher recall without sacrificing high precision during iterations, and that the joint inference of MLNs can improve performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
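
As a rough illustration of how an l1-norm penalty can act as a pattern selector, the sketch below fits a small l1-penalized logistic regression by proximal gradient descent. This is an assumption-laden stand-in, not the paper's method: StatSnowball learns weights in a Markov logic network, whereas this toy uses plain logistic regression, and the function names, data, and hyperparameters (`select_patterns`, `lam`, `step`, `iters`) are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink toward zero, clip small values to zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def select_patterns(X, y, lam=0.1, step=0.5, iters=500):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step on the smooth loss, then shrinkage for the l1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: each column says whether a candidate pattern fired
# for an entity pair; y says whether the pair is a true relation instance.
# Pattern 0 tracks the label, pattern 1 is uninformative, pattern 2 never fires.
X = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0],
    [0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = select_patterns(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(selected)
```

On this toy data only pattern 0 keeps a nonzero weight; the penalty drives the uninformative patterns to exactly zero, which is the sense in which penalized estimation doubles as selection.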

DOI: 10.1145/1526709.1526724

8 Figures and Tables

225 Citations

Semantic Scholar estimates that this publication has 225 citations based on the available data.

Cite this paper

@inproceedings{Zhu2009StatSnowballAS,
  title={StatSnowball: a statistical approach to extracting entity relationships},
  author={Jun Zhu and Zaiqing Nie and Xiaojiang Liu and Bo Zhang and Ji-Rong Wen},
  booktitle={WWW},
  year={2009}
}