Using ghost edges for classification in sparsely labeled networks


We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during the training phase) and/or inference (during the testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
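The abstract does not say how ghost edges are built or weighted, so the following is only a minimal sketch under stated assumptions: it assumes a random walk with restart as the proximity measure and a simple proximity-weighted vote as the classifier. The function names `rwr_scores` and `ghost_edge_vote`, the restart probability, and the iteration count are all hypothetical. It does illustrate why a cost proportional to L · E is plausible: one walk is run per labeled node, and each iteration of a walk touches every edge once.

```python
from collections import defaultdict


def rwr_scores(adj, source, restart=0.15, iters=50):
    """Approximate random-walk-with-restart proximity from `source`.

    `adj` maps each node to a list of its neighbors (undirected graph).
    Each iteration visits every adjacency entry once, so one call costs
    on the order of iters * E.
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, neighbors in adj.items():
            if not neighbors:
                # Isolated node: return its walk mass to the source.
                nxt[source] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        nxt[source] += restart  # the walker restarts with this probability
        p = nxt
    return p


def ghost_edge_vote(adj, labels, restart=0.15, iters=50):
    """Label each unlabeled node by a proximity-weighted vote.

    Assumption: a "ghost edge" links every labeled node s to every
    unlabeled node v, weighted by the proximity of v as seen from s.
    """
    votes = {v: defaultdict(float) for v in adj if v not in labels}
    for s, cls in labels.items():  # one walk per labeled node: L walks
        scores = rwr_scores(adj, s, restart, iters)
        for v in votes:
            votes[v][cls] += scores[v]  # weight of ghost edge s -> v
    return {v: max(c, key=c.get) for v, c in votes.items()}


# Toy path graph a-b-c-d-e-f with only the two endpoints labeled.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
labels = {"a": "x", "f": "y"}
predicted = ghost_edge_vote(adj, labels)
```

In this toy graph, nodes `c` and `d` have no labeled neighbors at all, which is the sparse-label situation the abstract describes; the ghost edges still carry label information to them, and each unlabeled node ends up with the label of the nearer endpoint.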

DOI: 10.1145/1401890.1401925

Semantic Scholar estimates that this publication has 151 citations based on the available data.


Cite this paper

@inproceedings{Gallagher2008UsingGE,
  title={Using ghost edges for classification in sparsely labeled networks},
  author={Brian Gallagher and Hanghang Tong and Tina Eliassi-Rad and Christos Faloutsos},
  booktitle={KDD},
  year={2008}
}