Learn More
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we(More)
We introduce a novel active learning algorithm for classification of network data. In this setting, training instances are connected by a set of links to form a network, the labels of linked nodes are correlated, and the goal is to exploit these dependencies and accurately label the nodes. This problem arises in many domains, including social and biological(More)
Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling(More)
The problems of object classification (labeling the nodes of a graph) and link prediction (predicting the links in a graph) have been largely studied independently. Commonly, object classification is performed assuming a complete set of known links and link prediction is done assuming a fully observed set of node attributes. In most real world domains,(More)
We address the cost-sensitive feature acquisition problem, where misclassifying an instance is costly but the expected misclassification cost can be reduced by acquiring the values of the missing features. Because acquiring the features is costly as well, the objective is to acquire the right set of features so that the sum of the feature acquisition cost(More)
Supervised and semi-supervised data mining techniques require labeled data. However, labeling examples is costly for many real-world applications. To address this problem, active learning techniques have been developed to guide the labeling process in an effort to minimize the amount of labeled data without sacrificing much from the quality of the learned(More)
Active learning methods aim to choose the most informative instances to effectively learn a good classifier. Uncertainty sampling, arguably the most frequently utilized active learning strategy, selects instances which are uncertain according to the model. In this paper, we propose a framework that distinguishes between two types of uncertainties: a model(More)