Correcting evaluation bias of relational classifiers with network cross validation

@article{Neville2010CorrectingEB,
  title={Correcting evaluation bias of relational classifiers with network cross validation},
  author={Jennifer Neville and Brian Gallagher and Tina Eliassi-Rad and Tao Wang},
  journal={Knowledge and Information Systems},
  year={2010},
  volume={30},
  pages={31--55}
}
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models.

Correcting Bias in Statistical Tests for Network Classifier Evaluation
TLDR
This paper analyzes the sources of bias theoretically and proposes analytical corrections to standard significance tests to reduce the Type I error rate to more acceptable levels, while maintaining reasonable levels of statistical power to detect true performance differences.
Distribution-free bounds for relational classification
TLDR
This paper extends the Hoeffding bounds to the relational setting and derives distribution-free bounds for certain classes of data generation models that do not produce i.i.d. data and are based on the type of interactions considered by relational classification algorithms developed in SRL.
Multi-label relational neighbor classification using social context features
TLDR
This paper proposes a multi-label iterative relational neighbor classifier that employs social context features (SCRN), which incorporates a class propagation probability distribution obtained from instances' social features, which are in turn extracted from the network topology.
Learning Collective Behavior in Multi-relational Networks
TLDR
This dissertation proposes two classification frameworks for identifying human collective behavior in multi-relational social networks and unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks that improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure.
Link Prediction-Based Multi-label Classification on Networked Data
TLDR
A link prediction-based multi-label relational neighbor classifier that employs social context features (LP-SCRN) is proposed; it first predicts missing links in the network and then calculates the weights of the links according to the similarity between nodes in their social features.
Network-SVM: Support Vector Machine for Network Data
  • Computer Science
  • 2017
TLDR
This paper proposes a novel large-margin classifier designed for network data, referred to as Network-SVM, that avoids the problem of conventional network classification methods by automatically pruning unreliable neighbors during training and extends the framework to network data with nonlinear decision boundaries.
Within-Network Classification Using Radius-Constrained Neighborhood Patterns
TLDR
It is demonstrated that frequent neighborhood patterns, originally studied in the pattern mining literature, serve as a strong class of structure-aware features and provide satisfactory effectiveness in WNC.
SURVEY OF CLASSIFICATION RULE MINING TECHNIQUES FOR IDENTIFYING DISEASE CAUSE AND DIAGNOSIS
TLDR
The medical dataset is analyzed with stroke disease reducing error rates providing classification accuracy and certain data mining papers on classification rule for disease diagnosis patterns are reviewed.
Benchmarking Classifier Performance with Sparse Measurements
TLDR
The described methodology is based on missing value imputation and was demonstrated to work, even when 80% of measurements are missing, for example because of unavailable algorithm implementations or unavailable datasets.

References

Evaluating Statistical Tests for Within-Network Classifiers of Relational Data
TLDR
This work examines the task of within-network classification and the question of whether two algorithms will learn models that result in significantly different levels of performance, and proposes a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error.
Probabilistic Classification and Clustering in Relational Data
TLDR
This work proposes a general class of models for classification and clustering in relational domains that capture probabilistic dependencies between related instances, and shows how to learn such models efficiently from data.
Leveraging Label-Independent Features for Classification in Sparsely Labeled Networks: An Empirical Study
TLDR
This work explores a complementary approach to within-network classification, based on the use of label-independent (LI) features - i.e., features calculated without using the values of class labels - and shows that, in many cases, it is a combination of a few diverse network-based structural characteristics that is most informative.
Discriminative Probabilistic Models for Relational Data
TLDR
An alternative framework that builds on (conditional) Markov networks and addresses two limitations of the previous approach is presented, showing how to train these models effectively, and how to use approximate probabilistic inference over the learned model for collective classification of multiple related entities.
An Examination of Experimental Methodology for Classifiers of Relational Data
TLDR
This work surveys the literature on relational classifiers and examines the various experimental methodologies reported therein, revealing that methodologies fall into two main groups, based on distinct formulations of the classification problem: between-network classification and within-network classification.
Learning relational probability trees
TLDR
This paper presents an algorithm for learning the structure and parameters of an RPT that searches over a space of relational features that use aggregation functions to dynamically propositionalize relational data and create binary splits within the RPT.
Relational Dependency Networks
TLDR
This paper presents relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting and outlines the relative strengths of RDNs---namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques.
Using ghost edges for classification in sparsely labeled networks
TLDR
This paper proposes a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks and demonstrates that this approach performs well across a range of conditions where existing approaches fail.
Classification in Networked Data: a Toolkit and a Univariate Case Study
TLDR
The results demonstrate that very simple network-classification models perform quite well---well enough that they should be used regularly as baseline classifiers for studies of learning with networked data.
Acora: Distribution-Based Aggregation for Relational Learning from Identifier Attributes
TLDR
A novel aggregation method is presented as part of a relational learning system, ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values, and the implications of identifier aggregation on the expressive power of the induced model are discussed.