The results demonstrate that very simple network-classification models perform quite well: well enough that they should be used regularly as baseline classifiers in studies of learning with networked data.
It is shown that it is possible to build a hybrid classifier that performs at least as well as the best available classifier under any target conditions, and that in some cases the hybrid can actually surpass the best known classifier.
It is shown that a simple, common smoothing method, the Laplace correction, uniformly improves probability-based rankings, and it is concluded that probability estimation trees (PETs) with these simple modifications should be considered whenever rankings based on class-membership probability are required.
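For concreteness, here is a minimal sketch of the Laplace correction at a tree leaf (the function name and the two-class example are illustrative, not from the paper):

```python
def laplace_estimate(k: int, n: int, num_classes: int = 2) -> float:
    """Laplace-corrected class-membership probability at a tree leaf.

    The raw frequency k/n gives extreme estimates (0 or 1) at small, pure
    leaves; adding one pseudo-count per class smooths them:
        P(c) = (k + 1) / (n + C)
    where k = count of class c at the leaf, n = total count at the leaf,
    and C = number of classes.
    """
    return (k + 1) / (n + num_classes)

# A pure leaf with 3 examples drops from a raw 3/3 = 1.0 to 0.8,
# ranking it below a pure leaf with 30 examples (~0.97) -- exactly the
# kind of distinction a probability-based ranking needs.
print(laplace_estimate(3, 3))    # 0.8
print(laplace_estimate(30, 30))  # 0.96875
```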
This work presents algorithms that improve on existing state-of-the-art techniques by separating worker bias from error, and illustrates how to incorporate cost-sensitive classification errors into the overall framework and how to integrate unsupervised and supervised techniques for inferring worker quality.
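As a rough illustration of why separating bias from error matters (a simplified sketch, not the paper's algorithm; the names and data are invented), one can estimate a worker's confusion matrix against known or majority-vote labels. A systematically biased worker, e.g. one who always flips binary labels, is 0% accurate yet fully informative once the bias is inverted, unlike a 50%-accurate random worker:

```python
from collections import defaultdict

def worker_confusion(gold, answers):
    """Estimate P(reported label | true label) for one worker from items
    whose true (or majority-vote) labels are known."""
    counts = defaultdict(lambda: defaultdict(int))
    for item, reported in answers.items():
        counts[gold[item]][reported] += 1
    confusion = {}
    for true_cls, row in counts.items():
        total = sum(row.values())
        confusion[true_cls] = {r: c / total for r, c in row.items()}
    return confusion

# A "flipper" is 0% accurate but perfectly recoverable: its confusion
# matrix is an extreme -- and invertible -- permutation of the classes.
gold = {'q1': 0, 'q2': 1, 'q3': 0, 'q4': 1}
flipper = {'q1': 1, 'q2': 0, 'q3': 1, 'q4': 0}
print(worker_confusion(gold, flipper))  # {0: {1: 1.0}, 1: {0: 1.0}}
```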
The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
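A back-of-the-envelope calculation shows why buying multiple noisy labels can pay off. Assuming independent labelers of equal quality p (the simplest setting in this line of work; the function name is ours):

```python
from math import comb

def majority_vote_quality(p: float, num_labels: int) -> float:
    """Probability that a majority vote over an odd number of independent
    labels, each correct with probability p, yields the correct label."""
    assert num_labels % 2 == 1, "use an odd number of labels to avoid ties"
    return sum(comb(num_labels, i) * p**i * (1 - p)**(num_labels - i)
               for i in range(num_labels // 2 + 1, num_labels + 1))

# With 0.7-accurate labelers, 5 labels per example lift effective label
# quality from 0.70 to about 0.84 -- worthwhile when labels are cheap
# relative to the cost of learning from noisy data.
print(majority_vote_quality(0.7, 1))  # 0.7
print(majority_vote_quality(0.7, 5))  # ~0.837
```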
A "budget-sensitive" progressive sampling algorithm is introduced for selecting training examples based on the class associated with each example and it is shown that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.
This work describes and demonstrates what the authors believe to be the proper use of ROC analysis for comparative studies in machine learning research, and argues that this methodology is preferable both for making practical choices and for drawing conclusions.
The ROC convex hull method combines techniques from ROC analysis, decision analysis, and computational geometry, adapting them to the analysis of learned classifiers; the result is a method for comparing classifier performance that is robust to imprecision in class distributions and misclassification costs.
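A minimal sketch of the two geometric steps (a monotone-chain upper hull plus the iso-performance-line argument; function names and the toy points are ours):

```python
def _cross(o, a, b):
    """Positive for a counterclockwise (left) turn o -> a -> b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Upper convex hull of (FPR, TPR) points, always including the
    trivial classifiers (0, 0) 'never alarm' and (1, 1) 'always alarm'."""
    hull = []
    for p in sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)}):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()            # drop points lying below the upper hull
        hull.append(p)
    return hull

def best_operating_point(hull, p_pos, cost_fp, cost_fn):
    """Hull vertex minimizing expected cost for the given class prior and
    error costs -- the vertex an iso-performance line touches first."""
    return min(hull, key=lambda pt: (1 - p_pos) * cost_fp * pt[0]
                                    + p_pos * cost_fn * (1 - pt[1]))

# Classifier B at (0.3, 0.6) lies under the hull spanned by A (0.1, 0.5)
# and C (0.4, 0.9), so it is optimal for no combination of costs/priors.
hull = roc_convex_hull([(0.1, 0.5), (0.3, 0.6), (0.4, 0.9)])
print(hull)  # [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]
print(best_operating_point(hull, p_pos=0.5, cost_fp=1.0, cost_fn=1.0))
```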
It is argued that there are good reasons why it has been hard to pin down exactly what data science is, and that to serve business effectively it is important to understand data science's relationships to other closely related concepts and to begin to identify its fundamental underlying principles.
It is argued that a simple relational predictive model that predicts based only on the class labels of related neighbors, using no learning and no inherent attributes, should be used as a baseline for assessing the performance of relational learners.
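Such a baseline is only a few lines. Here is a sketch in the spirit of a weighted-vote relational-neighbor classifier, simplified to hard neighbor labels and unit edge weights (the names and the toy graph are illustrative):

```python
from collections import Counter

def relational_neighbor(graph, labels, node):
    """Estimate class-membership probabilities for `node` purely from the
    known labels of its neighbors: no learning, no node attributes.

    graph:  dict node -> iterable of neighboring nodes
    labels: dict node -> known class label (unlabeled nodes absent)
    """
    votes = Counter(labels[nbr] for nbr in graph[node] if nbr in labels)
    total = sum(votes.values())
    if total == 0:
        return {}   # no labeled neighbors: abstain (or fall back to class priors)
    return {cls: count / total for cls, count in votes.items()}

# Node 'd' has two neighbors of class 1 and one of class 0.
g = {'d': ['a', 'b', 'c']}
print(relational_neighbor(g, {'a': 1, 'b': 1, 'c': 0}, 'd'))
# {1: 0.666..., 0: 0.333...}
```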