Classification in Networked Data: a Toolkit and a Univariate Case Study
The results demonstrate that very simple network-classification models perform quite well: well enough that they should be used regularly as baseline classifiers in studies of learning with networked data.
Robust Classification for Imprecise Environments
It is shown that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions, and in some cases, the performance of the hybrid actually can surpass that of the best known classifier.
Tree Induction for Probability-Based Ranking
It is shown that using a simple, common smoothing method, the Laplace correction, uniformly improves probability-based rankings, and it is concluded that PETs, with these simple modifications, should be considered whenever rankings based on class-membership probability are required.
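The Laplace correction mentioned above is a standard smoothing formula; a minimal sketch (the function name and defaults are illustrative, not from the paper):

```python
def laplace_corrected_probability(k, n, num_classes=2):
    """Smooth a tree leaf's class-frequency estimate from k/n to (k + 1) / (n + C).

    k: count of the target class at the leaf; n: total examples at the leaf;
    num_classes: number of classes C. Pure, tiny leaves no longer claim certainty,
    which tends to improve probability-based rankings.
    """
    return (k + 1) / (n + num_classes)

# A leaf with 3 of 3 positives estimates 0.8 rather than 1.0:
laplace_corrected_probability(3, 3)   # 0.8
# An empty leaf falls back to the uniform prior:
laplace_corrected_probability(0, 0)   # 0.5
```

The effect is largest exactly where unsmoothed frequency estimates are least reliable: small leaves.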
Quality management on Amazon Mechanical Turk
This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.
Get another label? improving data quality and data mining using multiple, noisy labelers
The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
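The simplest form of label integration studied in this line of work is majority voting over repeated labels; a minimal sketch (function names are illustrative), including the standard binomial expression for the quality of an integrated label when each of n labelers is independently correct with probability p:

```python
from collections import Counter
from math import comb

def majority_label(labels):
    """Integrate multiple noisy labels for one example by majority vote."""
    return Counter(labels).most_common(1)[0][0]

def integrated_quality(p, n):
    """Probability that a majority of n independent labelers, each correct
    with probability p, yields the correct label (n odd, so no ties)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

majority_label(["pos", "neg", "pos"])   # "pos"
integrated_quality(0.7, 5)              # ~0.837: five 70%-accurate labels beat one
```

This illustrates the regime dependence the summary refers to: when p is well above 0.5 and labels are cheap relative to examples, buying extra labels raises effective quality substantially, while at p = 0.5 extra labels buy nothing.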
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
A "budget-sensitive" progressive sampling algorithm is introduced for selecting training examples based on the class associated with each example and it is shown that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.
The Case against Accuracy Estimation for Comparing Induction Algorithms
This work describes and demonstrates what it believes to be the proper use of ROC analysis for comparative studies in machine learning research, and argues that this methodology is preferable both for making practical choices and for drawing conclusions.
Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
The ROC convex hull method combines techniques from ROC analysis, decision analysis, and computational geometry, adapting them to the particulars of analyzing learned classifiers, and yields a method for comparing classifier performance that is robust to imprecise class distributions and misclassification costs.
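The geometric core of the method is the upper convex hull of classifier operating points in ROC space: only classifiers on the hull can be optimal for some class/cost distribution. A minimal sketch, assuming each classifier is summarized by a single (false-positive rate, true-positive rate) point:

```python
def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b (sign gives turn direction)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Upper convex hull of (fpr, tpr) points, anchored at the trivial
    classifiers (0,0) and (1,1). Points strictly below the hull are
    suboptimal for every class distribution and cost matrix."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop points making a non-clockwise turn: they lie on or below the hull.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# (0.5, 0.4) sits below the segment from (0.2, 0.8) to (1, 1), so it is dominated:
roc_convex_hull([(0.2, 0.8), (0.5, 0.4)])   # [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]
```

Given target class priors and costs, the optimal operating point is then found by sliding an iso-performance line of the corresponding slope down onto this hull.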
Data Science and its Relationship to Big Data and Data-Driven Decision Making
It is argued that there are good reasons why it has been hard to pin down exactly what data science is, and that to serve business effectively it is important to understand data science's relationships to other important related concepts and to begin to identify the fundamental principles underlying it.
A Simple Relational Classifier
It is argued that a simple relational predictive model that predicts based only on the class labels of related neighbors, using no learning and no inherent attributes, should be used as a baseline to assess the performance of relational learners.
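Such a no-learning baseline can be stated in a few lines; a minimal sketch in the spirit of the summary (the function name and graph representation are illustrative, and ties/unlabeled neighborhoods are handled arbitrarily here):

```python
from collections import Counter

def relational_neighbor_predict(node, graph, labels):
    """Predict a node's class as the majority class among its labeled neighbors.

    graph:  dict mapping each node to a list of neighboring nodes
    labels: dict mapping known nodes to their class labels
    Uses no learning and no node attributes; returns None if no neighbor is labeled.
    """
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None

graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
labels = {"b": "spam", "c": "spam", "d": "ham"}
relational_neighbor_predict("a", graph, labels)   # "spam"
```

If a relational learner cannot beat this kind of neighbor-vote baseline on a networked dataset, the attributes and learning machinery are adding little beyond the link structure itself.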