• Publications
Classification in Networked Data: a Toolkit and a Univariate Case Study
TLDR: The results demonstrate that very simple network-classification models perform quite well---well enough that they should be used regularly as baseline classifiers for studies of learning with networked data.
Robust Classification for Imprecise Environments
TLDR: It is shown that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions, and in some cases the performance of the hybrid can actually surpass that of the best known classifier.
Tree Induction for Probability-Based Ranking
TLDR: It is shown that a simple, common smoothing method—the Laplace correction—uniformly improves probability-based rankings, and it is concluded that PETs, with these simple modifications, should be considered whenever rankings based on class-membership probability are required.
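The Laplace correction mentioned in the summary is a one-line smoothing formula: instead of estimating a leaf's class probability as the raw frequency, it adds one pseudo-count per class, so small leaves are not assigned extreme probabilities of 0 or 1. A minimal sketch (the function name is illustrative, not from the paper):

```python
def laplace_probability(pos, neg, num_classes=2):
    """Laplace-corrected estimate of P(positive) at a tree leaf.

    Raw frequency pos / (pos + neg) gives 0 or 1 for pure leaves,
    which ruins probability-based rankings; adding one pseudo-count
    per class pulls small-sample estimates toward 1 / num_classes.
    """
    return (pos + 1) / (pos + neg + num_classes)
```

For example, an empty leaf yields 0.5 rather than an undefined ratio, and a leaf with 3 positives and 0 negatives yields 0.8 rather than an overconfident 1.0.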
Quality management on Amazon Mechanical Turk
TLDR: This work presents algorithms that improve on existing state-of-the-art techniques, enabling the separation of worker bias from error; it also shows how to incorporate cost-sensitive classification errors into the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring worker quality.
Get another label? improving data quality and data mining using multiple, noisy labelers
TLDR: The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
TLDR: A "budget-sensitive" progressive sampling algorithm is introduced for selecting training examples based on the class associated with each example, and it is shown that the class distribution of the resulting training set yields classifiers with good (nearly optimal) classification performance.
The Case against Accuracy Estimation for Comparing Induction Algorithms
TLDR: This work describes and demonstrates what it believes to be the proper use of ROC analysis for comparative studies in machine learning research, and argues that this methodology is preferable both for making practical choices and for drawing conclusions.
Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
TLDR: The ROC convex hull method combines techniques from ROC analysis, decision analysis, and computational geometry, adapting them to the particulars of analyzing learned classifiers; the result is a method for comparing classifier performance that is robust to imprecise class distributions and misclassification costs.
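The computational-geometry step in the ROC convex hull method is the upper convex hull of the classifiers' (false-positive rate, true-positive rate) points: any classifier below the hull is suboptimal under every class distribution and cost matrix. A minimal sketch using the standard monotone-chain hull (function names are illustrative):

```python
def roc_convex_hull(points):
    """Return the upper convex hull of (fpr, tpr) ROC points.

    Only classifiers on the hull can be optimal for some combination
    of class priors and misclassification costs.
    """
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); >= 0 means the turn at a
        # is counter-clockwise or collinear, so a is not on the upper hull
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Include the trivial "always negative" and "always positive" classifiers.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For instance, given classifiers at (0.2, 0.7) and (0.5, 0.6), the second lies below the segment from (0.2, 0.7) to (1, 1) and is excluded from the hull.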
Data Science and its Relationship to Big Data and Data-Driven Decision Making
TLDR: It is argued that there are good reasons why it has been hard to pin down exactly what data science is, and that to serve business effectively it is important to understand data science's relationships to other important related concepts and to begin to identify the fundamental principles underlying it.
A Simple Relational Classifier
TLDR: It is argued that a simple relational predictive model that predicts based only on the class labels of related neighbors, using no learning and no inherent attributes, should be used as a baseline to assess the performance of relational learners.
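The baseline described in the summary can be made concrete in a few lines: predict each node's class by a vote over the known labels of its graph neighbors, with no training phase and no node attributes. A minimal sketch (names and the plain-dict graph representation are illustrative assumptions, not the paper's code):

```python
from collections import Counter

def relational_neighbor_predict(node, graph, labels):
    """Predict a node's class as the majority class among its labeled
    neighbors -- no learning, no inherent node attributes.

    graph:  dict mapping each node to a list of neighboring nodes
    labels: dict mapping the already-labeled nodes to their classes
    """
    neighbor_labels = [labels[n] for n in graph[node] if n in labels]
    if not neighbor_labels:
        return None  # no labeled neighbors to vote with
    return Counter(neighbor_labels).most_common(1)[0][0]
```

A relational learner that cannot beat this zero-parameter vote on a given dataset is arguably not exploiting anything beyond the label autocorrelation already present in the links.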