# Learning to Abstain from Binary Prediction

```bibtex
@article{Balsubramani2016LearningTA,
  title   = {Learning to Abstain from Binary Prediction},
  author  = {Akshay Balsubramani},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1602.08151}
}
```

A binary classifier capable of abstaining from making a label prediction has two goals in tension: minimizing errors, and avoiding abstaining unnecessarily often. In this work, we exactly characterize the best achievable tradeoff between these two goals in a general semi-supervised setting, given an ensemble of predictors of varying competence as well as unlabeled data on which we wish to predict or abstain. We give an algorithm for learning a classifier in this setting which trades off its…

## 8 Citations

### A Potential-based Framework for Online Learning with Mistakes and Abstentions

- Computer Science
- 2016

This paper develops an algorithmic framework for designing online learning algorithms with mistakes and abstentions, using a notion called admissible potential functions; the framework immediately yields natural generalizations of existing algorithms to online learning with abstention.

### The Extended Littlestone's Dimension for Learning with Mistakes and Abstentions

- Computer Science
- COLT
- 2016

A novel measure, called the Extended Littlestone's Dimension, is provided, which captures the number of abstentions needed to ensure a certain number of mistakes.

### Machine Learning with a Reject Option: A survey

- Computer Science
- ArXiv
- 2021

This survey introduces the conditions leading to two types of rejection, ambiguity rejection and novelty rejection; defines the existing architectures for models with a reject option; describes the standard learning strategies used to train such models; and relates traditional machine learning techniques to rejection.

### Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings

- Computer Science
- ACL
- 2016

The unanimity principle is introduced: predict only when all models consistent with the training data predict the same output. The principle is operationalized for semantic parsing, the task of mapping utterances to logical forms.
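The unanimity principle itself is simple to state in code. The sketch below is a toy illustration with threshold classifiers standing in for the version space, not the paper's semantic-parsing system; the function and model names are assumptions.

```python
def unanimous_predict(models, x):
    """Predict only when every model consistent with the training data
    agrees on the output; otherwise abstain (return None).

    `models` is a list of callables standing in for the version space.
    """
    outputs = {m(x) for m in models}
    return outputs.pop() if len(outputs) == 1 else None

# Toy "version space": three threshold classifiers, all consistent
# with some hypothetical training data.
models = [lambda x, t=t: int(x > t) for t in (0.3, 0.4, 0.6)]
print(unanimous_predict(models, 0.9))  # all models output 1 -> predict 1
print(unanimous_predict(models, 0.5))  # models disagree -> abstain (None)
```

By construction, any prediction the rule does make matches the (unknown) true hypothesis, which is what yields the 100% precision guarantee in the realizable setting.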

### Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics

- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017

After detecting adversarial examples, it is shown that many of them can be recovered by simply applying a small averaging filter to the image, which should lead to more insights about the classification mechanisms of deep convolutional neural networks.

### Research Statement

- Computer Science
- 2015

This research centers on machine learning, a field that has exploded in the past two decades, and works to devise new algorithms that exploit data structure in practical learning scenarios.

### The Logical Consistency of Simultaneous Agnostic Hypothesis Tests

- Computer Science
- Entropy
- 2016

This paper adapts the above logical requirements to agnostic tests, in which one can accept, reject or remain agnostic with respect to a given hypothesis, and provides examples of such tests that satisfy all logical requirements and also perform well statistically.

### Geo-spatial text-mining from Twitter – a feature space analysis with a view toward building classification in urban regions

- Computer Science
- European Journal of Remote Sensing
- 2019

A feature space analysis of geo-tagged Twitter text messages from the Los Angeles area and a geo-spatial text-mining approach that classifies building types into commercial and residential, illustrating a basis for fusing text features with remote sensing images to classify urban building types.

## References

Showing 1–10 of 29 references

### Optimal Binary Classifier Aggregation for General Losses

- Computer Science
- NIPS
- 2016

A family of semi-supervised ensemble aggregation algorithms is found which are as efficient as linear learning by convex optimization, yet minimax optimal without any relaxations.

### Aggregating Binary Classifiers Optimally with General Losses

- Computer Science
- 2015

A family of parameter-free ensemble aggregation algorithms using labeled and unlabeled data is developed; they are as efficient as linear learning and prediction for convex risk minimization, but work without any relaxations on many nonconvex losses such as the 0-1 loss.

### Agnostic Selective Classification

- Computer Science
- NIPS
- 2011

It is shown to be theoretically possible to track the classification performance of the best (unknown) hypothesis in the class, provided the learner is free to abstain from prediction in some region of its choice, and a novel selective classification algorithm is developed using constrained SVMs.

### Classification with a Reject Option using a Hinge Loss

- Computer Science
- J. Mach. Learn. Res.
- 2008

This work considers the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation and proposes a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs).

### Support Vector Machines with a Reject Option

- Computer Science
- NIPS
- 2008

The problem of binary classification where the classifier may abstain instead of classifying each observation is considered, and the double hinge loss function that focuses on estimating conditional probabilities only in the vicinity of the threshold points of the optimal decision rule is derived.
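The "threshold points of the optimal decision rule" mentioned above are those of Chow's classical reject rule: with abstention cost d ≤ 1/2, the Bayes-optimal classifier abstains exactly when the conditional probability η(x) = P(y = 1 | x) lies strictly between d and 1 − d. A minimal sketch (the function name is an assumption):

```python
def chow_rule(eta, d):
    """Chow's Bayes-optimal rule with reject cost d <= 1/2: predict the
    likelier label unless eta = P(y=1|x) is within d of both endpoints,
    i.e. abstain (return 0) whenever max(eta, 1 - eta) < 1 - d."""
    if max(eta, 1 - eta) < 1 - d:
        return 0                       # abstain: expected error > reject cost
    return 1 if eta >= 0.5 else -1

# With d = 0.2 the rule abstains exactly when eta lies in (0.2, 0.8).
print([chow_rule(eta, d=0.2) for eta in (0.05, 0.45, 0.55, 0.95)])
```

Convex surrogates such as the double hinge loss above need only estimate η accurately near these two thresholds, d and 1 − d, rather than everywhere.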

### Optimally Combining Classifiers Using Unlabeled Data

- Computer Science
- COLT
- 2015

A worst-case analysis of aggregation of classifier ensembles for binary classification identifies cases where a weighted combination of the classifiers can perform significantly better than any single classifier.

### Active Learning via Perfect Selective Classification

- Computer Science
- J. Mach. Learn. Res.
- 2012

A reduction of active learning to selective classification that preserves fast rates is shown, and an exponential target-independent label-complexity speedup is derived for actively learning general (non-homogeneous) linear classifiers when the data distribution is an arbitrary high-dimensional mixture of Gaussians.

### On the Foundations of Noise-free Selective Classification

- Computer Science
- J. Mach. Learn. Res.
- 2010

This paper presents a thorough analysis of selective classification, including characterizations of risk–coverage (RC) trade-offs in various interesting settings, and constructs algorithms that can optimally or near-optimally achieve the best possible trade-off in a controlled manner.

### An Optimal Reject Rule for Binary Classifiers

- Computer Science
- SSPR/SPR
- 2000

This paper presents an optimal reject rule for binary classifiers based on the Receiver Operating Characteristic curve; the rule is optimal in that it maximizes a classification utility function defined from the classification and error costs specific to the application at hand.

### Support Vector Machine Active Learning with Applications to Text Classification

- Computer Science
- J. Mach. Learn. Res.
- 2001

Experimental results are presented showing that the active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.