# Knowing what you know: valid confidence sets in multiclass and multilabel prediction

@article{Cauchois2020KnowingWY, title={Knowing what you know: valid confidence sets in multiclass and multilabel prediction}, author={Maxime Cauchois and Suyash Gupta and John C. Duchi}, journal={ArXiv}, year={2020}, volume={abs/2004.10181} }

We develop conformal prediction methods for constructing valid predictive confidence sets in multiclass and multilabel problems without assumptions on the data generating distribution. A challenge here is that typical conformal prediction methods---which give marginal validity (coverage) guarantees---provide uneven coverage, in that they address easy examples at the expense of essentially ignoring difficult examples. By leveraging ideas from quantile regression, we build methods that always…
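For orientation, the marginal-coverage baseline that the abstract contrasts against can be sketched as standard split conformal prediction for multiclass problems. This is a minimal illustration, not the quantile-regression-based method the paper develops. The function names `conformal_threshold` and `prediction_sets` are hypothetical, and the score (one minus the predicted probability of the true label) is one common choice assumed here.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold computed from held-out calibration data.

    cal_probs:  array (n, n_classes) of predicted class probabilities.
    cal_labels: integer array (n,) of true labels.
    alpha:      target miscoverage level, e.g. 0.1 for 90% marginal coverage.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score (assumed choice): 1 - probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask (n_test, n_classes); True where the label is in the set."""
    test_probs = np.asarray(test_probs, dtype=float)
    return (1.0 - test_probs) <= threshold
```

Under exchangeability of calibration and test points, sets built this way cover the true label with probability at least `1 - alpha` on average over examples. The guarantee is only marginal, which is the uneven-coverage limitation the abstract describes.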

## 14 Citations

### Relaxed Conformal Prediction Cascades for Efficient Inference Over Many Labels

- Computer Science
- ArXiv
- 2020

This work relaxes CP validity to arbitrary criteria of success, allowing the framework to make more efficient predictions while remaining "equivalently correct," and amortizes cost by conformalizing prediction cascades, in which it aggressively prunes implausible labels early on by using progressively stronger classifiers.

### Classification with Valid and Adaptive Coverage

- Computer Science
- NeurIPS
- 2020

A novel conformity score is developed, which is explicitly demonstrated to be powerful and intuitive for classification problems, but whose underlying principle is potentially far more general.

### Robust Validation: Confident Predictions Even When Distributions Shift

- Computer Science
- ArXiv
- 2020

A method is presented that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population, and achieves (nearly) valid coverage in finite samples.

### Post-selection Inference for Conformal Prediction: Trading off Coverage for Precision

- Computer Science
- 2023

Uniform conformal inference with finite-sample prediction guarantees for arbitrary data-dependent miscoverage levels is developed using distribution-free confidence bands for distribution functions, which allows practitioners to freely trade coverage probability for the quality of the prediction set by any criterion of their choice while maintaining finite-sample guarantees similar to those of traditional conformal inference.

### Efficient and Differentiable Conformal Prediction with General Function Classes

- Computer Science
- ICLR
- 2022

This paper proposes a generalization of conformal prediction to multiple learnable parameters, by considering the constrained empirical risk minimization (ERM) problem of finding the most efficient prediction set subject to valid empirical coverage, and develops a gradient-based algorithm for it.

### Distribution-free uncertainty quantification for classification under label shift

- Computer Science
- UAI
- 2021

The right way to achieve uncertainty quantification (UQ) is examined by reweighting the aforementioned conformal and calibration procedures whenever some unlabeled data from the target distribution is available, and this work examines these techniques theoretically in a distribution-free framework and demonstrates their excellent practical performance.

### Training Uncertainty-Aware Classifiers with Conformalized Deep Learning

- Computer Science
- NeurIPS
- 2022

The idea is to mitigate overconfidence by minimizing a loss function, inspired by advances in conformal inference, that quantifies model uncertainty by carefully leveraging hold-out data to produce models with more dependable uncertainty estimates, without sacrificing predictive power.

### Few-shot Conformal Prediction with Auxiliary Tasks

- Computer Science
- ICML
- 2021

This work develops a novel approach to conformal prediction when the target task has limited data available for training, and demonstrates the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.

### Inject Machine Learning into Significance Test for Misspecified Linear Models

- Computer Science, Mathematics
- ArXiv
- 2020

Experimental results show that the estimator significantly outperforms linear regression for non-linear ground truth functions, indicating that it might be a better tool for the significance test.

### AutoCP: Automated Pipelines for Accurate Prediction Intervals

- Computer Science
- 2020

Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate while optimizing the interval length to be accurate and less conservative.

## 37 References

### Least Ambiguous Set-Valued Classifiers With Bounded Error Levels

- Computer Science, Mathematics
- Journal of the American Statistical Association
- 2018

This work introduces a framework for multiclass set-valued classification, where the classifiers guarantee user-defined levels of coverage or confidence while minimizing the ambiguity (the expected size of the output).

### The limits of distribution-free conditional predictive inference

- Computer Science, Mathematics
- Information and Inference: A Journal of the IMA
- 2020

This work aims to explore the space in between exact conditional inference guarantees and what types of relaxations of the conditional coverage property would alleviate some of the practical concerns with marginal coverage guarantees while still being possible to achieve in a distribution-free setting.

### A comparison of some conformal quantile regression methods

- Mathematics
- Stat
- 2020

We compare two recent methods that combine conformal inference with quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano,…

### Conformalized Quantile Regression

- Computer Science, Mathematics
- NeurIPS
- 2019

This paper proposes a new method that is fully adaptive to heteroscedasticity, which combines conformal prediction with classical quantile regression, inheriting the advantages of both.

### Classifier chains for multi-label classification

- Computer Science
- Machine Learning
- 2011

This paper presents a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity, and illustrates the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.

### Cautious Deep Learning

- Computer Science
- ArXiv
- 2018

This work proposes constructing conformal prediction sets which contain a set of labels rather than a single label, and demonstrates the performance on the ImageNet ILSVRC dataset and the CelebA and IMDB-Wiki facial datasets using high dimensional features obtained from state of the art convolutional neural networks.

### Learning Models with Uniform Performance via Distributionally Robust Optimization

- Computer Science, Mathematics
- ArXiv
- 2018

A distributionally robust stochastic optimization framework that learns a model providing good performance against perturbations to the data-generating distribution is developed, and a convex formulation for the problem is given, providing several convergence guarantees.

### Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2006

Applications to two real-world multilabel learning problems, i.e., functional genomics and text categorization, show that the performance of BP-MLL is superior to that of some well-established multilabel learning algorithms.

### Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems

- Computer Science
- Oper. Res.
- 2010

This paper proposes a model that describes uncertainty in both the distribution form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance matrix) and demonstrates that for a wide range of cost functions the associated distributionally robust stochastic program can be solved efficiently.

### Convexity, Classification, and Risk Bounds

- Computer Science
- 2006

A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.