# Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic

@inproceedings{Yan2003OptimizingCP, title={Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic}, author={Lian Yan and Robert H. Dodier and Michael C. Mozer and Richard H. Wolniewicz}, booktitle={ICML}, year={2003} }

When the goal is to achieve the best correct classification rate, cross entropy and mean squared error are typical cost functions used to optimize classifier performance. However, for many real-world classification problems, the ROC curve is a more meaningful performance measure. We demonstrate that minimizing cross entropy or mean squared error does not necessarily maximize the area under the ROC curve (AUC). We then consider alternative objective functions for training a classifier to…
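The Wilcoxon-Mann-Whitney statistic the abstract refers to equals the AUC: the fraction of (positive, negative) example pairs that the classifier scores in the correct order. Because the pairwise indicator is non-differentiable, the paper optimizes a smooth approximation instead. The sketch below shows the exact WMW/AUC computation and a differentiable pairwise surrogate in the spirit of the paper; the specific margin `gamma` and exponent `p` values are illustrative defaults, not the authors' settings.

```python
import numpy as np

def auc_wmw(pos_scores, neg_scores):
    """Exact AUC via the Wilcoxon-Mann-Whitney statistic: the
    fraction of (positive, negative) pairs ranked correctly.
    Ties are ignored here for simplicity."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean(pos > neg))

def wmw_surrogate(pos_scores, neg_scores, gamma=0.3, p=2):
    """Differentiable surrogate loss: penalize each pair whose
    margin s_pos - s_neg falls below gamma, raising the shortfall
    to the power p. Minimizing this drives pairwise margins up,
    which tends to increase the (non-differentiable) AUC."""
    diff = (np.asarray(pos_scores, dtype=float)[:, None]
            - np.asarray(neg_scores, dtype=float)[None, :])
    penalty = np.where(diff < gamma, (gamma - diff) ** p, 0.0)
    return float(np.mean(penalty))
```

A perfectly separated sample gives `auc_wmw` of 1.0 and, once all margins exceed `gamma`, a surrogate loss of 0; overlapping score distributions yield a positive loss that gradient descent can reduce.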

## 264 Citations

### Tuning the hyperparameter of an AUC-optimized classifier

- Computer Science, BNAIC
- 2005

Discusses a classifier that optimizes the AUC using a linear programming formulation in which classification constraints can easily be subsampled, allowing the unused constraints to be used for hyperparameter optimization.

### Learning to rank by maximizing the AUC with linear programming for problems with binary output

- Computer Science
- 2007

This work presents a linear programming approach (LPR), similar to 1-norm Support Vector Machines (SVM), for ranking instances with binary outputs by maximizing an approximation to the WMW statistic.

### An ensemble classifier learning approach to ROC optimization

- Computer Science, 18th International Conference on Pattern Recognition (ICPR'06)
- 2006

The proposed ensemble maximal figure-of-merit (E-MFoM) learning framework meets four key requirements desirable for ROC optimization and outperforms state-of-the-art algorithms based on Wilcoxon-Mann-Whitney rank statistics.

### An Unbiased Variance Estimator of a K-sample U-statistic with Application to AUC in Binary Classification

- Mathematics, Computer Science
- 2019

This work proposes a new method, an unbiased variance estimator of a general K-sample U-statistic, and applies it to evaluate the variance of AUC, and suggests choosing the most parsimonious model whose AUC score is within 1 standard error of the maximum AUC.
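The one-standard-error selection heuristic mentioned above can be sketched in a few lines: among all models whose AUC is within one standard error of the best AUC, choose the most parsimonious. The function name and the complexity encoding below are illustrative, not from the cited work.

```python
def one_se_model_choice(aucs, complexities, se):
    """Illustrative one-standard-error rule for model selection.

    aucs:         list of AUC scores, one per candidate model
    complexities: matching list of complexity measures (lower = simpler)
    se:           estimated standard error of the best model's AUC

    Returns the index of the simplest model whose AUC is within
    one standard error of the maximum AUC.
    """
    best = max(aucs)
    eligible = [i for i, a in enumerate(aucs) if a >= best - se]
    return min(eligible, key=lambda i: complexities[i])
```

For example, with AUCs of 0.90, 0.89, and 0.85, complexities 3, 2, and 1, and a standard error of 0.02, the rule skips the most complex top scorer and picks the second model, since 0.89 is within one standard error of 0.90 while 0.85 is not.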

### Optimising area under the ROC curve using gradient descent

- Computer Science, ICML
- 2004

This paper introduces RankOpt, a linear binary classifier which optimises the area under the ROC curve (the AUC). Unlike standard binary classifiers, RankOpt adopts the AUC statistic as its objective…

### Learning to Rank by Maximizing AUC with Linear Programming

- Computer Science, The 2006 IEEE International Joint Conference on Neural Network Proceedings
- 2006

This work presents a linear programming approach similar to 1-norm Support Vector Machines (SVMs) for instance ranking by an approximation to the WMW statistic, which can be applied to nonlinear problems by using a kernel function.

### Optimizing Area Under the ROC Curve using Ranking SVMs

- Computer Science
- 2005

It is shown that an SVM optimized for ranking not only achieves better AUC than other linear classifiers on average, but also performs comparably in accuracy.

### Partial AUC Maximization via Nonlinear Scoring Functions

- Computer Science, ArXiv
- 2018

It is shown experimentally that nonlinear scoring functions improve on conventional methods, demonstrated on a binary classification of real and bogus objects observed with the Hyper Suprime-Cam on the Subaru Telescope.

### Overlaying Classifiers: A Practical Approach to Optimal Scoring

- Computer Science
- 2010

This paper proposes a statistical learning method for constructing a scoring function with a nearly optimal ROC curve: a discretization approach that builds a finite sequence of N classifiers by constrained empirical risk minimization and then forms a piecewise constant scoring function sN(x) by overlaying the resulting classifiers.

### Directly and Efficiently Optimizing Prediction Error and AUC of Linear Classifiers

- Computer Science, ArXiv
- 2018

This work shows that in the case of linear predictors, and under the assumption that the data has normal distribution, the expected error and the expected AUC are not only smooth, but have closed form expressions, which depend on the first and second moments of the normal distribution.

## References


### Prodding the ROC Curve: Constrained Optimization of Classifier Performance

- Computer Science, NIPS
- 2001

This work describes a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate.

### A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems

- Computer Science, Machine Learning
- 2004

This work extends the definition of the area under the ROC curve to the case of more than two classes by averaging pairwise comparisons and proposes an alternative definition of proportion correct based on pairwise comparison of classes for a simple artificial case.

### Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry

- Computer Science, IEEE Trans. Neural Networks Learn. Syst.
- 2000

These experiments show that under a wide variety of assumptions concerning the cost of intervention and the retention rate resulting from intervention, using predictive techniques to identify potential churners and offering incentives can yield significant savings to a carrier.

### Using the Future to Sort Out the Present: Rankprop and Multitask Learning for Medical Risk Evaluation

- Medicine, NIPS
- 1995

Describes two methods that together improve the accuracy of backprop nets on a pneumonia risk assessment problem by 10-50%.

### The Case against Accuracy Estimation for Comparing Induction Algorithms

- Computer Science, ICML
- 1998

This work describes and demonstrates what it believes to be the proper use of ROC analysis for comparative studies in machine learning research, and argues that this methodology is preferable both for making practical choices and for drawing conclusions.

### Improving prediction of customer behavior in nonstationary environments

- Computer Science, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222)
- 2001

Two distinct approaches to churn prediction are proposed, using more historical data or new, unlabeled data, to improve the results for this real-world, large-scale, nonstationary problem.

### Optimizing parameters in a ranked retrieval system using multi-query relevance feedback

- Computer Science
- 1994

A method is proposed by which parameters in ranked-output text retrieval systems can be automatically optimized to improve retrieval performance: system parameters are adjusted to maximize the match between the system's document ordering and the user's desired ordering, as given by relevance feedback.

### Use of a Multi-Layer Perceptron to Predict Malignancy in Ovarian Tumors

- Medicine, NIPS
- 1997

A Multi-Layer Perceptron neural network classifier for use in preoperative differentiation between benign and malignant ovarian tumors is developed, able to make reliable predictions with a discriminating performance comparable to that of experienced gynecologists.

### Individual Comparisons by Ranking Methods

- Mathematics
- 1945

The comparison of two treatments generally falls into one of the following two categories: (a) we may have a number of replications for each of the two treatments, which are unpaired, or (b) we may…

### Signal detection theory and psychophysics

- Psychology
- 1966

This book discusses statistical decision theory and sensory processes in signal detection theory and psychophysics, and describes how these processes shape decision-making.