Corpus ID: 10883521

The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression

@article{ElYaniv2017ThePA,
  title={The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression},
  author={Ran El-Yaniv and Yonatan Geifman and Yair Wiener},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.08499}
}
We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely…
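For concreteness, the minimal sketch below illustrates the PA under two of these losses: squared loss, where the best label-only predictor is the label mean and the PA reduces to R-squared, and 0/1 loss, where the best label-only predictor is the majority class. The normalization PA = 1 − (risk of the predictor) / (risk of the best label-only predictor) is an assumption inferred from the stated equivalence with R-squared, not a formula taken verbatim from the paper.

```python
import numpy as np

def prediction_advantage_squared(y_true, y_pred):
    """PA under squared loss: advantage over predicting the label mean.

    With the normalization assumed here, this coincides with R-squared."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)       # risk of the predictor
    baseline = np.var(y_true)                   # risk of the best label-only predictor (the mean)
    return 1.0 - mse / baseline

def prediction_advantage_01(y_true, y_pred):
    """PA under 0/1 loss: advantage over always predicting the majority class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    err = np.mean(y_true != y_pred)             # misclassification rate of the predictor
    _, counts = np.unique(y_true, return_counts=True)
    baseline = 1.0 - counts.max() / counts.sum()  # risk of the majority-class rule
    return 1.0 - err / baseline                 # undefined if only one label is present
```

Under this normalization, a classifier that merely matches the majority-class baseline gets PA = 0 and a perfect predictor gets PA = 1, regardless of class imbalance.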

Citations

A thorough analysis of the contribution of experimental, derived and sequence-based predicted protein-protein interactions for functional annotation of proteins
TLDR
It is shown that poor performance in Protein-Protein Interaction networks can be considerably improved by adding edges predicted from various data sources, such as text mining, and that associations from the STRING database are more useful than interactions predicted by a neural network from sequence-based features.
Sparsity of Protein-Protein Interaction Networks Hinders Function Prediction in Non-Model Species
TLDR
A simple network-based classifier is used to predict Biological Process Gene Ontology terms from protein interaction data in three species: Saccharomyces cerevisiae, Arabidopsis thaliana and Solanum lycopersicum (tomato).

References

SHOWING 1-10 OF 23 REFERENCES
The Balanced Accuracy and Its Posterior Distribution
TLDR
It is shown that both problems can be overcome by replacing the conventional point estimate of accuracy with an estimate of the posterior distribution of the balanced accuracy (a minimal sketch of this idea appears after the reference list).
The Relationship Between Agnostic Selective Classification, Active Learning and the Disagreement Coefficient
TLDR
The main result of this paper is an equivalence between the existence of a fast rejection rate for any PCS learning algorithm (such as ILESS); a poly-logarithmic bound for Hanneke's disagreement coefficient; and an exponential speedup for a new disagreement-based active learner called ActiveiLESS.
Agnostic Selective Classification
TLDR
It is shown that it is theoretically possible to track the classification performance of the best (unknown) hypothesis in the class, provided the learner is free to abstain from prediction in some region of its choice, and a novel selective classification algorithm using constrained SVMs is developed.
Agnostic Pointwise-Competitive Selective Classification
TLDR
A heuristic approximation procedure based on SVMs is considered, and it is shown empirically that this algorithm consistently outperforms a traditional rejection mechanism based on distance from the decision boundary.
Random forests for metric learning with implicit pairwise position dependence
TLDR
This work adopts a new angle on the metric learning problem, learning a single metric that implicitly adapts its distance function throughout the feature space and is an order of magnitude faster than state-of-the-art multi-metric methods.
An Effective Integrated Method for Learning Big Imbalanced Data
TLDR
This paper proposes an integrated method for learning from large imbalanced datasets, evaluates it on water pipeline datasets collected from various Australian regions over the past two decades, and shows that the proposed method is both practical and effective.
On the Foundations of Noise-free Selective Classification
TLDR
This paper presents a thorough analysis of selective classification, including characterizations of risk-coverage (RC) trade-offs in various interesting settings, and constructs algorithms that can optimally or near-optimally achieve the best possible trade-off in a controlled manner.
Comparison of Evaluation Metrics in Classification Applications with Imbalanced Datasets
TLDR
This work applies a new framework for comparing evaluation metrics in classification applications with imbalanced datasets, compares overall accuracy with the Kappa coefficient, and demonstrates that the Kappa coefficient is more suitable.
Missing values: how many can they be to preserve classification reliability?
TLDR
Using five medical datasets, it is found that for a two-class dataset, even with missing-value rates as high as 20–30%, results almost as good as with no missing values can still be produced.
A Robust Algorithm for Classification Using Decision Trees
  • B. Chandra, P. Paul Varghese
  • Computer Science
  • 2006 IEEE Conference on Cybernetics and Intelligent Systems
  • 2006
TLDR
It has been shown on various datasets from the UCI machine learning repository that this approach gives better classification accuracy than C4.5, SLIQ and the Elegant Decision Tree Algorithm (EDTA).
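For the posterior-distribution idea in the "Balanced Accuracy and Its Posterior Distribution" reference above, the sketch below shows one common construction, given here as an assumption (independent flat Beta(1, 1) priors on each per-class accuracy): the balanced accuracy posterior is obtained by averaging samples from the per-class Beta posteriors.

```python
import numpy as np

def balanced_accuracy_posterior(correct_per_class, total_per_class,
                                n_samples=10_000, rng=None):
    """Draw samples from a posterior over the balanced accuracy.

    Assumes independent flat Beta(1, 1) priors on each per-class accuracy;
    the balanced accuracy is their average."""
    rng = np.random.default_rng() if rng is None else rng
    per_class_draws = [
        rng.beta(c + 1, t - c + 1, size=n_samples)   # Beta posterior for one class
        for c, t in zip(correct_per_class, total_per_class)
    ]
    return np.mean(per_class_draws, axis=0)          # posterior samples of the average

# Example: two classes with 95/100 and 8/20 correct; report an interval, not a point.
samples = balanced_accuracy_posterior([95, 8], [100, 20])
print(np.percentile(samples, [2.5, 50, 97.5]))
```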