• Corpus ID: 9014576

An Infra-Structure for Performance Estimation and Experimental Comparison of Predictive Models in R

@article{Torgo2014AnIF,
  title={An Infra-Structure for Performance Estimation and Experimental Comparison of Predictive Models in R},
  author={Lu{\'i}s Torgo},
  journal={ArXiv},
  year={2014},
  volume={abs/1412.0436}
}
  • L. Torgo
  • Published 1 December 2014
  • Computer Science
  • ArXiv
This document describes an infra-structure provided by the R package performanceEstimation that allows users to estimate the predictive performance of different approaches (workflows) to predictive tasks. The infra-structure is generic in the sense that it can be used to estimate the values of any performance metrics, for any workflow, on different predictive tasks, namely classification, regression and time series tasks. The package also includes several standard workflows that allow users to easily… 
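The package itself is written in R; as an illustration of the underlying idea (estimating an arbitrary metric for an arbitrary workflow via a resampling scheme such as cross-validation), here is a minimal Python sketch. The function and workflow names are illustrative assumptions, not part of the package's API.

```python
import random

def kfold_estimate(data, k, workflow, metric):
    """Estimate a workflow's predictive performance by k-fold cross-validation.

    `workflow` takes (train, test) and returns predictions for `test`;
    `metric` takes (predictions, test) and returns a score. Both are
    arbitrary, mirroring the package's generic design.
    """
    data = list(data)
    random.Random(42).shuffle(data)          # fixed seed for reproducibility
    folds = [data[i::k] for i in range(k)]   # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        preds = workflow(train, test)
        scores.append(metric(preds, test))
    return sum(scores) / k

# Toy regression task: "predict the training mean", scored with MAE.
mean_workflow = lambda train, test: [sum(y for _, y in train) / len(train)] * len(test)
mae = lambda preds, test: sum(abs(p - y) for p, (_, y) in zip(preds, test)) / len(test)

task = [(x, 2 * x) for x in range(20)]       # toy (x, y) pairs
score = kfold_estimate(task, k=5, workflow=mean_workflow, metric=mae)
```

Swapping `workflow` for any learner, or `metric` for any scoring rule, leaves the estimation loop unchanged, which is the generality the abstract describes.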

Figures from this paper

Citations
MetaUtil: Meta Learning for Utility Maximization in Regression
TLDR: The MetaUtil algorithm is versatile, allowing the conversion of any out-of-the-box regression algorithm into a utility-based method, and its advantage is shown in a large set of experiments on a diverse set of domains.
A Comparative Study of Performance Estimation Methods for Time Series Forecasting
TLDR: Empirical experiments suggest that cross-validation approaches can be applied to stationary synthetic time series, and that the most accurate estimates are produced by the out-of-sample methods, which preserve the temporal order of observations.
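The out-of-sample evaluation this summary refers to can be sketched as a rolling-origin loop: fit only on past observations, score on the points that follow, and grow the training window one step at a time. The function and forecaster names below are illustrative assumptions.

```python
def rolling_origin_mae(series, forecast, min_train, horizon=1):
    """Out-of-sample evaluation that preserves temporal order:
    repeatedly predict from series[:t] and score against the next
    `horizon` observations (rolling-origin evaluation)."""
    errors = []
    for t in range(min_train, len(series) - horizon + 1):
        preds = forecast(series[:t], horizon)       # trained on the past only
        actual = series[t:t + horizon]
        errors.extend(abs(p - a) for p, a in zip(preds, actual))
    return sum(errors) / len(errors)

naive = lambda history, h: [history[-1]] * h        # "last value" baseline

result = rolling_origin_mae([1, 2, 3, 4, 5, 6], naive, min_train=3)  # 1.0
```

Unlike k-fold cross-validation, no future observation ever leaks into the training window, which is why such methods are preferred for non-stationary series.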
A Statistical Framework for Predictive Model Evaluation in MOOCs
TLDR: This work compares five modeling techniques (lasso-penalized logistic regression, naïve Bayes, random forest, SVM, and a classification tree) across three sets of features, and presents comparative performance results for each classifier under the three feature extraction methods.
Explaining the Performance of Black Box Regression Models
  • Inês Areosa, L. Torgo
  • Computer Science
    2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
  • 2019
TLDR: This paper presents several tools designed to help in understanding and explaining the reasons for the observed predictive performance of black box regression models, and describes, evaluates and proposes several variants of Error Dependence Plots.
Beyond Average Performance - exploring regions of deviating performance for black box classification models
TLDR: Two general approaches are described that can be used to provide interpretable descriptions of the expected performance of any black box classification model in situations where the model's performance is expected to deviate significantly from its average behaviour.
Prediction and Ranking of Highly Popular Web Content
TLDR: An evaluation framework is proposed for a robust interpretation of prediction models' ability to accurately forecast highly popular web content, allowing for a faster and more precise recommendation of such items.
Using meta-learning for model type selection in predictive big data analytics
TLDR: This study presents a meta-learning-based model-selection system, provides an evaluation of it, and presents an ontology-based automated model-selection system extending the Scala-based SCALATION data framework.
Ensembles for Time Series Forecasting
TLDR: A new type of ensemble is proposed that aims at improving the predictive performance of these approaches in time series forecasting, introducing a new form of diversity generation that explores some specific properties of time series prediction tasks.
Dynamic and Heterogeneous Ensembles for Time Series Forecasting
TLDR: This paper addresses the issue of learning time series forecasting models in changing environments by leveraging the predictive power of ensemble methods, dynamically combining base learners according to their recent performance using a non-linear function.
Resampling Strategies for Imbalanced Time Series Forecasting
TLDR: Results show a significant increase in predictive accuracy on rare cases associated with using resampling strategies, and the use of biased strategies further increases accuracy over non-biased strategies.

References

Showing 1-10 of 10 references
Classification and regression trees
  • W. Loh
  • Computer Science
    WIREs Data Mining Knowl. Discov.
  • 2011
TLDR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
Data Mining with R: Learning with Case Studies
Data Mining with R: Learning with Case Studies, Second Edition uses practical examples to illustrate the power of R and data mining, providing an extensive update to the best-selling first edition.
MetaCost: a general method for making classifiers cost-sensitive
TLDR: A principled method, called MetaCost, is proposed for making an arbitrary classifier cost-sensitive by wrapping a cost-minimizing procedure around it; MetaCost treats the underlying classifier as a black box, requiring no knowledge of its functioning or change to it.
Statistical Comparisons of Classifiers over Multiple Data Sets
  • J. Demšar
  • Computer Science
    J. Mach. Learn. Res.
  • 2006
TLDR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed-ranks test for comparison of two classifiers, and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets.
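The Friedman test recommended here ranks the algorithms on each data set and compares average ranks. A self-contained sketch of the statistic, following the standard formulation (function name and toy scores are illustrative):

```python
def friedman_statistic(scores):
    """Friedman test statistic for comparing k algorithms on N data sets.

    `scores[i][j]` is the performance of algorithm j on data set i
    (higher = better). Rank algorithms per data set (best = rank 1,
    ties share the average rank), average the ranks R_j, and compute
        chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4).
    """
    N, k = len(scores), len(scores[0])
    avg_ranks = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])   # best first
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1                                    # extend tie group
            shared = (i + j) / 2 + 1                      # average rank of the group
            for m in range(i, j + 1):
                ranks[order[m]] = shared
            i = j + 1
        for j in range(k):
            avg_ranks[j] += ranks[j] / N
    chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in avg_ranks)
                                     - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2

# 3 algorithms on 4 data sets; algorithm 0 always wins, 2 always loses.
toy = [[0.90, 0.85, 0.80], [0.88, 0.84, 0.79],
       [0.91, 0.86, 0.81], [0.89, 0.83, 0.78]]
avg_ranks, chi2 = friedman_statistic(toy)   # ranks [1.0, 2.0, 3.0], chi2 = 8.0
```

The statistic is then compared against a chi-squared distribution with k-1 degrees of freedom (or Iman-Davenport's F correction) before applying post-hoc tests.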
The Foundations of Cost-Sensitive Learning
TLDR: It is argued that changing the balance of negative and positive training examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods; the recommended way of applying one of these methods is to learn a classifier from the training set and then to compute optimal decisions explicitly using the probability estimates given by the classifier.
SMOTE: Synthetic Minority Over-sampling Technique
TLDR: A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance in ROC space; the approach is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Andreas Weingessel and Friedrich Leisch. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien, 2012. R package version 1.6-1.
Classification and Regression Trees. Statistics/Probability Series, 1984.
The following are some illustrations of the use of other available utility functions. Obtaining the scores on all iterations and metrics of a workflow on a particular task: getScores(res, 'svm.v6