Should we use the post-hoc tests based on mean-ranks? Abstract The statistical comparison of multiple algorithms over multiple data sets is fundamental in machine learning. This is typically carried out by the Friedman test. When the Friedman test rejects the null hypothesis, multiple comparisons are carried out to establish which are the significant…
OBJECTIVE The recurrence of nonfunctioning pituitary adenomas (NFPAs) after surgical removal is common. The aim of our study was to investigate and correlate the growth fraction of NFPAs with clinical characteristics and long-term follow-up results. METHODS Tumor specimens were obtained from 101 consecutive patients with NFPAs (48 female patients and 53…
Bayesian methods are ubiquitous in machine learning. Nevertheless, the analysis of empirical results is typically performed by frequentist tests. This implies dealing with null hypothesis significance tests and p-values, even though the shortcomings of such methods are well known. We propose a nonparametric Bayesian version of the Wilcoxon signed-rank test…
The aim of this paper is to derive new near-ignorance models on the probability simplex that do not directly involve the Dirichlet distribution and are thus alternatives to the Imprecise Dirichlet Model. We focus our investigation on a particular class of distributions on the simplex, known as the class of Normalized Infinitely Divisible…
A fundamental task in machine learning is to compare the performance of multiple algorithms. This is usually performed by the frequentist Friedman test followed by multiple comparisons. This implies dealing with the well-known shortcomings of null hypothesis significance tests. We propose a Bayesian approach to overcome these problems. We provide three main…
Most hypothesis testing in machine learning is done using the frequentist null-hypothesis significance test, which has severe drawbacks. We review recent Bayesian tests which overcome the drawbacks of the frequentist ones.
We propose a new approach for the statistical comparison of algorithms which have been cross-validated on multiple data sets. It is a Bayesian hierarchical method; it draws inferences on single and on multiple data sets, taking into account the mean and the variability of the cross-validation results. It is able to detect equivalent classifiers and to claim…
Gaussian processes (GPs) have been used in different application domains, such as classification and regression. In this paper we show that they can also be employed as a universal tool for developing a large variety of Bayesian statistical hypothesis tests for regression functions. In particular, we will use GPs for testing whether (i) two functions are equal;…