The Lack of A Priori Distinctions Between Learning Algorithms
@article{Wolpert1996TheLO, title={The Lack of A Priori Distinctions Between Learning Algorithms}, author={David H. Wolpert}, journal={Neural Computation}, year={1996}, volume={8}, pages={1341-1390} }
This is the first of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as there are for which the reverse holds.
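Loosely, the counting argument can be summarized by the uniform-average form of the result. This is a simplified rendering in our own notation, not the paper's extended Bayesian framework:

```latex
% Summed uniformly over all target functions f (with m the training-set
% size), any two algorithms A and B have identical expected OTS error:
\sum_{f}\, E\big[\mathrm{Err}_{\mathrm{OTS}}(A) \mid f, m\big]
\;=\;
\sum_{f}\, E\big[\mathrm{Err}_{\mathrm{OTS}}(B) \mid f, m\big]
```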
925 Citations
Any Two Learning Algorithms Are (Almost) Exactly Identical
- Computer Science
- 2000
This paper shows that a loss function can be used in a natural way to specify a distance measure quantifying the similarity of any two supervised learning algorithms, even non-parametric ones, and argues that, under this measure, any two learning algorithms are almost exactly identical in the scenarios considered.
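A hedged sketch of the idea follows; the function name, arguments, and the empirical sample average are our illustrative choices, not the paper's construction, which is defined over distributions rather than finite samples:

```python
import numpy as np

def algorithm_distance(alg_a, alg_b, datasets, test_inputs, loss):
    """Rough sketch of a loss-induced distance between two learning
    algorithms: train both on the same datasets and average the loss
    between their predictions on held-out inputs. Illustrative only."""
    total = 0.0
    for d in datasets:
        f_a = alg_a(d)  # each algorithm maps a training set to a predictor
        f_b = alg_b(d)
        preds_a = np.array([f_a(x) for x in test_inputs])
        preds_b = np.array([f_b(x) for x in test_inputs])
        total += loss(preds_a, preds_b).mean()
    return total / len(datasets)
```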
Domains of competence of the semi-naive Bayesian network classifiers
- Computer Science, Inf. Sci.
- 2014
On classifier behavior in the presence of mislabeling noise
- Computer Science, Data Mining and Knowledge Discovery
- 2016
The “sigmoid rule” framework is presented, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem; it is also applicable to concept-drift scenarios, including modeling user behavior over time and mining noisy, evolving time series.
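The exact parametrization of the “sigmoid rule” is not reproduced here; as a hedged illustration, performance curves of the kind such a framework reasons about are commonly modeled with a four-parameter sigmoid (parameter names are ours):

```latex
% Generic four-parameter sigmoid performance model: a and b are the
% lower and upper performance asymptotes, c the slope, and x_0 the
% inflection point of the curve over, e.g., a noise level x:
p(x) \;=\; a + \frac{b - a}{1 + e^{-c\,(x - x_0)}}
```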
Bias-free hypothesis evaluation in multirelational domains
- Mathematics, MRDM '05
- 2005
It is demonstrated that the bias due to linkage to known objects varies with the chosen proportion of the training/test split, and an algorithm, generalized subgraph sampling, is presented that is guaranteed to avoid bias in the test set in more general cases.
Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy
- Computer Science, Patterns
- 2020
The No Free Lunch theorem in Pattern Recognition
- Computer Science
- 2009
The main aim is to propose an expression of the generic prior in terms of the classification power inherent in successively more encompassing subspaces of feature space, to investigate some of the properties of such a “generic” prior for pattern recognition, and to develop a descriptive framework that may be useful in that regard.
When is the Naive Bayes approximation not so naive?
- Computer Science, Machine Learning
- 2017
This paper proposes a set of “local” error measures, associated with the likelihood functions for subsets of attributes and for each class, and shows explicitly how these local errors combine to give a “global” error associated with the full attribute set.
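For context, the naive Bayes approximation these local errors diagnose is the usual conditional-independence factorization:

```latex
% Naive Bayes approximation: attributes x_1, ..., x_n are treated as
% independent given the class c:
P(x_1, \ldots, x_n \mid c) \;\approx\; \prod_{i=1}^{n} P(x_i \mid c)
```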
An automatic extraction method of the domains of competence for learning classifiers using data complexity measures
- Computer Science, Knowledge and Information Systems
- 2013
This work presents an automatic extraction method to determine the domains of competence of a classifier using a set of data complexity measures proposed for the task of classification, allowing the user to characterize the response quality of a method in terms of a dataset’s complexity.
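As a hedged sketch of one standard data-complexity measure of the kind such methods use (Fisher's discriminant ratio, often called F1; whether it is among the measures used here is an assumption):

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Fisher's discriminant ratio (F1) for a two-class problem: the
    maximum, over features, of between-class separation relative to
    within-class spread. Higher values indicate an easier, more
    linearly separable problem along some single feature."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    var0, var1 = X0.var(axis=0), X1.var(axis=0)
    f = (mu0 - mu1) ** 2 / (var0 + var1 + 1e-12)  # per-feature ratio
    return f.max()
```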
The Construction of a Majority-Voting Ensemble Based on the Interrelation and Amount of Information of Features
- Computer Science, Comput. J.
- 2020
A new ensemble learning algorithm called VIBES is introduced; it outperforms 85 machine learning algorithms in the WEKA tool and achieves the highest average classification accuracy across the 33 datasets.
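VIBES's contribution is the selection of feature subsets by interrelation and amount of information; the combination step it rests on is plain majority voting, sketched below (the function name and tie-breaking rule are our choices, not the paper's code):

```python
import numpy as np
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions of several base classifiers by
    plurality vote. `predictions` has shape (n_classifiers, n_samples);
    ties are broken in favor of the first most-common label."""
    predictions = np.asarray(predictions)
    return np.array([Counter(col).most_common(1)[0][0]
                     for col in predictions.T])
```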
References
On the Connection between In-sample Testing and Generalization Error
- Computer Science, Complex Syst.
- 1992
It is impossible to justify a correlation between reproduction of a training set and generalization error off of the training set using only a priori reasoning, and a novel formalism for addressing machine learning is developed.
Off-training-set error for the Gibbs and the Bayes optimal generalizers
- Computer Science, COLT 1995
- 1995
It is shown that when the target function is fixed, expected off-training-set error can increase with training set size if and only if the expected error averaged over all targets decreases with training set size.
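The quoted equivalence can be rendered schematically; this is a hedged paraphrase of the summary above, not the paper's formal theorem:

```latex
% Schematic rendering, with m the training-set size and f a target:
\exists\, f:\ E\big[\mathrm{Err}_{\mathrm{OTS}} \mid f, m\big]
  \text{ increases in } m
\quad\Longleftrightarrow\quad
\mathbb{E}_{f}\, E\big[\mathrm{Err}_{\mathrm{OTS}} \mid f, m\big]
  \text{ decreases in } m
```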
Machine learning
- Computer Science, CSUR
- 1996
Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
On the monotonicity of the performance of Bayesian classifiers (Corresp.)
- Computer Science, IEEE Trans. Inf. Theory
- 1978
It would appear that the peaking behavior of practical classifiers is caused principally by their nonoptimal use of the features, which contradicts previous interpretations of Hughes' model.
On Overfitting Avoidance as Bias
- Computer Science
- 1993
A formal analysis of the contentions of Schaffer (1993) proves that they are valid, although some of his experiments must be interpreted with caution.
Improving Performance in Neural Networks Using a Boosting Algorithm
- Computer Science, NIPS
- 1992
The effect of boosting is reported on four databases, consisting of 12,000 digits from segmented ZIP codes from the United States Postal Service and data from the National Institute of Standards and Technology (NIST).
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
- Computer Science, NATO Neurocomputing
- 1989
Two modifications are explained: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity of feed-forward non-linear networks with multiple outputs.
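The normalised exponential introduced here is the now-standard softmax; its definition, for completeness:

```latex
% Softmax (normalised exponential) over network outputs z_1, ..., z_K,
% yielding a probability distribution over the K classes:
\mathrm{softmax}(z)_i \;=\; \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
```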
Exploring the Decision Forest: An Empirical Investigation of Occam's Razor in Decision Tree Induction
- Computer Science, J. Artif. Intell. Res.
- 1994
The results of the experiments indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than slightly larger trees.
The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework
- Computer Science
- 1995
A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem, and a discussion of the Bayesian “Occam factors” argument for Occam's razor.
Local Algorithms for Pattern Recognition and Dependencies Estimation
- Computer Science, Neural Computation
- 1993
The theoretical framework on which local learning algorithms are based, algorithms that yield performance improvements on real problems, is presented, along with a new statement of certain learning problems, namely local risk minimization.
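A hedged rendering of the local-risk functional in the spirit of local learning (notation follows standard expositions of the idea, not necessarily this paper's):

```latex
% Local risk around a query point x_0: the loss L is reweighted by a
% kernel K of width b so that only the neighbourhood of x_0 counts
% (P is the data distribution):
R_{\mathrm{loc}}(f; x_0) \;=\;
  \int L\big(y, f(x)\big)\, K\!\left(\frac{x - x_0}{b}\right)\, dP(x, y)
```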