The Lack of A Priori Distinctions Between Learning Algorithms

@article{Wolpert1996TheLO,
  title={The Lack of A Priori Distinctions Between Learning Algorithms},
  author={David H. Wolpert},
  journal={Neural Computation},
  year={1996},
  month={10},
  volume={8},
  pages={1341-1390}
}
This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower… 
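The averaging claim in the abstract can be illustrated with a toy enumeration (a minimal sketch; the input space, the two algorithms, and all names below are illustrative, not from the paper): over all Boolean targets on a tiny input space, a "sensible" learner and its deliberately perverse opposite achieve the same average off-training-set error.

```python
from itertools import product

X = [0, 1, 2]          # tiny input space of three points
train_idx = [0, 1]     # points whose labels the learner sees
test_idx = 2           # the single off-training-set (OTS) point

def majority(labels):
    # Predict the most common training label (ties broken toward 0).
    return int(sum(labels) > len(labels) / 2)

def anti_majority(labels):
    # The perverse learner: always predict the opposite of majority.
    return 1 - majority(labels)

def avg_ots_error(algorithm):
    # Average OTS error over all 2^|X| binary target functions.
    errors = []
    for target in product([0, 1], repeat=len(X)):
        train_labels = [target[i] for i in train_idx]
        pred = algorithm(train_labels)
        errors.append(int(pred != target[test_idx]))
    return sum(errors) / len(errors)

print(avg_ots_error(majority))       # 0.5
print(avg_ots_error(anti_majority))  # 0.5
```

Because the OTS label is independent of the training labels once we average uniformly over all targets, any fixed decision rule scores exactly 0.5, which is the "no a priori distinctions" point in miniature.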
Any Two Learning Algorithms Are (Almost) Exactly Identical
TLDR
This paper shows that if one is provided with a loss function, it can be used in a natural way to specify a distance measure quantifying the similarity of any two supervised learning algorithms, even non-parametric algorithms, indicating that any two learning algorithms are almost exactly identical for such scenarios.
On classifier behavior in the presence of mislabeling noise
TLDR
The “sigmoid rule” framework is presented, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem, and is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy time series of evolving nature.
Bias-free hypothesis evaluation in multirelational domains
TLDR
It is demonstrated that the bias due to linkage to known objects varies with the chosen proportion of the training/test split, and an algorithm, generalized subgraph sampling, is presented that is guaranteed to avoid bias in the test set in more general cases.
The No Free Lunch theorem in Pattern Recognition
TLDR
The main aim is to propose an expression of the generic prior in terms of the classification power inherent in successively more encompassing subspaces of feature space, to investigate some of the properties of such a "generic" prior for pattern recognition, and to investigate a descriptive framework that may be useful in that regard.
When is the Naive Bayes approximation not so naive?
TLDR
This paper proposes a set of "local" error measures—associated with the likelihood functions for subsets of attributes and for each class—and shows explicitly how these local errors combine to give a "global" error associated with the full attribute set.
An automatic extraction method of the domains of competence for learning classifiers using data complexity measures
TLDR
This work presents an automatic extraction method to determine the domains of competence of a classifier using a set of data complexity measures proposed for the task of classification, allowing the user to characterize the response quality of the methods from a dataset’s complexity.
The Construction of a Majority-Voting Ensemble Based on the Interrelation and Amount of Information of Features
TLDR
A new ensemble learning algorithm called VIBES is introduced, which performs better than 85 machine learning algorithms in the WEKA tool and has the highest average classification accuracy across the 33 datasets.

References

On the Connection between In-sample Testing and Generalization Error
TLDR
It is impossible to justify a correlation between reproduction of a training set and generalization error off of the training set using only a priori reasoning, and a novel formalism for addressing machine learning is developed.
Off-training-set error for the Gibbs and the Bayes optimal generalizers
TLDR
It is shown that when the target function is fixed, expected off-training-set error can increase with training set size if and only if the expected error averaged over all targets decreases with training set size.
Machine learning
TLDR
Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
On the monotonicity of the performance of Bayesian classifiers (Corresp.)
TLDR
It would appear that the peaking behavior of practical classifiers is caused principally by their nonoptimal use of the features, which contradicts previous interpretations of Hughes' model.
On Overfitting Avoidance as Bias
TLDR
A formal analysis of contentions of Schaffer (1993) proves that his contentions are valid, although some of his experiments must be interpreted with caution.
Improving Performance in Neural Networks Using a Boosting Algorithm
TLDR
The effect of boosting is reported on four databases consisting of 12,000 digits from segmented ZIP codes from the United States Postal Service and data from the National Institute of Standards and Technology (NIST).
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
  • J. Bridle
  • Computer Science
    NATO Neurocomputing
  • 1989
TLDR
Two modifications are explained: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity of feed-forward non-linear networks with multiple outputs.
Exploring the Decision Forest: An Empirical Investigation of Occam's Razor in Decision Tree Induction
TLDR
The results of the experiments indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than the average accuracy of slightly larger trees.
The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework
TLDR
A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem, and of the Bayesian "Occam factors" argument for Occam's razor.
Local Algorithms for Pattern Recognition and Dependencies Estimation
TLDR
The theoretical framework on which local learning algorithms, which result in performance improvements on real problems, are based is presented, along with a new statement of certain learning problems, namely local risk minimization.