The Existence of A Priori Distinctions Between Learning Algorithms

@article{Wolpert1996TheEO,
  title={The Existence of A Priori Distinctions Between Learning Algorithms},
  author={D. Wolpert},
  journal={Neural Computation},
  year={1996},
  volume={8},
  pages={1391--1420}
}
  • D. Wolpert
  • Published 1996
  • Computer Science
  • Neural Computation
This is the second of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms from those used in the first paper. In particular this second paper discusses the associated a priori…
Any Two Learning Algorithms Are (Almost) Exactly Identical
This paper shows that if one is provided with a loss function, it can be used in a natural way to specify a distance measure quantifying the similarity of any two supervised learning algorithms, even…
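The abstract above describes turning a loss function into a distance between learning algorithms. As a hedged illustration only (the sampling scheme, the two toy learners, and the threshold targets below are hypothetical, not the paper's construction), one empirical version trains both algorithms on the same random datasets and averages a loss applied to their predictions at held-out points:

```python
import random

# Illustrative sketch: a loss-induced "distance" between two learning
# algorithms, estimated by training both on the same random datasets and
# averaging the loss between their predictions on a fresh test point.
# All names and the data-generating process here are assumptions.

def zero_one_loss(a, b):
    return 0.0 if a == b else 1.0

def nearest_neighbor(train):
    """1-NN on the real line over (x, y) training pairs."""
    def h(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return h

def constant_majority(train):
    """Predicts the majority training label everywhere (ties -> 1)."""
    ones = sum(y for _, y in train)
    guess = 1 if ones * 2 >= len(train) else 0
    return lambda x: guess

def algo_distance(algo_a, algo_b, loss, trials=200, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = rng.random()                       # random threshold target
        xs = [rng.random() for _ in range(8)]
        train = [(x, int(x > t)) for x in xs]
        x_test = rng.random()
        total += loss(algo_a(train)(x_test), algo_b(train)(x_test))
    return total / trials

d = algo_distance(nearest_neighbor, constant_majority, zero_one_loss)
print(0.0 <= d <= 1.0)  # True: the induced distance is a bounded average loss
```

By construction the measure is symmetric and assigns distance zero to any algorithm compared with itself.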
Sample complexity of classification with compressed input
It is proven that the sample complexity to achieve an ∊-δ Probably Approximately Correct (PAC) hypothesis is bounded by (2^{cH(X)/∊} + log(1/δ)) / (a∊²), which is sharp up to the 1/∊² factor (a and c are constants).
What is important about the No Free Lunch theorems?
The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. They also motivate a "dictionary" between supervised learning and blackbox optimization, which allows one to "translate" techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms.
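The uniform-averaging claim above can be checked directly on a toy problem. The sketch below (an illustrative setup, not Wolpert's formal one) enumerates every boolean target function on a 3-point domain and shows that two quite different learners attain identical average off-training-set accuracy:

```python
import itertools

# Toy No Free Lunch check: average off-training-set (OTS) accuracy over
# ALL boolean targets on a tiny domain. Under this uniform average, any
# two learners tie, regardless of how they use the training data.

X = [0, 1, 2]                                  # full input space
train_x = [0, 1]                               # training inputs
test_x = [x for x in X if x not in train_x]    # OTS inputs

def learner_constant_zero(train):
    """Ignores the data; always predicts 0."""
    return lambda x: 0

def learner_majority(train):
    """Predicts the majority label seen in training (ties -> 1)."""
    ones = sum(y for _, y in train)
    guess = 1 if ones * 2 >= len(train) else 0
    return lambda x: guess

def avg_ots_accuracy(learner):
    total, count = 0.0, 0
    # enumerate every target function f: X -> {0, 1}
    for labels in itertools.product([0, 1], repeat=len(X)):
        f = dict(zip(X, labels))
        train = [(x, f[x]) for x in train_x]
        h = learner(train)
        correct = sum(h(x) == f[x] for x in test_x)
        total += correct / len(test_x)
        count += 1
    return total / count

print(avg_ots_accuracy(learner_constant_zero))  # 0.5
print(avg_ots_accuracy(learner_majority))       # 0.5
```

Because the OTS label is independent of the training labels under the uniform prior, no use of the training data can lift the average above chance.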
SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise
This paper introduces the "Sigmoid Rule" Framework, a framework focusing on the description of classifier behavior in noisy settings, and shows that there exists a connection between these parameters and the characteristics of the underlying dataset, hinting at how the inherent properties of a dataset affect learning.
On classifier behavior in the presence of mislabeling noise
The "sigmoid rule" framework is presented, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem, and is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy time series of evolving nature.
There is No Free Lunch but the Starter is Cheap: Generalisation from First Principles
According to Wolpert's no-free-lunch (NFL) theorems [Wolpert, 1996b, Wolpert, 1996a], generalisation in the absence of domain knowledge is necessarily a zero-sum enterprise. Good generalisation…
Prior Information and Generalized Questions (Draft Copy, Please Do Not Distribute)
In learning problems available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint). This paper 1.)…
Some observations concerning Off Training Set (OTS) error
It is shown that the applicability of the theorem showing that small training set error does not guarantee small OTS error is limited to models in which the distribution generating training data has no overlap with the distribution generating test data.
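The OTS notion discussed above is simple to state in code. The following is a minimal sketch (the helper names and the parity target are hypothetical): evaluate a hypothesis only on inputs that never appeared in the training set, and note that zero training error is compatible with large OTS error.

```python
# Minimal sketch of off-training-set (OTS) error: error is measured only
# on inputs outside the training set.

def ots_error(hypothesis, target, domain, train_inputs):
    ots_points = [x for x in domain if x not in set(train_inputs)]
    if not ots_points:
        raise ValueError("no off-training-set points")
    mistakes = sum(hypothesis(x) != target(x) for x in ots_points)
    return mistakes / len(ots_points)

# Example: target is parity on a 6-point domain; the hypothesis memorizes
# the training labels (zero training error) and guesses 0 elsewhere.
domain = range(6)
train_inputs = [0, 1, 2]
target = lambda x: x % 2
memo = {x: target(x) for x in train_inputs}
hypothesis = lambda x: memo.get(x, 0)

print(ots_error(hypothesis, target, domain, train_inputs))  # 0.666... (2 of 3 OTS points wrong)
```

Here the hypothesis reproduces the training set perfectly yet errs on 2 of the 3 OTS points, illustrating why small training error alone guarantees nothing off the training set.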
Automatic bias learning: an inquiry into the inductive basis of induction
A new position in the scientific realism debate, transcendental surrealism, is proposed and defended on the picture of induction that emerges in the thesis, which aims to investigate how inductive performance could be improved by using induction to select appropriate generalisation procedures.
Learning to Recognize Faces by Successive Meetings
The results seem to indicate that more interactions or meetings with the different individuals are needed to affirm that their identity is familiar enough to be recognized robustly, and that if a verification stage is included the system quickly learns to detect new identities.

References

On the Connection between In-sample Testing and Generalization Error
  • D. Wolpert
  • Mathematics, Computer Science
  • Complex Syst.
  • 1992
It is impossible to justify a correlation between reproduction of a training set and generalization error off of the training set using only a priori reasoning, and a novel formalism for addressing machine learning is developed.
Off-training-set error for the Gibbs and the Bayes optimal generalizers
In this paper we analyze the average off-training-set behavior of the Bayes-optimal and Gibbs learning algorithms. We do this by exploiting the concept of refinement, which concerns the relationship…
On Bias Plus Variance
  • D. Wolpert
  • Mathematics, Computer Science
  • Neural Computation
  • 1997
This article presents several additive corrections to the conventional quadratic loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in…
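The conventional quadratic-loss formula that the article corrects is the familiar decomposition E[(h − y)²] = (E[h] − y)² + Var(h). As a numeric sketch (the biased Gaussian estimator below is an arbitrary assumption for illustration), the identity holds exactly for sample moments:

```python
import random

# Numeric check of the conventional quadratic-loss bias-plus-variance
# decomposition: mean squared error = squared bias + variance.
# The estimator's sampling distribution here (shifted Gaussian) is an
# arbitrary choice for illustration.

rng = random.Random(42)
y = 1.0                                               # fixed target
samples = [y + 0.5 + rng.gauss(0, 1) for _ in range(10000)]  # biased, noisy estimates

mean_h = sum(samples) / len(samples)
mse = sum((h - y) ** 2 for h in samples) / len(samples)
bias_sq = (mean_h - y) ** 2
variance = sum((h - mean_h) ** 2 for h in samples) / len(samples)

# The decomposition is an algebraic identity on sample moments:
print(abs(mse - (bias_sq + variance)) < 1e-9)  # True
```

The article's corrections address settings this simple identity does not cover, such as a target that is itself random rather than fixed.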
The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework
A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem and on the Bayesian "Occam factors" argument for Occam's razor.
Improving regression estimation: Averaging methods for variance reduction with extensions to general convex measure optimization
A general theoretical framework for Monte Carlo averaging methods of improving regression estimates is presented with application to neural network classification and time series prediction. Given a…