The weighted majority algorithm
@inproceedings{Littlestone1989TheWM,
  title     = {The weighted majority algorithm},
  author    = {Nick Littlestone and Manfred K. Warmuth},
  booktitle = {30th Annual Symposium on Foundations of Computer Science},
  year      = {1989},
  pages     = {256-261}
}
The construction of prediction algorithms is studied in a setting where a learner faces a sequence of trials, must make a prediction in each, and aims to make few mistakes. It is assumed that the learner has reason to believe that one algorithm in some pool of known algorithms will perform well, but does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. It is called the…
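In outline, the method keeps one weight per pool algorithm, predicts by weighted vote, and multiplicatively demotes the weight of every algorithm that errs. Below is a minimal Python sketch of that scheme; the binary-outcome setting, the tie-breaking rule, and the demotion factor beta = 0.5 are illustrative choices, not details fixed by this page.

```python
def weighted_majority(experts, trials, beta=0.5):
    """experts: callables mapping an input x to a prediction in {0, 1}.
    trials: iterable of (x, outcome) pairs, outcome in {0, 1}.
    Returns the number of mistakes made by the compound algorithm."""
    weights = [1.0] * len(experts)
    mistakes = 0
    for x, outcome in trials:
        votes = [e(x) for e in experts]
        # Weighted vote: compare the total weight behind each prediction.
        w1 = sum(w for w, v in zip(weights, votes) if v == 1)
        w0 = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = 1 if w1 >= w0 else 0  # ties broken toward 1 (arbitrary)
        mistakes += int(prediction != outcome)
        # Multiplicatively demote every expert that erred on this trial.
        weights = [w * beta if v != outcome else w
                   for w, v in zip(weights, votes)]
    return mistakes
```

With n experts and beta = 1/2, the compound algorithm's mistake count stays within a constant factor of m + log2 n, where m is the number of mistakes of the best expert in hindsight; this is the form of the paper's central bound.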
2,402 Citations
Memory bounds for the experts problem
- Computer Science, STOC
- 2022
The study of learning with expert advice in the streaming setting is initiated, and upper and lower memory bounds are shown; the upper bounds give new ways to run standard sequential prediction algorithms in rounds on small “pools” of experts, reducing the memory required.
How to use expert advice
- Computer Science, STOC '93
- 1993
This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called “experts”, and shows how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.
Cascading randomized weighted majority: A new online ensemble learning algorithm
- Computer Science, Intell. Data Anal.
- 2016
This paper proposes a cascading version of the randomized weighted majority (RWM) algorithm that achieves both better experimental results and a better error bound for sufficiently large datasets.
Theory and applications of predictors that specialize
- Computer Science
- 2007
It is shown how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption, and how to derive corresponding loss bounds.
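For a concrete picture of the transformation (here and in the STOC '97 paper below), the following is a rough Python sketch of one trial of a "sleeping experts" update, assuming binary predictions and a simple multiplicative demotion; the papers' actual update, including how the loss bounds are derived, is more refined.

```python
def specialists_step(weights, awake_votes, outcome, beta=0.5):
    """weights: current weight per expert (list of floats).
    awake_votes: {expert_index: prediction in {0, 1}}, awake experts only.
    Returns (combined prediction, updated weights)."""
    w1 = sum(weights[i] for i, v in awake_votes.items() if v == 1)
    w0 = sum(weights[i] for i, v in awake_votes.items() if v == 0)
    prediction = 1 if w1 >= w0 else 0
    new_weights = list(weights)
    for i, v in awake_votes.items():
        if v != outcome:            # only awake, wrong experts are demoted;
            new_weights[i] *= beta  # sleeping experts keep their weight
    return prediction, new_weights
```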
Rational Ordering and Clustering of Examples for Incremental Learning
- Computer Science
- 2000
This paper suggests a method that groups the training examples into clusters, interleaved with recalculations of the weights of the classifiers in the ensemble, so that the sums of differences in opinions are as equal as possible.
Evaluation methods and strategies for the interactive use of classifiers
- Computer Science, Int. J. Hum. Comput. Stud.
- 2012
Why averaging classifiers can protect against overfitting
- Computer Science, AISTATS
- 2001
A simple learning algorithm for binary classification is studied that predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error, and it is shown that this prediction is much more stable than the prediction of an algorithm that predicts with the single best hypothesis.
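As a sketch of the weighting rule this summary describes, assuming a ±1 label encoding and a learning-rate parameter eta (both illustrative; the paper fixes its own scheme):

```python
import math

def averaged_prediction(hypotheses, training_errors, x, eta=1.0):
    """hypotheses: callables returning a prediction in {-1, +1};
    training_errors: one training-error value per hypothesis.
    Predicts with the sign of the exponentially weighted average."""
    weights = [math.exp(-eta * err) for err in training_errors]
    total = sum(w * h(x) for w, h in zip(weights, hypotheses))
    return 1 if total >= 0 else -1
```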
Generalization bounds for averaged classifiers
- Computer Science
- 2004
This paper studies a simple learning algorithm for binary classification that predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error, and shows that the prediction is much more stable than the prediction of an algorithm that predicts with the best hypothesis.
Using and combining predictors that specialize
- Computer Science, STOC '97
- 1997
It is shown how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption, and how to derive corresponding loss bounds.
Analysis of a Pseudo-Bayesian Prediction Method
- Computer Science
- 2000
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, this algorithm predicts with a weighted average of all…
References
Showing 1-10 of 21 references
Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
- Computer Science, 28th Annual Symposium on Foundations of Computer Science (SFCS 1987)
- 1987
This work presents a linear-threshold algorithm that learns disjunctive Boolean functions quickly even when irrelevant attributes abound, along with variants for learning other classes of Boolean functions.
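The algorithm referred to is Winnow. A minimal sketch of a Winnow-style learner follows; alpha = 2, the threshold n/2, and division-based demotion are common textbook parameter choices (the paper's Winnow1 variant instead sets demoted weights to zero).

```python
def winnow(examples, n, alpha=2.0):
    """examples: iterable of (x, label), x a 0/1 vector of length n,
    label in {0, 1}. Learns a monotone disjunction online; the mistake
    bound grows only logarithmically in the number of irrelevant attributes."""
    weights = [1.0] * n
    theta = n / 2.0
    for x, label in examples:
        score = sum(w * xi for w, xi in zip(weights, x))
        prediction = 1 if score >= theta else 0
        if prediction == 0 and label == 1:    # promote active attributes
            weights = [w * alpha if xi else w for w, xi in zip(weights, x)]
        elif prediction == 1 and label == 0:  # demote active attributes
            weights = [w / alpha if xi else w for w, xi in zip(weights, x)]
    return weights
```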
Learning probabilistic prediction functions
- Computer Science, 29th Annual Symposium on Foundations of Computer Science
- 1988
The question of how to learn rules, when those rules make probabilistic statements about the future, is considered, and two results related to two distinct goals of such learning are given.
On-line learning with an oblivious environment and the power of randomization
- Computer Science, COLT '91
- 1991
Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise
- Computer Science, COLT '91
- 1991
A statistical approach to learning and generalization in layered neural networks
- Computer Science, COLT '89
- 1989
The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves, and the Gibbs distribution on the ensemble of networks with a fixed architecture is derived.
Probability and Plurality for Aggregations of Learning Machines
- Computer Science, Inf. Comput.
- 1987
Learnability and the Vapnik-Chervonenkis dimension
- Computer Science, JACM
- 1989
This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
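A standard textbook statement of the sample-complexity consequence (constants and exact form vary by source; this is not quoted from the paper):

```latex
% Samples sufficient for (epsilon, delta)-PAC learning a concept class of
% VC dimension d:
m(\varepsilon,\delta) = O\!\left(\frac{1}{\varepsilon}\left(d \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right)
```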
Probabilistic inductive inference
- Computer Science, JACM
- 1989
It is shown that any class of functions that can be inferred from examples with probability exceeding 1/2 can be inferred deterministically, and that for probabilities p there is a discrete hierarchy of inferability parameterized by p.
On the uniform convergence of relative frequencies of events to their probabilities
- Mathematics
- 1971
This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July…