The weighted majority algorithm

@inproceedings{Littlestone1989TheWM,
  title={The weighted majority algorithm},
  author={Nick Littlestone and Manfred K. Warmuth},
  booktitle={30th Annual Symposium on Foundations of Computer Science},
  year={1989},
  pages={256-261}
}
This paper studies the construction of prediction algorithms in a setting where a learner faces a sequence of trials, must make a prediction in each trial, and aims to make few mistakes. It is assumed that the learner has reason to believe that one of some pool of known algorithms will perform well, but does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. It is called the…
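As a rough illustration of the weighted-voting construction described above, here is a minimal Python sketch of a deterministic weighted-majority learner; the class name, the 0/1 label encoding, and the penalty parameter beta are illustrative choices, not details fixed by the paper.

# Minimal sketch of a deterministic weighted-majority learner.
# The penalty parameter beta (0 <= beta < 1) and the 0/1 label
# encoding are illustrative assumptions.
class WeightedMajority:
    def __init__(self, n_experts, beta=0.5):
        self.weights = [1.0] * n_experts
        self.beta = beta

    def predict(self, expert_predictions):
        # Weighted vote over the experts' binary (0/1) predictions.
        vote_1 = sum(w for w, p in zip(self.weights, expert_predictions) if p == 1)
        vote_0 = sum(w for w, p in zip(self.weights, expert_predictions) if p == 0)
        return 1 if vote_1 >= vote_0 else 0

    def update(self, expert_predictions, true_label):
        # Multiplicatively penalize every expert that was wrong on this trial.
        self.weights = [w * self.beta if p != true_label else w
                        for w, p in zip(self.weights, expert_predictions)]

Under this kind of multiplicative penalty, the compound learner's mistake count can be bounded in terms of the best expert's mistakes plus a term logarithmic in the pool size, which is the flavor of guarantee the paper establishes.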

Memory bounds for the experts problem

This work initiates the study of the learning-with-expert-advice problem in the streaming setting and proves lower and upper bounds; the upper bounds give new ways to run standard sequential prediction algorithms in rounds on small “pools” of experts, thus reducing the memory required.

How to use expert advice

This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called "experts", and shows how this leads to certain kinds of pattern-recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.

Cascading randomized weighted majority: A new online ensemble learning algorithm

This paper proposes a cascading version of the randomized weighted majority (RWM) algorithm to achieve not only better experimental results but also a better error bound for sufficiently large datasets.

Theory and applications of predictors that specialize

It is shown how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption, and how to derive corresponding loss bounds.
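A simplified sketch of the idea, assuming a weighted-majority style base algorithm: on each trial only the "awake" experts vote and only they are penalized, while sleeping experts keep their weights untouched. This illustrates the general transformation, not the paper's exact update rule.

# Simplified sleeping-experts trial (assumed 0/1 predictions and a
# multiplicative penalty beta); `awake` is the list of expert indices
# that actually made a prediction on this trial.
def sleeping_experts_trial(weights, awake, predictions, true_label, beta=0.5):
    # Weighted vote restricted to the awake experts.
    vote_1 = sum(weights[i] for i in awake if predictions[i] == 1)
    vote_0 = sum(weights[i] for i in awake if predictions[i] == 0)
    compound_prediction = 1 if vote_1 >= vote_0 else 0
    # Penalize only awake experts that were wrong; sleepers are untouched.
    for i in awake:
        if predictions[i] != true_label:
            weights[i] *= beta
    return compound_prediction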

Rational Ordering and Clustering of Examples for Incremental Learning

This paper suggests a method that groups the training examples into clusters, interleaved with recalculations of the weights of the classifiers in the ensemble, so that the sums of differences in opinions are as equal as possible.

Evaluation methods and strategies for the interactive use of classifiers

Why averaging classifiers can protect against overfitting

A simple learning algorithm for binary classification is studied that predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error, and it is shown that the prediction is much more stable than the prediction of an algorithm that predicts with the best hypothesis.
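A minimal sketch of the averaging scheme described here, assuming a finite list of hypotheses given as callables returning -1/+1 and a tunable scale parameter eta (both are assumptions made for illustration):

import math

# Predict with an exponentially weighted average of all hypotheses,
# each weighted by exp(-eta * training_error).
def averaged_prediction(hypotheses, train_set, x, eta=2.0):
    weights = []
    for h in hypotheses:
        errors = sum(1 for xi, yi in train_set if h(xi) != yi)
        weights.append(math.exp(-eta * errors))
    total = sum(weights)
    score = sum(w * h(x) for w, h in zip(weights, hypotheses)) / total
    # The sign of the weighted average is the prediction; an alternative
    # would be to predict with the single lowest-error hypothesis.
    return 1 if score >= 0 else -1

Averaging in this way spreads the prediction over many low-error hypotheses, which is the source of the stability the summary refers to.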

Generalization bounds for averaged classifiers

This paper studies a simple learning algorithm for binary classification that predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error, and shows that the prediction is much more stable than the prediction of an algorithm that predicts with the best hypothesis.

Using and combining predictors that specialize

It is shown how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption, and how to derive corresponding loss bounds.

Analysis of a Pseudo-bayesian Prediction Method

We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, this algorithm predicts with a weighted average of all
...

References

(Showing the first 10 of 21 references.)

Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

N. Littlestone · 28th Annual Symposium on Foundations of Computer Science (SFCS 1987), 1987
This work presents a linear-threshold algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions.
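The linear-threshold algorithm referred to here uses a Winnow-style multiplicative update, whose mistake bound grows only logarithmically in the number of irrelevant attributes. A sketch under assumed parameter choices (promotion/demotion factor alpha = 2, threshold equal to the number of attributes, weights assumed to start at 1.0):

# Winnow-style multiplicative update for learning disjunctions over
# Boolean attributes x in {0,1}^n; alpha and the threshold are the
# usual illustrative choices, not necessarily the paper's constants.
def winnow_trial(weights, x, true_label, alpha=2.0):
    n = len(weights)
    prediction = 1 if sum(w * xi for w, xi in zip(weights, x)) >= n else 0
    if prediction == 0 and true_label == 1:
        # Promotion: boost the weights of the active attributes.
        for i, xi in enumerate(x):
            if xi == 1:
                weights[i] *= alpha
    elif prediction == 1 and true_label == 0:
        # Demotion: shrink the weights of the active attributes.
        for i, xi in enumerate(x):
            if xi == 1:
                weights[i] /= alpha
    return prediction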

Learning probabilistic prediction functions

The question of how to learn rules that make probabilistic statements about the future is considered, and two results, related to two distinct goals of such learning, are given.

A statistical approach to learning and generalization in layered neural networks

The Gibbs distribution on the ensemble of networks with a fixed architecture is derived, and the proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves.
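For reference, the Gibbs distribution mentioned here has the standard statistical-mechanics form; writing E(w) for the training error of a network with parameters w and \beta for an inverse-temperature parameter (notation chosen here, not quoted from the paper):

P(w) = \frac{e^{-\beta E(w)}}{Z}, \qquad Z = \int e^{-\beta E(w)} \, dw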

From on-line to batch learning

Learnability and the Vapnik-Chervonenkis dimension

This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
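For context, the kind of sample-size bound that a finite Vapnik-Chervonenkis dimension d makes possible has the standard form below (constants omitted; this is the usual statement of such bounds, not a quotation from the paper), where \varepsilon is the accuracy parameter and \delta the confidence parameter:

m = O\!\left( \frac{1}{\varepsilon} \left( d \log \frac{1}{\varepsilon} + \log \frac{1}{\delta} \right) \right)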

Probabilistic inductive inference

It is shown that any class of functions that can be inferred from examples with probability exceeding 1/2 can be inferred deterministically, and that for probabilities p there is a discrete hierarchy of inferability parameterized by p.

Vapnik and Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities

This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July…