A decision-theoretic generalization of on-line learning and an application to boosting

@inproceedings{Freund1995ADG,
  title={A decision-theoretic generalization of on-line learning and an application to boosting},
  author={Yoav Freund and Robert E. Schapire},
  booktitle={EuroCOLT},
  year={1995}
}
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably…
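As a concrete illustration of the kind of multiplicative weight-update rule the abstract refers to, here is a minimal Python sketch. The exponential update exp(-eta * loss), the learning rate eta, and the per-round loss-vector input format are illustrative assumptions, not the paper's exact formulation.

```python
import math

def multiplicative_weights(loss_rounds, eta=0.5):
    """Minimal sketch of a Hedge-style multiplicative weight update.

    loss_rounds: iterable of lists, one list of per-option losses in [0, 1]
    per round (a hypothetical input format chosen for illustration).
    Returns the final normalized weights over the options.
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n                       # start with uniform weights
    for losses in loss_rounds:
        # each option's weight shrinks multiplicatively with its loss
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: option 2 consistently incurs less loss, so it ends up with more weight.
print(multiplicative_weights([[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]))
```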
Potential-Based Algorithms in On-Line Prediction and Game Theory
TLDR
This paper shows that several known algorithms for sequential prediction problems, for playing iterated games, and for boosting are special cases of a general decision strategy based on the notion of potential, and describes a notion of generalized regret and its applications in learning theory.
Recent Results in On-line Prediction and Boosting
TLDR
It is suggested that the on-line prediction model is a good source of interesting algorithmic ideas with great potential for new applications, and a simple algorithm based on "multiplicative weights" is described.
On-line Learning and the Metrical Task System Problem
TLDR
An experimental comparison of how these algorithms perform on a process migration problem, a problem that combines aspects of both the experts-tracking and MTS formalisms, is presented.
Special Invited Paper-Additive logistic regression: A statistical view of boosting
TLDR
This work shows that this seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood, and develops more direct approximations and shows that they exhibit nearly identical results to boosting.
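The statistical interpretation mentioned here rests on a standard identity: the population minimizer of the exponential criterion is half the log-odds. Stated symbolically (a paraphrase of the well-known result, not a quotation from the paper):

```latex
F^*(x) \;=\; \arg\min_{F}\, \mathbb{E}\!\left[ e^{-yF(x)} \mid x \right]
       \;=\; \tfrac{1}{2}\,\log\frac{P(y=1 \mid x)}{P(y=-1 \mid x)}
```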
Practical Algorithms for On-line Sampling
TLDR
This paper presents two on-line sampling algorithms for selecting a hypothesis, gives theoretical bounds on the number of examples needed, and analyses them experimentally to study the problem of how to determine which of the hypotheses in the class is almost the best one.
Exploiting easy data in online optimization
TLDR
A general algorithm that, provided with a "safe" learning algorithm and an opportunistic "benchmark", can effectively combine good worst-case guarantees with much improved performance on "easy" data is introduced.
On-line evaluation and prediction using linear functions
TLDR
A model for situations where an algorithm needs to make a sequence of choices to minimize an evaluation function, but where the evaluation function must be learned on-line as it is being used, and proves performance bounds for them that hold in the worst case.
A Multiclass Extension To The Brownboost Algorithm
TLDR
This paper proposes a natural multiclass extension to the basic algorithm, incorporating error-correcting output codes and a multiclass gain measure, and tests two-class and multiclass versions of the algorithm on a number of real and simulated data sets with artificial class noise, showing that BrownBoost consistently outperforms AdaBoost in these situations.
...

References

Showing 1-10 of 46 references
Boosting a weak learning algorithm by majority
TLDR
An algorithm for improving the accuracy of algorithms for learning binary concepts by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples, is presented.
Experiments with a New Boosting Algorithm
TLDR
This paper describes experiments carried out to assess how well AdaBoost with and without pseudo-loss, performs on real learning problems and compared boosting to Breiman's "bagging" method when used to aggregate various classifiers.
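For readers unfamiliar with the algorithm these experiments evaluate, a compact sketch of an AdaBoost-style loop is given below, using one-dimensional threshold stumps as the weak learners. The stump learner, the toy data, and the numerical choices are illustrative assumptions, not taken from the paper.

```python
import math

def adaboost(xs, ys, rounds=10):
    """Sketch of an AdaBoost-style loop with 1-D threshold stumps.

    xs: list of floats; ys: list of labels in {-1, +1}.
    """
    n = len(xs)
    d = [1.0 / n] * n                         # distribution over training examples
    ensemble = []                             # list of (alpha, threshold, sign)
    for _ in range(rounds):
        # pick the stump h(x) = sign * sgn(x - thr) with smallest weighted error
        best = None
        for thr in xs:
            for sign in (+1, -1):
                preds = [sign if x > thr else -sign for x in xs]
                err = sum(w for w, p, y in zip(d, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign, preds)
        err, thr, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)            # avoid log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)          # weight of this weak hypothesis
        ensemble.append((alpha, thr, sign))
        # re-weight: misclassified examples (y*p = -1) get exp(+alpha), others exp(-alpha)
        d = [w * math.exp(-alpha * y * p) for w, p, y in zip(d, preds, ys)]
        z = sum(d)
        d = [w / z for w in d]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy usage: positive labels cluster to the right.
xs = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys, rounds=5)
print([predict(model, x) for x in xs])
```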
Boosting Decision Trees
TLDR
A constructive, incremental learning system for regression problems is presented that models data by means of locally linear experts that do not compete for data during learning; asymptotic results for this method are also derived.
How to use expert advice
TLDR
This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called "experts", and shows how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.
An Experimental and Theoretical Comparison of Model Selection Methods
TLDR
A detailed comparison of three well-known model selection methods is presented: a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV).
Data filtering and distribution modeling algorithms for machine learning
TLDR
This thesis is concerned with the analysis of algorithms for machine learning and describes and analyses an algorithm for improving the performance of a general concept learning algorithm by selecting those labeled instances that are most informative.
Tight worst-case loss bounds for predicting with expert advice
TLDR
This work considers on-line algorithms for predicting binary or continuous-valued outcomes, when the algorithm has available the predictions made by N experts, and shows that for a large class of loss functions, with binary outcomes the total loss of the algorithm proposed by Vovk exceeds the total loss of the best expert by at most c ln N, where c is a constant determined by the loss function.
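Restating the quoted bound symbolically, with L_A the total loss of the algorithm, L_i the total loss of expert i, and c the loss-function-dependent constant:

```latex
L_A \;\le\; \min_{1 \le i \le N} L_i \;+\; c \ln N
```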
What Size Net Gives Valid Generalization?
TLDR
It is shown that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ε of future test examples drawn from the same distribution.
The weighted majority algorithm
TLDR
A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in a situation in which a learner faces a sequence of trials, and the goal of the learner is to make few mistakes.
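A minimal sketch of weighted-majority-style voting is shown below; the penalty factor beta = 0.5 and the data format are illustrative assumptions rather than the paper's exact algorithm.

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Sketch of weighted-majority voting over binary experts.

    expert_preds: list of rounds, each a list of 0/1 predictions (one per expert);
    outcomes: list of true 0/1 labels; beta is an illustrative penalty factor.
    Returns (mistakes made by the compound algorithm, final expert weights).
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # weighted vote between the two possible outcomes
        vote_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_1 >= vote_0 else 0
        if guess != y:
            mistakes += 1
        # multiply down the weights of experts that were wrong this round
        weights = [w * (beta if p != y else 1.0) for w, p in zip(weights, preds)]
    return mistakes, weights

# Three experts over three rounds; the first expert is always right, so it keeps weight 1.0.
preds = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
outs = [1, 0, 1]
print(weighted_majority(preds, outs))
```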
Solving Multiclass Learning Problems via Error-Correcting Output Codes
TLDR
It is demonstrated that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
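The decoding step of an error-correcting output code can be sketched in a few lines; the particular 4-class, 7-bit code matrix below is a made-up example, not the construction used in the paper.

```python
def ecoc_decode(binary_outputs, code_matrix):
    """Sketch of error-correcting output code (ECOC) decoding.

    code_matrix[c] is the bit string assigned to class c; binary_outputs is the
    vector of 0/1 predictions from the per-bit binary classifiers.
    The class whose codeword is nearest in Hamming distance wins.
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(code_matrix)),
               key=lambda c: hamming(code_matrix[c], binary_outputs))

# Each row is the codeword for one class; rows are chosen to be far apart
# (pairwise Hamming distance 4 here) so that a wrong binary classifier can be tolerated.
codes = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
# Binary classifiers produced this output; one bit is flipped relative to class 2's codeword.
print(ecoc_decode([1, 0, 1, 0, 0, 1, 0], codes))   # -> 2
```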
...