Prediction Games and Arcing Algorithms

  • Leo Breiman
  • Published 1 October 1999
  • Mathematics, Computer Science, Medicine
  • Neural Computation
The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund & Schapire, 1996a, 1997) and others in reducing generalization error has not been well understood. By formulating prediction as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. The minimax…
On the Convergence Properties of Optimal AdaBoost
This paper establishes the convergence of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004, and proves the convergence, with the number of rounds, of the classifier itself, its generalization error, and its resulting margins for fixed data sets, under certain reasonable conditions.
Axiomatic Characterization of AdaBoost and the Multiplicative Weight Update Procedure
It is proved that any method that satisfies three natural axioms on adaptive re-weighting and combining algorithms must be minimizing the composition of an exponential loss with an additive function, and that the weights must be updated according to the multiplicative weight update procedure.
The Dynamics of AdaBoost: Cyclic Behavior and Convergence of Margins
This work reduces AdaBoost to a nonlinear iterated map and studies the evolution of its weight vectors to understand AdaBoost's convergence properties completely, and shows that AdaBoost does not always converge to a maximum margin combined classifier, answering an open question.
Deriving and Analyzing Learning Algorithms
Project Summary: There is a large variety of learning problems across all disciplines waiting for the right algorithms. Many of these are on-line problems, where the learning algorithm continually…
Additive Logistic Regression : a Statistical
This work develops more direct approximations of boosting that exhibit performance comparable to other recently proposed multi-class generalizations of boosting, and suggests a minor modification to boosting that can reduce computation, often by factors of 10 to 50.
Additive Logistic Regression: a Statistical View of Boosting
Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms often can be…
On the Dynamics of Boosting
By considering AdaBoost as a dynamical system, this work is able to prove Rätsch and Warmuth's conjecture that AdaBoost may fail to converge to a maximal-margin combined classifier when given a 'non-optimal' weak learning algorithm.
Improving Policy Functions in High-Dimensional Dynamic Games
The approach combines ideas from literatures in Machine Learning and the econometric analysis of games to derive a one-step improvement policy over any given benchmark policy in high-dimensional Markov dynamic optimization problems, focusing in particular on dynamic games.
The Rate of Convergence of AdaBoost
The rate at which AdaBoost iteratively converges to the minimum of the "exponential loss" is studied to show that this dependence of the rate on ε is optimal up to constant factors, that is, at least Ω(1/ε) rounds are necessary to achieve within ε of the optimal exponential loss.
Greedy Function Approximation: A Gradient Boosting
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and…


Arcing the edge
Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease…
A decision-theoretic generalization of on-line learning and an application to boosting
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Bias, Variance, and Arcing Classifiers
This work explores two arcing algorithms, compares them to each other and to bagging, and tries to understand how arcing works, which is more successful than bagging in variance reduction.
Game theory, on-line prediction and boosting
An algorithm for learning to play repeated games based on the on-line prediction methods of Littlestone and Warmuth is described, which yields a simple proof of von Neumann’s famous minmax theorem, as well as a provable method of approximately solving a game.
Self bounding learning algorithms
  • Y. Freund
  • Mathematics, Computer Science
  • COLT '98
  • 1998
A self-bounding learning algorithm is an algorithm which, in addition to the hypothesis that it outputs, outputs a reliable upper bound on the generalization error of this hypothesis.
Boosting Decision Trees
A constructive, incremental learning system for regression problems that models data by means of locally linear experts that does not compete for data during learning and derives asymptotic results for this method.
Experiments with a New Boosting Algorithm
This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
Generalization in Decision Trees and DNF: Does Size Matter?
This paper shows that with high probability any decision tree of depth no more than d that is consistent with m training examples has misclassification probability no more than $O\left(\left(\frac{1}{m}\left(N_{\mathrm{eff}}\,\mathrm{VCdim}(U)\,\log^2 m\,\log d\right)\right)^{1/2}\right)$, where U is the class of node decision functions, and $N_{\mathrm{eff}} \le N$ can be thought of as the effective number of leaves.
Combinations of Weak Classifiers
The method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications. When the strength of strong classifiers is properly chosen, combinations of weak classifiers can achieve a good generalization performance with polynomial space- and time-complexity.
Bagging, Boosting, and C4.5
Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting gives the greater benefit.