Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak…

We describe a new, simplified, and general analysis of a fusion of Nesterov’s accelerated gradient with parallel coordinate descent. The resulting algorithm, which we call BOOM, for boosting with…
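The BOOM update itself is not given in this excerpt. As a hedged illustration of just one of its two ingredients, here is plain full-gradient Nesterov acceleration (a FISTA-style momentum schedule) for an L-smooth convex objective. The name `nesterov_agd` and its parameters are hypothetical, and the sketch contains no parallel coordinate descent.

```python
import numpy as np

def nesterov_agd(grad, x0, lipschitz, steps=100):
    """Nesterov's accelerated gradient for an L-smooth convex objective.

    Hypothetical helper: the plain full-gradient method only, NOT BOOM's
    fusion with parallel coordinate descent.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()  # look-ahead point where the gradient is evaluated
    t = 1.0       # momentum schedule parameter
    for _ in range(steps):
        x_next = y - grad(y) / lipschitz  # gradient step with step size 1/L
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x
```

For example, minimizing f(x) = ½xᵀAx − bᵀx with A = diag(1, 10) and b = (1, 1) (so L = 10) via `nesterov_agd(lambda v: A @ v - b, np.zeros(2), lipschitz=10.0, steps=1000)` approaches the minimizer (1, 0.1).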

In this first lecture, we begin by introducing the Chinese Restaurant Process. After a brief review of finite mixture models, we describe the Chinese Restaurant Process mixture, where the latent…
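A minimal sketch of drawing seating assignments from a Chinese Restaurant Process with concentration parameter α, assuming the standard rule: customer n + 1 joins an occupied table k with probability n_k / (n + α) or opens a new table with probability α / (n + α). The function name `sample_crp` is hypothetical, and this covers only the prior, not the mixture model built on it.

```python
import random

def sample_crp(num_customers, alpha, seed=None):
    """Hypothetical helper: sample table assignments from a CRP(alpha)."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of customers already seated at table k
    assignments = []  # assignments[i] = table index chosen by customer i
    for n in range(num_customers):
        # Draw r uniformly from [0, n + alpha): the first n units of mass belong
        # to occupied tables (in proportion to their counts), the last alpha to a new table.
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        table = len(counts)  # default: open a new table
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                table = k
                break
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts
```

Tables are labeled in order of first occupancy, so the first customer always sits at table 0, and larger α tends to produce more tables.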

The AdaBoost algorithm was designed to combine many “weak” hypotheses that perform slightly better than random guessing into a “strong” hypothesis that has very low error. We study the rate at which…
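The excerpt does not spell out the algorithm, so here is a hedged sketch of the standard AdaBoost loop. It assumes labels in {−1, +1} and uses decision stumps as the weak learner. The stump learner and all function names are illustrative choices, not taken from the paper. Each round picks the stump with the lowest weighted error ε, gives it weight α = ½ ln((1 − ε)/ε), and reweights the examples so that mistakes gain weight.

```python
import numpy as np

def best_stump(X, y, w):
    """Illustrative weak learner: threshold rule with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(X[:, j] <= thr, pol, -pol)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(X, j, thr, pol):
    return np.where(X[:, j] <= thr, pol, -pol)

def adaboost(X, y, rounds=20):
    """Standard AdaBoost with stumps; y must contain labels in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # uniform initial distribution over examples
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak hypothesis
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight hits
        w = w / w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

On the toy 1-D "interval" labels y = (−1, −1, +1, +1, −1, −1) for x = 0, …, 5, no single stump is correct on every point, yet three rounds of this sketch already fit the training set exactly.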

Hierarchical probabilistic modeling of discrete data has emerged as a powerful tool for text analysis. Posterior inference in such models is intractable, and practitioners rely on approximate…

We consider the problem of learning to predict as well as the best in a group of experts making continuous predictions. We assume the learning algorithm has prior knowledge of the maximum number of…
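The excerpt does not describe the paper's algorithm. As a point of reference only, here is a sketch of a standard baseline for continuous expert predictions: the exponentially weighted average forecaster with absolute loss. This is a swapped-in textbook method, not the paper's approach, and `exp_weights_forecast` and its parameters are hypothetical. For any convex loss with values in [0, 1], its regret against the best of N experts over T rounds is at most ln N / η + ηT / 8.

```python
import math

def exp_weights_forecast(expert_preds, outcomes, eta):
    """Hypothetical helper: exponentially weighted average forecaster.

    expert_preds[t][i] in [0, 1] is expert i's prediction at round t and
    outcomes[t] in [0, 1]. Uses absolute loss |prediction - outcome|.
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    learner_preds = []
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # Predict the weighted average of the experts' predictions.
        p = sum(w * x for w, x in zip(weights, preds)) / total
        learner_preds.append(p)
        learner_loss += abs(p - y)
        for i, x in enumerate(preds):
            loss = abs(x - y)
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)  # shrink weight by exp(-eta * loss)
    return learner_preds, learner_loss, expert_loss
```

Choosing η = √(8 ln N / T) balances the two terms of the bound, giving regret at most √((T/2) ln N).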

and where the first inequality follows from the definition (2) of the weak-learning condition. Let λ∗ be a minimizer of the min-max expression. Unless the first entry of each row of (Hλ∗ − B) is the…