Corpus ID: 238419214

Efficient Methods for Online Multiclass Logistic Regression

Naman Agarwal, Satyen Kale, Julian Zimmert
Multiclass logistic regression is a fundamental task in machine learning with applications in classification and boosting. Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving “fast rates” in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class. While Foster et al. (2018) introduced a statistically optimal algorithm, it is… 
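To make the setting concrete, here is a minimal sketch of proper online multiclass logistic regression: a softmax predictor updated by online gradient descent on the log-loss. This is the simple proper baseline the abstract contrasts against, not the improper algorithm of Agarwal, Kale & Zimmert; the class and parameter names are illustrative.

```python
import math

def softmax(scores):
    # numerically stable softmax over per-class scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

class OnlineSoftmaxRegression:
    """Proper online multiclass logistic regression via online
    gradient descent on the log-loss (illustrative baseline only)."""

    def __init__(self, n_features, n_classes, lr=0.5):
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.lr = lr

    def predict_proba(self, x):
        scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in self.W]
        return softmax(scores)

    def update(self, x, y):
        # gradient of the log-loss: (p_k - 1{k == y}) * x for each class k
        p = self.predict_proba(x)
        loss = -math.log(p[y])
        for k in range(len(self.W)):
            g = p[k] - (1.0 if k == y else 0.0)
            for j in range(len(x)):
                self.W[k][j] -= self.lr * g * x[j]
        return loss
```

Because the predictor is proper (a fixed linear softmax model), its regret can degrade exponentially in the norm of the comparator; the improper methods discussed above avoid exactly this.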


Scale-free Unconstrained Online Learning for Curved Losses

This work shows that there is in fact never a price to pay for adaptivity when specialising to any of the other common supervised online learning losses, and provides an adaptive method for linear logistic regression that is as efficient as the recent non-adaptive algorithm of Agarwal et al. (2021).

Quasi-Newton Steps for Efficient Online Exp-Concave Optimization

This paper side-steps generalized projections by using a self-concordant barrier as a regularizer to compute the Newton steps, ensuring that the iterates are always within the feasible set without requiring projections.



Logistic Regression: The Importance of Being Improper

This work designs a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm and shows that the improved dependence on predictor norm is near-optimal.

Mixability made efficient: Fast online multiclass logistic regression

This paper uses quadratic surrogates to make aggregating forecasters more efficient and derives an algorithm for multi-class logistic regression with a regret bounded by O(B log(n)) and a computational complexity of only O(n^4).

Efficient improper learning for online logistic regression

An efficient improper algorithm is designed that avoids an exponential multiplicative constant while preserving logarithmic regret; it satisfies a regret bound scaling as O(B log(Bn)) with a per-round time complexity of order O(d^2).

Online multiclass boosting

This work defines, and justifies, a weak learning condition for online multiclass boosting that leads to an optimal boosting algorithm that requires the minimal number of weak learners to achieve a certain accuracy.

On the generalization ability of on-line learning algorithms

This paper proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.

Exploiting the Surrogate Gap in Online Multiclass Classification

Gaptron is a randomized first-order algorithm for online multiclass classification that exploits the gap between the zero-one loss and surrogate losses rather than exploiting properties such as exp-concavity or mixability, which are traditionally used to prove logarithmic or constant regret bounds.

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

It is proved that the regret of NEWTRON is O(log T) when α is a constant that does not vary with the horizon T, and at most O(T^(2/3)) if α is allowed to increase to infinity with T.

Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance

This work provides a bias-variance decomposition and shows that the assumptions commonly made in least-squares regression can be adapted to obtain fast non-asymptotic rates of convergence by improving the bias terms, the variance terms or both.

Efficient bandit algorithms for online multiclass prediction

The Banditron has the ability to learn in a multiclass classification setting with "bandit" feedback, which reveals only whether the prediction made by the algorithm was correct (but does not necessarily reveal the true label).

A decision-theoretic generalization of on-line learning and an application to boosting

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. It is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.
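The multiplicative weight-update rule mentioned above can be sketched in a few lines: each expert's weight decays exponentially in its cumulative loss, and the learner plays the normalized weight distribution. This is a generic Hedge-style sketch under the standard assumption of losses in [0, 1], with an illustrative learning rate, not the exact parametrization of the cited paper.

```python
import math

def hedge(expert_losses, eta=0.5):
    """Hedge / multiplicative-weights sketch: `expert_losses` is a
    T x N matrix of per-round losses in [0, 1] for N experts.
    Returns the learner's total expected loss and the final weights."""
    n = len(expert_losses[0])
    weights = [1.0] * n
    total_learner_loss = 0.0
    for round_losses in expert_losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # learner suffers the expected loss under its distribution
        total_learner_loss += sum(p * l for p, l in zip(probs, round_losses))
        # exponential (multiplicative) weight update
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return total_learner_loss, weights
```

Run against a consistently good expert and a consistently bad one, the weights concentrate on the good expert and the learner's total loss stays close to the best expert's, which is the regret guarantee the decision-theoretic analysis formalizes.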