Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

Naresh Manwani and Mudit Agarwal
In this paper, we present an online algorithm called Delaytron for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^{T}$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $x_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t = y_t]$ only $d_t$ rounds later. When $t + d_t > T$, the feedback for the $t$-th round is treated as missing. We show that the proposed algorithm achieves regret of O…
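
The interaction protocol described in the abstract can be sketched as a small simulation. This is only an illustration of the delayed-feedback setting, not the Delaytron algorithm itself: the function name `run_delayed_protocol` and the `predict` and `update` callbacks are hypothetical, and delivering feedback at the end of its arrival round is an assumption the abstract does not fix.

```python
from collections import defaultdict


def run_delayed_protocol(examples, labels, delays, predict, update):
    """Simulate T rounds in which the feedback for round t arrives at round t + d_t.

    predict(x) returns a label; update(s, x_s, pred_s, bit) consumes the
    one-bit feedback for an earlier round s. Both callbacks are hypothetical
    stand-ins for a learner. The learner never sees the delays directly.
    """
    T = len(examples)
    pending = defaultdict(list)  # arrival round -> feedback items due then
    delivered = 0
    missing = 0

    for t in range(1, T + 1):
        x_t, y_t, d_t = examples[t - 1], labels[t - 1], delays[t - 1]
        pred = predict(x_t)

        if t + d_t > T:
            # Feedback would arrive after the final round: treated as missing.
            missing += 1
        else:
            # Bandit feedback is the single bit I[pred == y_t], not y_t itself.
            pending[t + d_t].append((t, x_t, pred, int(pred == y_t)))

        # Assumption: feedback due at round t is delivered at the end of round t.
        for item in pending.pop(t, []):
            update(*item)
            delivered += 1

    return delivered, missing
```

With `T = 5` and delays `[0, 2, 5, 1, 1]`, for example, the feedback for rounds 3 and 5 falls beyond the horizon and is never delivered, while the feedback for round 2 only reaches the learner at round 4.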

Multiclass Classification using dilute bandit feedback
An algorithm for multiclass classification using dilute bandit feedback (MC-DBF) that uses an exploration-exploitation strategy to predict the candidate set in each trial and achieves an $O(T^{1 - \frac{1}{m+2}})$ mistake bound when the candidate label set has size $m$.
Efficient bandit algorithms for online multiclass prediction
The Banditron learns in a multiclass classification setting with "bandit" feedback, which reveals only whether the algorithm's prediction was correct (but does not necessarily reveal the true label).
Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
A two-player zero-sum game in which players experience asynchronous delays is considered. It is shown that even when the delays are large enough that players no longer enjoy the “no-regret property”, the ergodic average of the strategy profile still converges to the set of Nash equilibria of the game.
Exact Passive-Aggressive Algorithms for Multiclass Classification Using Bandit Feedbacks
This paper proposes exact passive-aggressive online algorithms for multiclass classification under bandit feedback (EPABF), which use an exploration-exploitation strategy to guess the class label in every trial. It gives three variants of the weight update rule, which differ in how aggressively they correct a mistake.
Online Learning with Adversarial Delays
It is shown that online-gradient-descent and follow-the-perturbed-leader achieve regret O(√D) in the delayed setting, where D is the sum of delays of each round's feedback.
Efficient Online Bandit Multiclass Learning with Õ(√T) Regret
An efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem that provides a solution to the open problem of multiclass prediction in COLT.
Learning Multiclass Classifier Under Noisy Bandit Feedback
This paper proposes a novel approach to dealing with noisy bandit feedback, based on the unbiased estimator technique, that can efficiently estimate the noise rates, thus providing an end-to-end framework.
Practical Lessons from Predicting Clicks on Ads at Facebook
This paper introduces a model which combines decision trees with logistic regression, outperforming either of these methods on its own by over 3%, an improvement with significant impact to the overall system performance.
Foundations of Machine Learning
This graduate-level textbook introduces fundamental concepts and methods in machine learning, provides the theoretical underpinnings of these algorithms, and illustrates key aspects of their application.
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.