Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning

@article{Feng2022SharpBO,
  title={Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning},
  author={Raymond Feng and Jesse T. Geneson and Andrew Lee and Espen Slettnes},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.01366}
}
We determine sharp bounds on the price of bandit feedback for several variants of the mistake-bound model. The first part of the paper presents bounds on the r-input weak reinforcement model and the r-input delayed, ambiguous reinforcement model. In both models, the adversary gives r inputs in each round and only indicates a correct answer if all r guesses are correct. The only difference between the two models is that in the delayed, ambiguous model, the learner must answer each input before…
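As a minimal sketch of the round structure both models share (the r-input round and the all-or-nothing feedback bit are taken from the abstract; the learner interface is a hypothetical placeholder), one round of the weak reinforcement model might look like:

def weak_reinforcement_round(inputs, learner, correct_labels):
    # The learner sees all r inputs of the round and answers each one.
    guesses = [learner.predict(x) for x in inputs]
    # Feedback is a single bit: the adversary indicates a correct answer
    # only when all r guesses are correct.
    all_correct = all(g == y for g, y in zip(guesses, correct_labels))
    learner.receive_feedback(all_correct)
    return all_correct

Per the abstract, the delayed, ambiguous variant differs only in when the learner must commit to its answers within a round (the abstract is truncated at that point).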

References


Results on Various Models of Mistake-Bounded Online Learning

Lower and upper bounds on the maximum factor gap between the modified weak reinforcement model and the standard model, tight up to a factor of r, are given, and several related models for learning with permutation patterns are introduced.

Structural Results About On-line Learning Models With and Without Queries

We solve an open problem of Maass and Turán, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of…

Multiclass classification with bandit feedback using adaptive regularization

We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the prediction was correct.
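As a hedged illustration of the bandit protocol this entry refers to (the single-bit feedback is from the snippet; the loop and learner interface are assumptions, not the entry's adaptive-regularization algorithm itself):

def bandit_multiclass_loop(examples, learner):
    mistakes = 0
    for x, true_label in examples:
        y_hat = learner.predict(x)
        correct = (y_hat == true_label)    # the only feedback: one bit
        learner.update(x, y_hat, correct)  # the true label is never revealed
        mistakes += not correct
    return mistakes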

Stochastic Linear Optimization under Bandit Feedback

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
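A minimal sketch of the "upper confidence bounds" idea in the classical stochastic k-armed setting (the standard UCB1-style index, not necessarily the exact variants the entry presents):

import math

def ucb1(arms, horizon):
    # arms: list of callables, each returning a sampled reward in [0, 1]
    # (a hypothetical interface for illustration).
    k = len(arms)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Optimism: empirical mean plus a confidence radius that
            # shrinks as an arm is pulled more often.
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = arms[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts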

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

It is proved that the regret of NEWTRON is O(log T) when α is a constant that does not vary with horizon T, and at most O(T^{2/3}) if α is allowed to increase to infinity with T.

On the complexity of function learning

The notion of a binary branching adversary tree for function learning is introduced, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any “learning” model).

The price of bandit information in multiclass online classification

The results are tight up to a logarithmic factor and essentially answer an open question from (Daniely et al., "Multiclass learnability and the ERM principle").

Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

  • N. Littlestone
  • 28th Annual Symposium on Foundations of Computer Science (SFCS 1987)
  • 1987
This work presents a new linear-threshold algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions.
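A hedged sketch of a Winnow-style linear-threshold learner for monotone disjunctions (the multiplicative-update idea is the standard one for this setting; the specific parameters below are illustrative choices, not the paper's exact constants):

def winnow_predict(weights, x, threshold):
    # x is a boolean feature vector; predict 1 iff the weighted sum of
    # active features reaches the threshold.
    return sum(w for w, xi in zip(weights, x) if xi) >= threshold

def winnow_update(weights, x, y, y_hat, alpha=2.0):
    # Update only on mistakes, multiplicatively, and only on active features.
    if y_hat == y:
        return weights
    if y and not y_hat:
        # False negative: promote the weights of active features.
        return [w * alpha if xi else w for w, xi in zip(weights, x)]
    # False positive: demote the weights of active features.
    return [w / alpha if xi else w for w, xi in zip(weights, x)]

Starting from weights = [1.0] * n and threshold = n / 2 is one common illustrative configuration; the multiplicative updates are what underlie mistake bounds that grow only logarithmically in the number of irrelevant attributes.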