Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning
@article{Feng2022SharpBO,
  title   = {Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning},
  author  = {Raymond Feng and Jesse T. Geneson and Andrew Lee and Espen Slettnes},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2209.01366}
}
We determine sharp bounds on the price of bandit feedback for several variants of the mistake-bound model. The first part of the paper presents bounds for the r-input weak reinforcement model and the r-input delayed, ambiguous reinforcement model. In both models, the adversary gives r inputs in each round and indicates a correct answer only if all r guesses are correct. The only difference between the two models is that in the delayed, ambiguous model, the learner must answer each input before…
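The feedback rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's formalism: the function names, the toy target, and the trivial learner are all hypothetical; the only point is that the learner receives a single bit which is true only when all r guesses in a round are correct.

```python
def weak_reinforcement_round(inputs, learner_guess, true_label):
    """One round of the r-input weak reinforcement model (hypothetical sketch).

    The adversary supplies r inputs; the learner guesses a label for each;
    the feedback is a single bit that is true only if ALL r guesses match.
    """
    guesses = [learner_guess(x) for x in inputs]
    labels = [true_label(x) for x in inputs]
    # The learner sees only this one bit, never per-input correctness.
    return all(g == y for g, y in zip(guesses, labels))

# Toy usage: r = 3 inputs, a parity target, a learner that always guesses 0.
feedback = weak_reinforcement_round([0, 1, 2], lambda x: 0, lambda x: x % 2)
```

Here the learner's guesses (0, 0, 0) disagree with the parity labels (0, 1, 0) on the second input, so the single feedback bit is false even though two of the three guesses were correct; this ambiguity is what drives the price of bandit feedback in these models.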
References
Showing 1–10 of 17 references
Results on Various Models of Mistake-Bounded Online Learning
- Computer Science
- 2021
Lower and upper bounds, tight up to a factor of r, are given on the maximum factor gap between the modified weak reinforcement model and the standard model, and several related models for learning with permutation patterns are introduced.
A note on the price of bandit feedback for mistake-bounded online learning
- Mathematics · Theor. Comput. Sci.
- 2021
New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
- Computer Science · ALT
- 2017
Structural Results About On-line Learning Models With and Without Queries
- Computer Science · Machine Learning
- 2004
We solve an open problem of Maass and Turán, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of…
Multiclass classification with bandit feedback using adaptive regularization
- Computer Science · Machine Learning
- 2012
We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the…
Stochastic Linear Optimization under Bandit Feedback
- Computer Science, Mathematics · COLT
- 2008
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
Newtron: An Efficient Bandit Algorithm for Online Multiclass Prediction
- Computer Science · NIPS
- 2011
It is proved that the regret of Newtron is O(log T) when α is a constant that does not vary with the horizon T, and at most O(T^(2/3)) if α is allowed to increase to infinity with T.
On the complexity of function learning
- Computer Science, Mathematics · COLT '93
- 1993
The notion of a binary branching adversary tree for function learning is introduced, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any “learning” model).
The price of bandit information in multiclass online classification
- Computer Science · COLT
- 2013
The results are tight up to a logarithmic factor and essentially answer an open question from Daniely et al. ("Multiclass learnability and the ERM principle").
Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
- Computer Science · 28th Annual Symposium on Foundations of Computer Science (SFCS 1987)
- 1987
This work presents one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions.