Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures
@article{MohagheghNeyshabouri2019AsymptoticallyOC,
  title={Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures},
  author={Mohammadreza Mohaghegh Neyshabouri and Kaan Gokcesu and Hakan Gokcesu and Huseyin Ozkan and Suleyman Serdar Kozat},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2019},
  volume={30},
  pages={923-937},
  url={https://api.semanticscholar.org/CorpusID:51906671}
}
This work proposes an online algorithm for sequential learning in the contextual multi-armed bandit setting. It provides significant performance improvements by introducing regret upper bounds (with respect to the best arm-selection policy) that are mathematically proven to vanish, in the average-loss-per-round sense, at a faster rate than the state of the art.
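As a rough illustration of the setting only (not the paper's algorithm, which maintains a full hierarchy of context-space partitions and combines them with near-optimal weights), the sketch below runs an independent epsilon-greedy learner in each cell of a single uniform partition of a one-dimensional context space. All function names and parameters here are illustrative assumptions.

```python
import random
from collections import defaultdict

def partitioned_bandit(T, n_arms, n_cells, context_fn, reward_fn, eps=0.1):
    """Run an independent epsilon-greedy bandit in each cell of a uniform
    partition of the context space [0, 1).  context_fn(t) gives the round-t
    context; reward_fn(t, x, arm) gives the observed reward of the pulled arm.
    """
    counts = defaultdict(lambda: [0] * n_arms)   # pulls per (cell, arm)
    sums = defaultdict(lambda: [0.0] * n_arms)   # reward sums per (cell, arm)
    total = 0.0
    for t in range(T):
        x = context_fn(t)                        # context in [0, 1)
        cell = min(int(x * n_cells), n_cells - 1)
        if random.random() < eps:
            arm = random.randrange(n_arms)       # explore uniformly
        else:
            # Exploit the empirically best arm within this context cell.
            means = [sums[cell][a] / counts[cell][a] if counts[cell][a] else 0.0
                     for a in range(n_arms)]
            arm = max(range(n_arms), key=means.__getitem__)
        r = reward_fn(t, x, arm)
        counts[cell][arm] += 1
        sums[cell][arm] += r
        total += r
    return total
```

Refining the partition trades estimation accuracy per cell against the number of cells to learn; the paper's hierarchical construction avoids committing to a single such trade-off.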
Topics
State Of The Art, Computational Complexity, Context Space, Adversarial Environments, Contextual Bandit Algorithms, Partitions, Hierarchical Structure, Asymptotically Optimal, Online Algorithms, Synthetic Data
23 Citations
Generalized Translation and Scale Invariant Online Algorithm for Adversarial Multi-Armed Bandits
- 2021
Computer Science, Mathematics
The adversarial multi-armed bandit problem is studied, and a completely online algorithmic framework is created that is invariant under arbitrary translations and scales of the arm losses and applicable to a wide variety of problem scenarios.
Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback
- 2022
Computer Science, Mathematics
This work studies the problem of expert advice under a partial bandit feedback setting and creates a sequential minimax optimal algorithm whose performance is analyzed via regret against a general expert selection sequence.
Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback
- 2023
Computer Science, Mathematics
We study the adversarial online learning problem and create a completely online algorithmic framework that has data dependent regret guarantees in both full expert feedback and bandit feedback…
Minimax Optimal Online Stochastic Learning for Sequences of Convex Functions under Sub-Gradient Observation Failures
- 2019
Computer Science, Mathematics
This work proposes algorithms based on the sub-gradient descent method that achieve tight minimax optimal regret bounds, and also proposes a blind algorithm that empirically estimates properties of the underlying stochastic setting in a generally applicable manner.
Efficient, Anytime Algorithms for Calibration with Isotonic Regression under Strictly Convex Losses
- 2021
Mathematics, Computer Science
This work studies the traditional squared-error setting and its weighted variant, shows that the optimal monotone transform takes the form of a unique staircase function, and proposes a linear time and space algorithm that finds such optimal transforms for specific loss settings.
Online Network Source Optimization with Graph-Kernel MAB
- 2023
Computer Science, Mathematics
Simulation results show that the proposed online learning algorithm outperforms baseline offline methods, which typically separate the learning phase from the testing phase, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency, and computational complexity.
Low Regret Binary Sampling Method for Efficient Global Optimization of Univariate Functions
- 2022
Mathematics, Computer Science
This work proposes a computationally efficient algorithm for global optimization of univariate loss functions and analytically extends its results to a broader class of functions covering more complex regularity conditions.
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward
- 2021
Computer Science, Mathematics
This brief formalizes the stochastic MAB problem under a known near-optimal mean reward (NoMR) lying between the suboptimal mean reward and the optimal mean reward, and proposes a novel algorithm, NoMR-BANDIT, which is optimal for this problem.
Efficient Minimax Optimal Global Optimization of Lipschitz Continuous Multivariate Functions
- 2022
Computer Science, Mathematics
The algorithm achieves an average regret bound of O(L√n · T^(-1/n)) for the optimization of an n-dimensional Lipschitz continuous objective over a time horizon T, which is shown to be minimax optimal.
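For intuition on why regret shrinks with the horizon T in Lipschitz optimization, here is an illustrative one-dimensional baseline (not the cited multivariate algorithm): a uniform grid of T evaluations of an L-Lipschitz function leaves a worst-case gap to the true minimum proportional to L/T. All names here are assumptions for the sketch.

```python
def grid_minimize(f, lo, hi, T):
    """Evaluate f on a uniform grid of T points in [lo, hi] and return the
    best point found.  For an L-Lipschitz f, the returned value is within
    L * (hi - lo) / (2 * (T - 1)) of the true minimum, so the gap decays
    linearly in the number of evaluations T.
    """
    xs = [lo + (hi - lo) * i / (T - 1) for i in range(T)]
    best_x = min(xs, key=f)
    return best_x, f(best_x)
```

Adaptive methods like the cited one concentrate evaluations near promising regions instead of spreading them uniformly, which is how better rates are obtained in higher dimensions.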
Near-Linear Time Algorithm with Near-Logarithmic Regret Per Switch for Mixable/Exp-Concave Losses
- 2021
Computer Science, Mathematics
It is shown that, with a suitable selection of hyper-expert creations and weighting strategies, it is also possible to achieve near-logarithmic regret per switch with sub-polynomial complexity per round.
46 References
Efficient Algorithms for Adversarial Contextual Learning
- 2016
Computer Science, Mathematics
The first oracle-efficient sublinear-regret algorithms for adversarial versions of the contextual bandit problem are provided, along with several extensions and implications of the algorithms, such as switching regret and efficient learning with predictable sequences.
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
- 2014
Computer Science
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that…
Efficient Optimal Learning for Contextual Bandits
- 2011
Computer Science, Mathematics
This work provides the first efficient algorithm with optimal regret; it uses a cost-sensitive classification learner as an oracle and has a running time of polylog(N), where N is the number of classification rules among which the oracle might choose.
Contextual Bandit Learning with Predictable Rewards
- 2012
Computer Science, Mathematics
A new lower bound is proved showing no algorithm can achieve superior performance in the worst case even with the realizability assumption, and it is shown that for any set of policies, there is a distribution over rewards such that the new algorithm has constant regret unlike the previous approaches.
Contextual Multi-Armed Bandits
- 2010
Computer Science, Mathematics
A lower bound is proved for the regret of any algorithm in terms of the packing dimensions of the query space and the ad space, giving almost matching upper and lower bounds for finite spaces or convex bounded subsets of Euclidean spaces.
The Nonstochastic Multiarmed Bandit Problem
- 2002
Mathematics
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
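This reference introduced the Exp3 family of algorithms for the adversarial bandit setting. A minimal sketch of an Exp3-style loop, assuming losses in [0, 1] and treating the mixing parameter gamma as a free hyperparameter (the function names here are illustrative):

```python
import math
import random

def exp3(n_arms, T, loss_fn, gamma=0.1):
    """Exp3-style algorithm for adversarial bandits.

    loss_fn(t, arm) returns the loss in [0, 1] of pulling `arm` at round t;
    only the pulled arm's loss is observed by the learner.
    """
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(T):
        w_sum = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        # Importance-weighted loss estimate: unbiased despite bandit feedback.
        est = loss / probs[arm]
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss
```

The importance weighting (dividing the observed loss by the probability of the pulled arm) is what lets the full-information exponential-weights analysis carry over to the bandit setting.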
Universal Piecewise Linear Prediction Via Context Trees
- 2007
Computer Science
This paper proposes the use of a "context tree" to achieve the total squared prediction error performance of the best piecewise linear model, which can choose both its partitioning of the regressor space and its real-valued prediction parameters within each region of the partition.
Combinatorial Bandits
- 2009
Mathematics, Computer Science
Online Learning Algorithms Can Converge Comparably Fast as Batch Learning
- 2018
Computer Science, Mathematics
A sharp estimate of the expected norms of the learning sequence and a refined error decomposition are studied for online learning algorithms in a reproducing kernel Hilbert space associated with convex loss functions.
A contextual-bandit approach to personalized news article recommendation
- 2010
Computer Science
This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.