Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures

@article{MohagheghNeyshabouri2019AsymptoticallyOC,
  title={Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures},
  author={Mohammadreza Mohaghegh Neyshabouri and Kaan Gokcesu and Hakan Gokcesu and Huseyin Ozkan and Suleyman Serdar Kozat},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2019},
  volume={30},
  pages={923--937},
  url={https://api.semanticscholar.org/CorpusID:51906671}
}
This work proposes an online algorithm for sequential learning in the contextual multi-armed bandit setting. It establishes upper bounds on the regret (with respect to the best arm-selection policy) that are mathematically proven to vanish, in the average-loss-per-round sense, at a faster rate than the state of the art.
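
As an illustration of the general recipe (not the authors' exact algorithm, which competes against all prunings of the hierarchy simultaneously), the sketch below partitions a one-dimensional context space with a fixed-depth dyadic tree and runs an independent bandit learner in each region; all names, constants, and the epsilon-greedy learner are hypothetical placeholders.

import random

class EpsGreedy:
    """Minimal per-region arm selector (a placeholder for the per-node
    bandit learners the paper builds on; illustrative only)."""
    def __init__(self, n_arms, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.means))
        return max(range(len(self.means)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def leaf_index(context, depth):
    """Map a context in [0, 1) to a leaf of a depth-`depth` dyadic tree."""
    return min(int(context * 2 ** depth), 2 ** depth - 1)

DEPTH, N_ARMS = 3, 4
learners = [EpsGreedy(N_ARMS) for _ in range(2 ** DEPTH)]

for t in range(1000):
    context = random.random()          # observe side information
    leaf = leaf_index(context, DEPTH)  # locate its region in the finest partition
    arm = learners[leaf].select()      # play an arm for that region
    reward = random.random()           # bandit feedback (placeholder)
    learners[leaf].update(arm, reward)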


Generalized Translation and Scale Invariant Online Algorithm for Adversarial Multi-Armed Bandits

This work studies the adversarial multi-armed bandit problem and creates a completely online algorithmic framework that is invariant under arbitrary translations and scales of the arm losses, making it applicable to a wide variety of problem scenarios.

Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback

This work studies the problem of expert advice in the partial bandit feedback setting and creates a sequentially minimax optimal algorithm whose performance is analyzed via regret against a general expert selection sequence.

Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback

We study the adversarial online learning problem and create a completely online algorithmic framework that has data-dependent regret guarantees under both full expert feedback and bandit feedback…

Minimax Optimal Online Stochastic Learning for Sequences of Convex Functions under Sub-Gradient Observation Failures

This work proposes algorithms based on the sub-gradient descent method that achieve tight minimax optimal regret bounds, and proposes a blind algorithm that empirically estimates properties of the underlying stochastic settings in a generally applicable manner.

Efficient, Anytime Algorithms for Calibration with Isotonic Regression under Strictly Convex Losses

This work studies the traditional squared-error setting and its weighted variant, shows that the optimal monotone transform takes the form of a unique staircase function, and proposes a linear-time-and-space algorithm that finds such optimal transforms for specific loss settings.
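
The paper's linear-time algorithm for general strictly convex losses is not reproduced here; the classical pool-adjacent-violators routine below, written for the (weighted) squared-error case, illustrates why the optimal monotone transform is a staircase that is constant on pooled blocks.

def isotonic_fit(y, w=None):
    """Pool-adjacent-violators for weighted squared error (illustrative).

    Returns the nondecreasing sequence minimizing sum w_i * (f_i - y_i)^2;
    the solution is a staircase: constant on blocks of pooled points.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [level, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            l2, w2, c2 = blocks.pop()
            l1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * l1 + w2 * l2) / wt, wt, c1 + c2])
    out = []
    for level, _, count in blocks:
        out.extend([level] * count)
    return out

print(isotonic_fit([3.0, 1.0, 2.0, 5.0]))  # -> [2.0, 2.0, 2.0, 5.0]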

Online Network Source Optimization with Graph-Kernel MAB

Simulation results show that the proposed online learning algorithm outperforms baseline offline methods, which typically separate the learning phase from the testing phase, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency, and computational complexity.

Low Regret Binary Sampling Method for Efficient Global Optimization of Univariate Functions

This work proposes a computationally efficient algorithm for the problem of global optimization of univariate loss functions and analytically extends its results to a broader class of functions covering more complex regularity conditions.

An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward

This brief formalizes the stochastic MAB problem in which the near-optimal mean reward (NoMR), lying between the suboptimal and the optimal mean rewards, is known in advance, and proposes a novel algorithm, NoMR-BANDIT, which is optimal for this problem.

Efficient Minimax Optimal Global Optimization of Lipschitz Continuous Multivariate Functions

The algorithm achieves an average regret bound of $O(L\sqrt{n}\,T^{-1/n})$ for the optimization of an $n$-dimensional Lipschitz continuous objective over a time horizon $T$, which is shown to be minimax optimal.
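
For reference, the stated average regret converts to a cumulative bound by multiplying by the horizon ($L$ is the Lipschitz constant, $n$ the dimension, $T$ the horizon):

\[
  \frac{R_T}{T} = O\!\left(L\sqrt{n}\,T^{-1/n}\right)
  \quad\Longrightarrow\quad
  R_T = O\!\left(L\sqrt{n}\,T^{1-1/n}\right)
\]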

Near-Linear Time Algorithm with Near-Logarithmic Regret Per Switch for Mixable/Exp-Concave Losses

It is shown that with a suitable selection of hyperexpert creations and weighting strategies, it is also possible to achieve near-logarithmic regret per switch with sub-polynomial complexity per time step.

Efficient Algorithms for Adversarial Contextual Learning

This work provides the first oracle-efficient sublinear-regret algorithms for adversarial versions of the contextual bandit problem, along with several extensions and implications of the algorithms, such as switching regret and efficient learning with predictable sequences.

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that…

Efficient Optimal Learning for Contextual Bandits

This work provides the first efficient algorithm with optimal regret; it uses a cost-sensitive classification learner as an oracle and has running time polylog(N), where N is the number of classification rules among which the oracle might choose.

Contextual Bandit Learning with Predictable Rewards

A new lower bound is proved, showing that no algorithm can achieve superior worst-case performance even under the realizability assumption; it is also shown that for any set of policies there is a distribution over rewards such that the new algorithm has constant regret, unlike previous approaches.

Contextual Multi-Armed Bandits

A lower bound is proved for the regret of any algorithm in terms of the packing dimensions of the query space and the ad space, respectively; this gives almost matching upper and lower bounds for finite spaces or convex bounded subsets of Euclidean spaces.

The Nonstochastic Multiarmed Bandit Problem

This work gives a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
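
The canonical algorithm from this paper is EXP3; below is a minimal sketch under the standard assumption that rewards lie in [0, 1]. The get_reward adversary is a hypothetical stand-in, and the per-round weight renormalization is only for numerical stability.

import math
import random

def exp3(n_arms, horizon, get_reward, gamma=0.1):
    """EXP3 sketch for adversarial bandits; rewards assumed to lie in [0, 1]."""
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = get_reward(t, arm)
        # Importance-weighted estimate: only the played arm gets credit.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Renormalize so the weights stay numerically bounded.
        mx = max(weights)
        weights = [w / mx for w in weights]
    return weights

# Toy usage: arm 2 has the highest mean reward.
random.seed(0)
final = exp3(3, 5000, lambda t, a: random.random() if a == 2 else 0.8 * random.random())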

Universal Piecewise Linear Prediction Via Context Trees

This paper proposes the use of a "context tree" to achieve the total squared prediction error performance of the best piecewise linear model, which can choose both its partitioning of the regressor space and its real-valued prediction parameters within each region of the partition.

Online Learning Algorithms Can Converge Comparably Fast as Batch Learning

This work studies a sharp estimate of the expected norms of the learning sequence and a refined error decomposition for online learning algorithms in a reproducing kernel Hilbert space associated with convex loss functions.

A contextual-bandit approach to personalized news article recommendation

This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and the articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
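
The best-known algorithm from this paper is LinUCB; the sketch below follows its disjoint-model recipe (a per-arm ridge regression with an upper-confidence exploration bonus). The synthetic features, click model, and parameter values are illustrative assumptions, not the deployed system.

import numpy as np

class LinUCB:
    """Disjoint-model LinUCB sketch: one ridge regression per arm plus a UCB bonus."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Point estimate plus exploration bonus (confidence width).
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 3 articles, 5-dimensional user features, synthetic clicks.
rng = np.random.default_rng(0)
policy = LinUCB(n_arms=3, dim=5)
true_theta = rng.normal(size=(3, 5))
for t in range(2000):
    x = rng.normal(size=5)
    a = policy.select(x)
    click = float(rng.random() < 1 / (1 + np.exp(-true_theta[a] @ x)))
    policy.update(a, x, click)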