Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures

@article{MohagheghNeyshabouri2019AsymptoticallyOC,
  title={Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures},
  author={Mohammadreza Mohaghegh Neyshabouri and Kaan Gokcesu and Hakan Gokcesu and Huseyin Ozkan and Suleyman Serdar Kozat},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2019},
  volume={30},
  pages={923--937},
  url={https://api.semanticscholar.org/CorpusID:51906671}
}
This work proposes an online algorithm for sequential learning in the contextual multi-armed bandit setting. It establishes upper bounds on the regret (with respect to the best arm-selection policy) that are mathematically proven to vanish, in the average-loss-per-round sense, at a faster rate than the state of the art.
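
As an illustration of the general recipe (not the authors' exact algorithm, which competes against all prunings of the hierarchy simultaneously), the sketch below partitions a one-dimensional context space with a fixed-depth dyadic tree and runs an independent bandit learner in each region; all names, constants, and the epsilon-greedy learner are hypothetical placeholders.

import random

class EpsGreedy:
    """Minimal per-region arm selector (a placeholder for the per-node
    bandit learners the paper builds on; illustrative only)."""
    def __init__(self, n_arms, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.means))
        return max(range(len(self.means)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def leaf_index(context, depth):
    """Map a context in [0, 1) to a leaf of a depth-`depth` dyadic tree."""
    return min(int(context * 2 ** depth), 2 ** depth - 1)

DEPTH, N_ARMS = 3, 4
learners = [EpsGreedy(N_ARMS) for _ in range(2 ** DEPTH)]

for t in range(1000):
    context = random.random()          # observe side information
    leaf = leaf_index(context, DEPTH)  # locate its region in the finest partition
    arm = learners[leaf].select()      # play an arm for that region
    reward = random.random()           # bandit feedback (placeholder)
    learners[leaf].update(arm, reward)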


Generalized Translation and Scale Invariant Online Algorithm for Adversarial Multi-Armed Bandits

This work studies the adversarial multi-armed bandit problem and creates a completely online algorithmic framework that is invariant under arbitrary translations and scales of the arm losses, making it applicable to a wide variety of problem scenarios.

Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback

This work studies the problem of expert advice in the partial bandit feedback setting and creates a sequentially minimax optimal algorithm whose performance is analyzed via regret against a general expert selection sequence.

Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback

We study the adversarial online learning problem and create a completely online algorithmic framework that has data-dependent regret guarantees under both full expert feedback and bandit feedback…

Minimax Optimal Online Stochastic Learning for Sequences of Convex Functions under Sub-Gradient Observation Failures

This work proposes algorithms based on the sub-gradient descent method that achieve tight minimax optimal regret bounds, and proposes a blind algorithm that empirically estimates properties of the underlying stochastic settings in a generally applicable manner.

Efficient, Anytime Algorithms for Calibration with Isotonic Regression under Strictly Convex Losses

This work studies the traditional squared-error setting and its weighted variant, shows that the optimal monotone transform takes the form of a unique staircase function, and proposes a linear-time-and-space algorithm that finds such optimal transforms for specific loss settings.
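
The paper's linear-time algorithm for general strictly convex losses is not reproduced here; the classical pool-adjacent-violators routine below, written for the (weighted) squared-error case, illustrates why the optimal monotone transform is a staircase that is constant on pooled blocks.

def isotonic_fit(y, w=None):
    """Pool-adjacent-violators for weighted squared error (illustrative).

    Returns the nondecreasing sequence minimizing sum w_i * (f_i - y_i)^2;
    the solution is a staircase: constant on blocks of pooled points.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [level, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            l2, w2, c2 = blocks.pop()
            l1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * l1 + w2 * l2) / wt, wt, c1 + c2])
    out = []
    for level, _, count in blocks:
        out.extend([level] * count)
    return out

print(isotonic_fit([3.0, 1.0, 2.0, 5.0]))  # -> [2.0, 2.0, 2.0, 5.0]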

Online Network Source Optimization with Graph-Kernel MAB

Simulation results show that the proposed online learning algorithm outperforms baseline offline methods, which typically separate the learning phase from the testing phase, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency, and computational complexity.

Low Regret Binary Sampling Method for Efficient Global Optimization of Univariate Functions

This work proposes a computationally efficient algorithm for the problem of global optimization of univariate loss functions and analytically extends its results to a broader class of functions covering more complex regularity conditions.

An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward

This brief formalizes the stochastic MAB problem in which the near-optimal mean reward (NoMR), lying between the suboptimal and the optimal mean rewards, is known in advance, and proposes a novel algorithm, NoMR-BANDIT, which is optimal for this problem.

Efficient Minimax Optimal Global Optimization of Lipschitz Continuous Multivariate Functions

The algorithm achieves an average regret bound of $O(L\sqrt{n}\,T^{-1/n})$ for the optimization of an $n$-dimensional Lipschitz continuous objective over a time horizon $T$, which is shown to be minimax optimal.
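
For reference, the stated average regret converts to a cumulative bound by multiplying by the horizon ($L$ is the Lipschitz constant, $n$ the dimension, $T$ the horizon):

\[
  \frac{R_T}{T} = O\!\left(L\sqrt{n}\,T^{-1/n}\right)
  \quad\Longrightarrow\quad
  R_T = O\!\left(L\sqrt{n}\,T^{1-1/n}\right)
\]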

Near-Linear Time Algorithm with Near-Logarithmic Regret Per Switch for Mixable/Exp-Concave Losses

It is shown that with a suitable selection of hyperexpert creations and weighting strategies, it is also possible to achieve near-logarithmic regret per switch with sub-polynomial complexity per time step.

Efficient Algorithms for Adversarial Contextual Learning

This work provides the first oracle-efficient sublinear-regret algorithms for adversarial versions of the contextual bandit problem, along with several extensions and implications of the algorithms, such as switching regret and efficient learning with predictable sequences.

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that…

Efficient Optimal Learning for Contextual Bandits

This work provides the first efficient algorithm with optimal regret; it uses a cost-sensitive classification learner as an oracle and has running time polylog(N), where N is the number of classification rules among which the oracle might choose.

Contextual Bandit Learning with Predictable Rewards

A new lower bound is proved, showing that no algorithm can achieve superior worst-case performance even under the realizability assumption; it is also shown that for any set of policies there is a distribution over rewards such that the new algorithm has constant regret, unlike previous approaches.

Contextual Multi-Armed Bandits

A lower bound is proved for the regret of any algorithm in terms of the packing dimensions of the query space and the ad space, respectively; this gives almost matching upper and lower bounds for finite spaces or convex bounded subsets of Euclidean spaces.

The Nonstochastic Multiarmed Bandit Problem

This work gives a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
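
The canonical algorithm from this paper is EXP3; below is a minimal sketch under the standard assumption that rewards lie in [0, 1]. The get_reward adversary is a hypothetical stand-in, and the per-round weight renormalization is only for numerical stability.

import math
import random

def exp3(n_arms, horizon, get_reward, gamma=0.1):
    """EXP3 sketch for adversarial bandits; rewards assumed to lie in [0, 1]."""
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = get_reward(t, arm)
        # Importance-weighted estimate: only the played arm gets credit.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Renormalize so the weights stay numerically bounded.
        mx = max(weights)
        weights = [w / mx for w in weights]
    return weights

# Toy usage: arm 2 has the highest mean reward.
random.seed(0)
final = exp3(3, 5000, lambda t, a: random.random() if a == 2 else 0.8 * random.random())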

Universal Piecewise Linear Prediction Via Context Trees

This paper proposes the use of a "context tree" to achieve the total squared prediction error performance of the best piecewise linear model, which can choose both its partitioning of the regressor space and its real-valued prediction parameters within each region of the partition.

Online Learning Algorithms Can Converge Comparably Fast as Batch Learning

This work studies a sharp estimate of the expected norms of the learning sequence and a refined error decomposition for online learning algorithms in a reproducing kernel Hilbert space associated with convex loss functions.

A contextual-bandit approach to personalized news article recommendation

This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and the articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
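
The best-known algorithm from this paper is LinUCB; the sketch below follows its disjoint-model recipe (a per-arm ridge regression with an upper-confidence exploration bonus). The synthetic features, click model, and parameter values are illustrative assumptions, not the deployed system.

import numpy as np

class LinUCB:
    """Disjoint-model LinUCB sketch: one ridge regression per arm plus a UCB bonus."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Point estimate plus exploration bonus (confidence width).
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 3 articles, 5-dimensional user features, synthetic clicks.
rng = np.random.default_rng(0)
policy = LinUCB(n_arms=3, dim=5)
true_theta = rng.normal(size=(3, 5))
for t in range(2000):
    x = rng.normal(size=5)
    a = policy.select(x)
    click = float(rng.random() < 1 / (1 + np.exp(-true_theta[a] @ x)))
    policy.update(a, x, click)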