Reinforcement Learning with Immediate Rewards and Linear Hypotheses

@article{Abe2003ReinforcementL,
  title={Reinforcement Learning with Immediate Rewards and Linear Hypotheses},
  author={Naoki Abe and Alan W. Biermann and Philip M. Long},
  journal={Algorithmica},
  year={2003},
  volume={37},
  pages={263-293},
  url={https://api.semanticscholar.org/CorpusID:13804406}
}
For two cases, one in which a continuous-valued reward is given by applying the unknown linear function and another in which that function gives the probability of receiving the larger of two binary-valued rewards, lower bounds are provided showing that the rate of convergence is nearly optimal.
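
The setting studied here is an immediate-reward (contextual bandit) problem in which the expected reward of an action is a linear function of its feature vector. The sketch below is only a generic illustration of that setting, not the paper's algorithm: the ridge-regression estimate, the confidence-width bonus, and the names LinearImmediateRewardLearner and alpha are assumptions made for the example.

```python
# Illustrative sketch only: a linear-hypothesis learner for the
# immediate-reward (contextual bandit) setting described above.
# The ridge-regression estimate and the confidence-width bonus are
# generic choices, not the paper's exact algorithm.
import numpy as np

class LinearImmediateRewardLearner:
    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.A = reg * np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)          # sum of reward-weighted contexts
        self.alpha = alpha              # exploration width (hypothetical)

    def choose(self, contexts):
        """Pick the arm feature vector with the largest optimistic
        estimate  x^T theta_hat + alpha * sqrt(x^T A^-1 x)."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Incorporate the observed immediate reward for the chosen arm."""
        self.A += np.outer(x, x)
        self.b += reward * x
```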

No-regret Exploration in Contextual Reinforcement Learning

This paper proposes and analyzes optimistic and randomized exploration methods that make time- and space-efficient online updates, demonstrates a generic template for deriving confidence sets using an online learning oracle, and gives a lower bound for the setting.

Orthogonal Projection in Linear Bandits

This paper considers the case where the expected reward is an unknown linear function of a projection of the decision vector onto a subspace orthogonal to the first, and develops a strategy to achieve O(log T) regret, where T is the number of time steps.

Contextual Markov Decision Processes using Generalized Linear Models

This paper proposes a no-regret online RL algorithm for the setting in which the MDP parameters are obtained from the context using generalized linear models (GLMs); the algorithm relies on efficient online updates and is also memory efficient.

Efficient Value-Function Approximation via Online Linear Regression

A provably efficient, model-free RL algorithm for finite-horizon problems with linear value-function approximation that addresses the exploration-exploitation tradeoff in a principled way.

A unifying framework for computational reinforcement learning theory

The thesis of this work is that the KWIK learning model provides a flexible, modular, and unifying way to create and analyze reinforcement-learning algorithms with provably efficient exploration, and that it facilitates the development of new algorithms with smaller sample complexity, which have empirically demonstrated faster learning in real-world problems.

On-Line Adaptation of Exploration in the One-Armed Bandit with Covariates Problem

This paper introduces a novel algorithm, e-ADAPT, which adapts as it plays and sequentially chooses whether to explore or exploit, driven by the amount of uncertainty in the system.

Parametrized stochastic multi-armed bandits with binary rewards

An upper bound on the total regret is shown which applies uniformly in time and implies that, for any f ∈ ω(log(T)), the total regret can be made O(n·f(T)), independent of the number of arms.
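
To make the setting concrete, here is a minimal sketch of a bandit whose binary rewards are Bernoulli with success probability given by a logistic function of a shared unknown parameter. The online logistic update, the power-of-two forced-exploration schedule, and the function names are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch only: binary (Bernoulli) rewards whose success
# probability is a logistic function of a shared unknown parameter.
# The online logistic-regression update and the sparse forced
# exploration are generic choices, not the paper's algorithm.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_parametrized_bandit(arm_features, theta_true, T, lr=0.1, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    d = arm_features.shape[1]
    theta_hat = np.zeros(d)
    for t in range(1, T + 1):
        # forced exploration on a sparse (power-of-two) schedule
        if (t & (t - 1)) == 0:
            arm = rng.integers(len(arm_features))
        else:                                     # otherwise play greedily
            arm = int(np.argmax(arm_features @ theta_hat))
        x = arm_features[arm]
        reward = rng.random() < sigmoid(x @ theta_true)   # Bernoulli reward
        # stochastic gradient step on the logistic log-likelihood
        theta_hat += lr * (float(reward) - sigmoid(x @ theta_hat)) * x
    return theta_hat
```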

Neural Contextual Bandits with UCB-based Exploration

A new algorithm, NeuralUCB, is proposed, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of reward for efficient exploration.
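
A rough sketch of the NeuralUCB idea follows: score each candidate context with a neural reward estimate plus a confidence bonus computed from a feature covariance matrix. For brevity the hidden-layer activations stand in for the network-gradient features used by the actual algorithm, and all constants (width, gamma, lr) are hypothetical.

```python
# Illustrative sketch only: UCB exploration on top of a neural reward
# model, in the spirit of NeuralUCB.  Here the network's hidden
# activations play the role of the feature mapping; the real algorithm
# uses the gradient of the full network.
import numpy as np

class TinyNeuralUCB:
    def __init__(self, dim, width=32, gamma=1.0, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(dim), size=(width, dim))
        self.v = np.zeros(width)               # last-layer weights
        self.Z = np.eye(width)                 # feature covariance matrix
        self.gamma, self.lr = gamma, lr

    def _features(self, x):
        return np.tanh(self.W @ x)             # hidden-layer activations

    def choose(self, contexts):
        Z_inv = np.linalg.inv(self.Z)
        scores = []
        for x in contexts:
            phi = self._features(x)
            bonus = self.gamma * np.sqrt(phi @ Z_inv @ phi)
            scores.append(self.v @ phi + bonus)    # UCB of the reward
        return int(np.argmax(scores))

    def update(self, x, reward):
        phi = self._features(x)
        self.Z += np.outer(phi, phi)
        # one SGD step on the squared prediction error of the last layer
        self.v += self.lr * (reward - self.v @ phi) * phi
```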

Randomized Exploration for Non-Stationary Stochastic Linear Bandits

Two perturbation approaches are investigated to overcome the conservatism that optimism-based algorithms chronically suffer from in practice; both empirically show strong performance in tackling the conservatism issue that Discounted LinUCB (D-LinUCB) struggles with.

Associative Reinforcement Learning using Linear Probabilistic Concepts

The analysis shows that the worst-case (expected) regret of the methods is almost optimal: the upper bounds grow with the number m of trials and the number n of alternatives like O(m^{3/4} n^{1/2}) and O(m^{4/5} n^{2/5}), and a nearly matching lower bound is given.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Associative Reinforcement Learning: A Generate and Test Algorithm

An algorithm is developed that performs an on-line search through the space of action mappings, expressed as Boolean formulae, and is shown to have very good performance in empirical trials.

Associative Reinforcement Learning: Functions in k-DNF

Algorithms that can efficiently learn action maps that are expressible in k-DNF are developed and are shown to have very good performance.

On-line evaluation and prediction using linear functions

A model is presented for situations where an algorithm needs to make a sequence of choices to minimize an evaluation function, but where that evaluation function must be learned on-line as it is being used; algorithms for this model are given along with performance bounds that hold in the worst case.

Individual sequence prediction—upper bounds and application for complexity

This work presents the first upper bound on the regret of the loss game that is a function of…

Using Confidence Bounds for Exploitation-Exploration Trade-offs

It is shown how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off, and improves the regret from O(T^{3/4}) to O(T^{1/2}).
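
As a concrete illustration of how confidence bounds resolve the exploitation-exploration trade-off, here is a standard UCB1-style index policy for a K-armed bandit; it is a generic sketch of the idea rather than the specific algorithm analyzed in the paper, and the pull function is a hypothetical reward oracle.

```python
# Generic sketch of a confidence-bound (UCB1-style) index policy for a
# K-armed bandit: play the arm whose empirical mean plus confidence
# width is largest.  Illustrates the idea only.
import math

def ucb_run(pull, K, T):
    """`pull(arm)` is assumed to return a reward in [0, 1]."""
    counts = [0] * K
    means = [0.0] * K
    for arm in range(K):                 # pull every arm once to start
        means[arm] = pull(arm)
        counts[arm] = 1
    for t in range(K + 1, T + 1):
        ucb = [means[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(K)]
        arm = max(range(K), key=lambda a: ucb[a])
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running mean
    return means

# Example usage with Bernoulli arms (hypothetical success probabilities):
#   import random
#   probs = [0.2, 0.5, 0.7]
#   ucb_run(lambda a: float(random.random() < probs[a]), K=3, T=10_000)
```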

Using upper confidence bounds for online learning

It is shown how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation/exploration trade-off and extends the results for the adversarial bandit problem to shifting bandits.

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
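
A minimal sketch of the gradient-following idea for a single Bernoulli-logistic unit in an immediate-reinforcement task is given below; the learning rate, running baseline, and function names are illustrative assumptions rather than the article's exact formulation.

```python
# Minimal sketch of a REINFORCE-style update for a single Bernoulli
# logistic unit with immediate reinforcement: weights move along
# (reward - baseline) * d/dw log p(action), without forming an explicit
# gradient estimate of the expected reward.  Learning rate and baseline
# are illustrative choices.
import numpy as np

def reinforce_unit(env_reward, dim, T=1000, lr=0.05, seed=0):
    """`env_reward(x, action)` is assumed to return an immediate reward."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    baseline = 0.0
    for _ in range(T):
        x = rng.normal(size=dim)                  # input pattern
        p = 1.0 / (1.0 + np.exp(-w @ x))          # firing probability
        a = float(rng.random() < p)               # stochastic 0/1 action
        r = env_reward(x, a)
        # d/dw log p(a | x) for a Bernoulli-logistic unit is (a - p) * x
        w += lr * (r - baseline) * (a - p) * x
        baseline += 0.1 * (r - baseline)          # running reward baseline
    return w
```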

Worst-case quadratic loss bounds for prediction using linear functions and gradient descent

This paper studies the performance of gradient descent (GD) when applied to the problem of online linear prediction in arbitrary inner product spaces, and proves worst-case bounds on the sum of the squared…
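
For reference, a minimal sketch of the analyzed procedure, online gradient descent for linear prediction under squared loss, is shown below; the fixed learning rate and function names are illustrative assumptions, whereas the worst-case analysis tunes the step size to norm bounds on the instances and the comparator.

```python
# Minimal sketch of online gradient descent for linear prediction with
# squared loss: predict w.x, observe the outcome, take a gradient step.
# The fixed learning rate is an illustrative choice.
import numpy as np

def online_gd(stream, dim, eta=0.1):
    """`stream` is assumed to yield (x, y) pairs with x a length-`dim` array."""
    w = np.zeros(dim)
    total_loss = 0.0
    for x, y in stream:
        y_hat = w @ x                      # linear prediction
        total_loss += (y_hat - y) ** 2     # squared loss on this trial
        w -= eta * 2 * (y_hat - y) * x     # gradient step on this example
    return w, total_loss
```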