# Reinforcement Learning with Immediate Rewards and Linear Hypotheses

@article{Abe2003ReinforcementL, title={Reinforcement Learning with Immediate Rewards and Linear Hypotheses}, author={N. Abe and A. Biermann and Philip M. Long}, journal={Algorithmica}, year={2003}, volume={37}, pages={263-293} }

Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately, and a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases, one in which a continuous-valued reward is (approximately) given by applying the unknown linear…
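The setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithms analyzed in the paper: it uses epsilon-greedy exploration with a ridge-regression estimate of the unknown weight vector, and the function name `run_linear_reward_sketch`, its parameters, and the Gaussian features and noise are all hypothetical choices made for illustration.

```python
import numpy as np


def run_linear_reward_sketch(n_rounds=2000, n_actions=5, dim=4,
                             epsilon=0.1, noise=0.1, seed=0):
    """Simulate learning with immediate rewards and a linear reward model.

    Assumption (illustrative only): expected reward = <w_true, features>,
    with w_true unknown to the learner. The learner is epsilon-greedy on a
    ridge-regression estimate; this is NOT the paper's method.
    """
    rng = np.random.default_rng(seed)

    # Unknown linear function relating features to expected reward.
    w_true = rng.normal(size=dim)
    w_true /= np.linalg.norm(w_true)

    gram = np.eye(dim)      # ridge-regularized sum of x x^T
    moment = np.zeros(dim)  # sum of reward * x
    regret = 0.0

    for _ in range(n_rounds):
        # One feature vector per available action in the current state.
        feats = rng.normal(size=(n_actions, dim))
        expected = feats @ w_true

        # Current estimate of the weight vector.
        w_hat = np.linalg.solve(gram, moment)

        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(feats @ w_hat))

        # The consequence of the action is felt immediately.
        x = feats[action]
        reward = float(expected[action] + noise * rng.normal())

        # Update the least-squares statistics with the observed reward.
        gram += np.outer(x, x)
        moment += reward * x

        # Regret in expected reward against the best action this round.
        regret += float(expected.max() - expected[action])

    return np.linalg.solve(gram, moment), w_true, regret
```

Calling `run_linear_reward_sketch()` returns the final weight estimate, the hidden weight vector, and the cumulative regret in expected reward, which makes it easy to compare exploration settings such as different values of `epsilon`.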

#### 71 Citations

Orthogonal Projection in Linear Bandits

- Mathematics, Computer Science
- 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
- 2019

Contextual Markov Decision Processes using Generalized Linear Models

- Computer Science, Mathematics
- ArXiv
- 2019

On-Line Adaptation of Exploration in the One-Armed Bandit with Covariates Problem

- Computer Science
- 2010 Ninth International Conference on Machine Learning and Applications
- 2010

Parametrized stochastic multi-armed bandits with binary rewards

- Mathematics, Computer Science
- Proceedings of the 2011 American Control Conference
- 2011

Randomized Exploration for Non-Stationary Stochastic Linear Bandits

- Computer Science, Mathematics
- UAI
- 2020
