Reinforcement Learning with Immediate Rewards and Linear Hypotheses

  • N. Abe, A. Biermann, Philip M. Long
  • Published 2003
  • Mathematics, Computer Science
  • Algorithmica
  • Abstract: We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately, and a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases, one in which a continuous-valued reward is (approximately) given by applying the unknown linear…
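The setting the abstract describes, immediate rewards that are (approximately) linear in a feature vector of the chosen action, is essentially a linear contextual bandit. As a minimal sketch, assuming a fixed finite action set and Gaussian reward noise, the loop below maintains a regularized least-squares estimate of the unknown linear function and explores with a generic upper-confidence bonus. This illustrates the problem setting only, not the paper's specific algorithm, and all names and parameters here are hypothetical.

```python
import numpy as np

def run_linear_bandit(features, theta_true, T, noise=0.1, alpha=1.0, seed=0):
    """Immediate-reward linear setting: each round, pick one of K actions;
    the expected reward of action a is features[a] @ theta_true, with
    theta_true unknown to the learner.  Returns the cumulative reward and
    the final estimate of theta_true."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    A = np.eye(d)        # ridge-regularized Gram matrix of chosen features
    b = np.zeros(d)      # running sum of (feature vector * observed reward)
    total = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(A, b)   # regularized least-squares fit
        A_inv = np.linalg.inv(A)
        # optimistic score: predicted reward plus a confidence width
        width = np.sqrt(np.einsum('ij,jk,ik->i', features, A_inv, features))
        a = int(np.argmax(features @ theta_hat + alpha * width))
        x = features[a]
        r = x @ theta_true + noise * rng.standard_normal()  # immediate reward
        A += np.outer(x, x)
        b += r * x
        total += r
    return total, np.linalg.solve(A, b)
```

Because the confidence width of a repeatedly chosen action shrinks, play concentrates on the action with the highest expected reward, so the cumulative reward approaches T times the best per-round expected reward.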
    71 Citations
    No-regret Exploration in Contextual Reinforcement Learning
    • Cited by 5
    Orthogonal Projection in Linear Bandits
    • Q. Kang, W. Tay
    • Mathematics, Computer Science
    • 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
    Contextual Markov Decision Processes using Generalized Linear Models
    • Cited by 8
    A unifying framework for computational reinforcement learning theory
    • Cited by 69
    On-Line Adaptation of Exploration in the One-Armed Bandit with Covariates Problem
    • Cited by 15
    Parametrized stochastic multi-armed bandits with binary rewards
    • Chong Jiang, R. Srikant
    • Mathematics, Computer Science
    • Proceedings of the 2011 American Control Conference, 2011
    • Cited by 3
    Neural Contextual Bandits with UCB-based Exploration
    • Cited by 4
    Randomized Exploration for Non-Stationary Stochastic Linear Bandits
    • 2019


    References
    Associative Reinforcement Learning using Linear Probabilistic Concepts
    • Cited by 48
    Reinforcement Learning: An Introduction
    • Cited by 26,141
    Associative reinforcement learning: A generate and test algorithm
    • Cited by 21
    Associative Reinforcement Learning: Functions in k-DNF
    • L. Kaelbling
    • Mathematics, Computer Science
    • Machine Learning, 2004
    • Cited by 51
    On-line evaluation and prediction using linear functions
    • Cited by 12
    Individual sequence prediction—upper bounds and application for complexity
    • Cited by 3
    Using Confidence Bounds for Exploitation-Exploration Trade-offs
    • P. Auer
    • Mathematics, Computer Science
    • J. Mach. Learn. Res., 2002
    • Cited by 994
    Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Cited by 3,055
    Using upper confidence bounds for online learning
    • P. Auer
    • Computer Science
    • Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000
    • Cited by 37
    Worst-case quadratic loss bounds for prediction using linear functions and gradient descent
    • Cited by 128