Corpus ID: 16977738

Active Reinforcement Learning: Observing Rewards at a Cost

@article{Krueger2020ActiveRL,
  title={Active Reinforcement Learning: Observing Rewards at a Cost},
  author={David Krueger and Jan Leike and Owain Evans and John Salvatier},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.06709}
}
Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information exactly is intractable, so we must rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes…
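
To make the setting concrete, the sketch below shows a minimal active bandit agent in Python. It is an illustration under assumptions of our own, not the paper's proposed heuristics: arm selection uses Thompson sampling over Bernoulli arms, and the agent pays the cost c to observe a reward only while a crude value-of-information proxy (the arm's posterior standard deviation) still exceeds c. The class name ActiveBernoulliBandit and all parameter choices are hypothetical.

# Illustrative sketch (not the paper's exact algorithm): an active bandit agent
# that only observes a pulled arm's reward when it pays the query cost c.
# Assumed heuristic: query while the arm's posterior standard deviation
# still exceeds c, i.e. while the information plausibly outweighs its price.

import numpy as np

class ActiveBernoulliBandit:
    def __init__(self, n_arms, query_cost, rng=None):
        self.c = query_cost
        self.rng = rng or np.random.default_rng()
        # Beta(1, 1) priors over each arm's success probability.
        self.alpha = np.ones(n_arms)
        self.beta = np.ones(n_arms)

    def select_arm(self):
        # Thompson sampling over the current posteriors.
        samples = self.rng.beta(self.alpha, self.beta)
        return int(np.argmax(samples))

    def should_query(self, arm):
        # Crude value-of-information proxy: posterior standard deviation.
        a, b = self.alpha[arm], self.beta[arm]
        var = (a * b) / ((a + b) ** 2 * (a + b + 1))
        return np.sqrt(var) > self.c

    def update(self, arm, reward):
        # Posterior update is only possible when the reward was queried.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def run(true_probs, query_cost, horizon=1000, seed=0):
    rng = np.random.default_rng(seed)
    agent = ActiveBernoulliBandit(len(true_probs), query_cost, rng)
    total = 0.0
    for _ in range(horizon):
        arm = agent.select_arm()
        reward = rng.binomial(1, true_probs[arm])
        total += reward  # reward accrues whether or not it is observed
        if agent.should_query(arm):
            total -= query_cost          # pay c to see the reward
            agent.update(arm, reward)    # and only then learn from it
    return total

if __name__ == "__main__":
    print(run([0.3, 0.5, 0.7], query_cost=0.05))

The posterior-uncertainty threshold is only a stand-in for the long-term value of reward information, which is exactly the quantity the paper argues is intractable to compute and must be approximated heuristically.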