Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

@article{Cohen2021CuriosityKO,
  title={Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent},
  author={Michael K. Cohen and Elliot Catt and Marcus Hutter},
  journal={IEEE Journal on Selected Areas in Information Theory},
  year={2021},
  volume={2},
  pages={665-677}
}
Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner’s policy approaches optimality—where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be “asymptotically optimal” in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either “destroyed” or “incapacitated” with probability 1…
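For context, "asymptotic optimality" is commonly formalized as follows (a sketch in standard general-reinforcement-learning notation, not quoted from the paper): writing V^{\pi}_{\mu}(h_{<t}) for the expected future discounted reward of policy \pi in environment \mu after history h_{<t}, and V^{*}_{\mu} for the value of the optimal informed policy, an agent \pi is asymptotically optimal over a class \mathcal{M} of environments if

  \[ V^{*}_{\mu}(h_{<t}) - V^{\pi}_{\mu}(h_{<t}) \;\to\; 0 \quad \text{as } t \to \infty, \quad \text{with probability 1,} \]

for every \mu \in \mathcal{M}, where the histories h_{<t} are generated by \pi interacting with \mu.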

References

Showing 1-10 of 46 references
A Strongly Asymptotically Optimal Agent in General Environments
Presents an algorithm for a policy whose value approaches the optimal value with probability 1 in all computable probabilistic environments, provided the agent has a bounded horizon.
Rationality, optimism and guarantees in general reinforcement learning
Introduces a framework for general reinforcement learning agents based on rationality axioms for a decision function and a hypothesis-generating function, designed to achieve guarantees on the number of errors, and introduces a notion of a class of environments being generated by a set of laws.
Nonparametric General Reinforcement Learning
It is proved that Thompson sampling is asymptotically optimal in stochastic environments in the sense that its value converges to the value of the optimal policy, and Thompson sampling achieves sublinear regret in these environments.
Safe Exploration in Markov Decision Processes
This paper proposes a general formulation of safety through ergodicity, shows that imposing safety by restricting attention to the resulting set of guaranteed safe policies is NP-hard, and presents an efficient algorithm for guaranteed safe, but potentially suboptimal, exploration.
Apprenticeship learning via inverse reinforcement learning
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
A Game-Theoretic Approach to Apprenticeship Learning
A new algorithm is given that is computationally faster, is easier to implement, and can be applied even in the absence of an expert, and it is shown that this algorithm may produce a policy that is substantially better than the expert's.
Universal Reinforcement Learning Algorithms: Survey and Experiments
Presents a short and accessible survey of the universal Bayesian agent AIXI and a family of related URL algorithms under a unified notation and framework, along with results of experiments that qualitatively illustrate some properties of the resulting policies and their relative performance on partially observable gridworld environments.
Delegative Reinforcement Learning: learning to avoid traps with a little help
Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to…
Safe Exploration of State and Action Spaces in Reinforcement Learning
The PI-SRL algorithm is introduced, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment.
Optimality Issues of Universal Greedy Agents with Static Priors
It is proved that the current definition of AIXI can sometimes be only suboptimal in a certain sense, and this result generalizes to infinite horizon agents and to any static prior distribution.