# Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

@article{Cohen2021CuriosityKO, title={Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent}, author={Michael K. Cohen and Elliot Catt and Marcus Hutter}, journal={IEEE Journal on Selected Areas in Information Theory}, year={2021}, volume={2}, pages={665-677} }

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner's policy approaches optimality, where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be "asymptotically optimal" in any (stochastically computable) environment, then, subject to an assumption about the true environment, this agent will be either "destroyed" or "incapacitated" with probability 1…
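The intuition behind the abstract's claim can be illustrated with a toy simulation (this is an illustration, not the paper's construction): any agent that keeps trying every action with nonzero probability — as sufficient exploration for asymptotic optimality requires — eventually takes a "trap" action with probability 1. All names and parameters below are hypothetical.

```python
import random

def explore_until_trap(n_actions=5, trap_action=0, explore_prob=0.1,
                       max_steps=100_000, seed=0):
    """Toy sketch: an agent that explores with probability explore_prob
    (picking uniformly among all actions) eventually takes the trap
    action, after which it would be destroyed or incapacitated.
    Returns the step at which the trap is first hit, or None."""
    rng = random.Random(seed)
    for t in range(1, max_steps + 1):
        if rng.random() < explore_prob:
            action = rng.randrange(n_actions)  # exploratory, uniform
        else:
            action = 1                         # "safe" greedy action
        if action == trap_action:
            return t                           # trap hit at step t
    return None
```

With a per-step trap probability of `explore_prob / n_actions = 0.02`, the chance of surviving all 100,000 steps is about `0.98 ** 100_000`, i.e. effectively zero — mirroring the paper's probability-1 claim in this toy setting.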

#### References

Showing 1–10 of 46 references

A Strongly Asymptotically Optimal Agent in General Environments

- Computer Science, Mathematics
- IJCAI
- 2019

An algorithm is presented for a policy whose value approaches the optimal value with probability 1 in all computable probabilistic environments, provided the agent has a bounded horizon.

Rationality, optimism and guarantees in general reinforcement learning

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2015

This article introduces a framework for general reinforcement learning agents based on rationality axioms for a decision function and a hypothesis-generating function, designed to achieve guarantees on the number of errors, and introduces a notion of a class of environments being generated by a set of laws.

Nonparametric General Reinforcement Learning

- Computer Science
- ArXiv
- 2016

It is proved that Thompson sampling is asymptotically optimal in stochastic environments, in the sense that its value converges to the value of the optimal policy, and that Thompson sampling achieves sublinear regret in these environments.
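The general idea of Thompson sampling can be sketched in the simplest setting, Bernoulli bandits (a hedged illustration of the principle, not the nonparametric setting of the referenced paper): sample a mean for each arm from its Beta posterior and pull the arm whose sample is largest.

```python
import random

def thompson_bernoulli(true_means, steps=5000, seed=0):
    """Minimal Thompson sampling sketch for Bernoulli bandits.
    Each arm keeps a Beta(wins, losses) posterior starting from a
    uniform Beta(1, 1) prior; returns the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    wins = [1] * k
    losses = [1] * k
    pulls = [0] * k
    for _ in range(steps):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because posterior sampling concentrates on the better arm, the pull counts quickly skew toward it, which is the mechanism behind the sublinear-regret guarantee.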

Safe Exploration in Markov Decision Processes

- Computer Science, Mathematics
- ICML
- 2012

This paper proposes a general formulation of safety through ergodicity, shows that imposing safety by restricting attention to the resulting set of guaranteed-safe policies is NP-hard, and presents an efficient algorithm for guaranteed-safe, though potentially suboptimal, exploration.

Apprenticeship learning via inverse reinforcement learning

- Computer Science
- ICML
- 2004

This work models the expert as maximizing a reward function expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using inverse reinforcement learning to recover the unknown reward function.
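The linear-reward assumption above can be sketched in a few lines (a hedged illustration of the setup, not the referenced algorithm; all names are hypothetical): the reward is a dot product of weights and features, and discounted feature expectations summarize a trajectory, so matching the expert's feature expectations bounds the value gap under any such reward.

```python
def linear_reward(features, weights):
    """R(s) = w . phi(s): reward as a linear combination of features."""
    return sum(w * f for w, f in zip(weights, features))

def feature_expectations(trajectory_features, gamma=0.9):
    """Discounted feature expectations mu = sum_t gamma^t * phi(s_t)
    for a single trajectory, given as a list of feature vectors."""
    mu = [0.0] * len(trajectory_features[0])
    for t, phi in enumerate(trajectory_features):
        for i, f in enumerate(phi):
            mu[i] += (gamma ** t) * f
    return mu
```

For any linear reward, a trajectory's return equals `linear_reward(mu, weights)`, which is why matching `mu` between learner and expert suffices in this setting.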

A Game-Theoretic Approach to Apprenticeship Learning

- Computer Science
- NIPS
- 2007

A new algorithm is given that is computationally faster, easier to implement, and applicable even in the absence of an expert; it is shown that this algorithm may produce a policy substantially better than the expert's.

Universal Reinforcement Learning Algorithms: Survey and Experiments

- Computer Science
- IJCAI
- 2017

A short and accessible survey of the universal Bayesian agent AIXI and a family of related universal reinforcement learning (URL) algorithms is presented under a unified notation and framework, along with results of experiments that qualitatively illustrate some properties of the resulting policies and their relative performance on partially observable gridworld environments.

Delegative Reinforcement Learning: learning to avoid traps with a little help

- Mathematics, Computer Science
- ArXiv
- 2019

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to…

Safe Exploration of State and Action Spaces in Reinforcement Learning

- Computer Science
- J. Artif. Intell. Res.
- 2012

The PI-SRL algorithm is introduced, which safely improves suboptimal but robust behaviors for continuous state and action control tasks and efficiently learns from experience gained in the environment.

Optimality Issues of Universal Greedy Agents with Static Priors

- Mathematics, Computer Science
- ALT
- 2010

It is proved that the current definition of AIXI can be suboptimal in a certain sense, and this result generalizes to infinite-horizon agents and to any static prior distribution.