# The Importance of Non-Markovianity in Maximum State Entropy Exploration

```bibtex
@article{Mutti2022TheIO,
  title   = {The Importance of Non-Markovianity in Maximum State Entropy Exploration},
  author  = {Mirco Mutti and Ric De Santi and Marcello Restelli},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2202.03060}
}
```

In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it is inducing. Hazan et al. (2019) noted that the class of Markovian stochastic policies is sufficient for the maximum state entropy objective, and exploiting non-Markovianity is generally considered pointless in this setting. In this paper, we argue that non-Markovianity is instead paramount for maximum state…
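As a rough formal sketch of the objective the abstract describes (notation ours, not taken from the paper): the agent seeks a policy whose induced state visitation distribution has maximal entropy,

$$
\max_{\pi} \; H\!\left(d^{\pi}\right) = -\sum_{s} d^{\pi}(s) \log d^{\pi}(s),
\qquad
d^{\pi}(s) = \frac{1}{T} \sum_{t=1}^{T} \Pr(s_t = s \mid \pi),
$$

where $d^{\pi}$ is the finite-horizon state distribution induced by $\pi$ over horizon $T$.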

## 3 Citations

### Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments

- Computer Science
- 2022

This work proposes a novel non-Markovian policy architecture to be pre-trained with the common maximum state entropy objective and showcases significant empirical advantages when compared to state-of-the-art Markovian agents for URL.

### Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency

- Computer Science, Mathematics
- 2022

This work considers the estimation of the transition probabilities of a finite-state finite-control CMC, and develops minimax sample complexity bounds for nonparametric estimation of these transition probability matrices.

### Challenging Common Assumptions in Convex Reinforcement Learning

- Computer Science
- ArXiv
- 2022

The classic Reinforcement Learning (RL) formulation concerns the maximization of a scalar reward function. More recently, convex RL has been introduced to extend the RL formulation to all the…

## References

Showing 1-10 of 44 references

### Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

- Computer Science
- AAAI
- 2021

It is argued that the entropy of the state distribution induced by finite-horizon trajectories is a sensible target, and a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), is presented to learn a policy that maximizes a non-parametric, $k$-nearest neighbors estimate of the state distribution entropy.
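To make the idea of a non-parametric state-entropy target concrete, here is a minimal sketch of a $k$-nearest-neighbor (Kozachenko-Leonenko style) entropy estimator over sampled states. This illustrates the family of estimators MEPOL optimizes, not the paper's exact implementation; for simplicity it uses `log(n-1) - log(k)` in place of the usual digamma terms.

```python
import math
import numpy as np

def knn_entropy(states, k=4):
    """k-NN (Kozachenko-Leonenko style) entropy estimate of a state sample.

    `states` is an (n, d) array of states sampled from the visitation
    distribution. Broader coverage of the state space yields a higher
    estimate, which is what a maximum state entropy objective rewards.
    """
    n, d = states.shape
    # Pairwise Euclidean distances, with self-distances masked out.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    # Distance from each point to its k-th nearest neighbor.
    r_k = np.sort(dists, axis=1)[:, k - 1]
    # Log-volume of the unit ball in R^d.
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return (d * np.mean(np.log(r_k + 1e-12))
            + log_vd + math.log(n - 1) - math.log(k))
```

For example, a sample spread uniformly over the state space scores higher than the same sample shrunk toward a single point, so a policy-gradient method can use this quantity (suitably differentiated) as its exploration objective.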

### Provably Efficient Maximum Entropy Exploration

- Computer Science
- ICML
- 2019

This work studies a broad class of objectives that are defined solely as functions of the state-visitation frequencies that are induced by how the agent behaves, and provides an efficient algorithm to optimize such intrinsically defined objectives, when given access to a black box planning oracle.

### Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards

- Computer Science
- ArXiv
- 2019

A no-regret algorithm is proposed, based on online convex optimization tools and a novel gradient threshold procedure that carefully controls the switches among actions to handle the subtle trade-off of alternating among different actions to balance the vectorial outcomes.

### RL for Latent MDPs: Regret Guarantees and a Lower Bound

- Computer Science
- NeurIPS
- 2021

This work considers the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP) and shows that the key link is a notion of separation between the MDP system dynamics, providing an efficient algorithm with a local guarantee.

### k-Means Maximum Entropy Exploration

- Computer Science
- ArXiv
- 2022

This work introduces an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution that is competitive on benchmarks for exploration in high-dimensional, continuous spaces, especially on tasks where reinforcement learning algorithms are unable to create rewards.

### An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

- Computer Science
- AAAI
- 2020

This paper proposes a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound to the entropy of the steady-state distribution induced by the policy.

### Non-Markovian policies occupancy measures

- Mathematics
- ArXiv
- 2022

A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent’s actions are chosen from a memoryless probability distribution, conditioned only on its current…

### Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

- Computer Science
- AAAI
- 2021

A reward-free RL framework is considered that completely separates exploration from exploitation and brings new challenges for exploration algorithms; maximizing Rényi entropy during exploration results in superior policies for arbitrary reward functions in the planning phase.

### Geometric Entropic Exploration

- Computer Science
- ArXiv
- 2021

Geometric Entropy Maximisation (GEM) is introduced, a new algorithm that maximises the geometry-aware Shannon entropy of state-visits in both discrete and continuous domains and is shown to be efficient in solving complex Reinforcement Learning tasks with sparse rewards.

### Active Exploration in Markov Decision Processes

- Computer Science
- AISTATS
- 2019

A novel learning algorithm is introduced to solve the active exploration problem in Markov decision processes, showing that active exploration in MDPs may be significantly more difficult than in multi-armed bandits (MAB).