Corpus ID: 52123271

Approximate Exploration through State Abstraction

@article{Taga2018ApproximateET,
  title={Approximate Exploration through State Abstraction},
  author={Adrien Ali Ta{\"i}ga and Aaron C. Courville and Marc G. Bellemare},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.09819}
}
Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an… 
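As background for the abstract above: a count-based exploration bonus adds to the environment reward a term that shrinks as a state is revisited. The sketch below is a minimal tabular stand-in (the names and the exact bonus form are illustrative; Bellemare et al. derive the count from a density model rather than a lookup table, as sketched further down under the "Count-Based Exploration with Neural Density Models" reference).

```python
import math
from collections import defaultdict

class CountBonus:
    """Tabular stand-in for a count-based exploration bonus.

    The bonus beta / sqrt(N(s) + 0.01) decays as state s is revisited,
    so rarely seen states yield larger augmented rewards.
    """

    def __init__(self, beta: float = 0.05):
        self.beta = beta
        self.counts = defaultdict(int)

    def reward(self, state, extrinsic_reward: float) -> float:
        self.counts[state] += 1
        bonus = self.beta / math.sqrt(self.counts[state] + 0.01)
        return extrinsic_reward + bonus
```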
Abstract Value Iteration for Hierarchical Reinforcement Learning
TLDR
This work proposes a hierarchical reinforcement learning framework for control with continuous state and action spaces, together with two algorithms for planning in the ADP, including a practical one that interweaves planning at the abstract level with learning at the concrete level.
Optimistic Exploration even with a Pessimistic Initialisation
TLDR
This work proposes a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network and shows that OPIQ outperforms non-optimistic DQN variants that utilise a pseudocount-based intrinsic motivation in hard exploration tasks, and that it predicts optimistic estimates for novel state-action pairs.
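For context on the summary above: OPIQ keeps the optimism outside the network by augmenting Q-values with a count-based term of the form C / (N(s, a) + 1)^M at action-selection (and bootstrap) time. A hedged sketch of the action-selection rule, with illustrative names and hyperparameters:

```python
import numpy as np

def opiq_action(q_values: np.ndarray, counts: np.ndarray,
                c: float = 1.0, m: float = 2.0) -> int:
    """Pick the action maximising Q(s, a) + C / (N(s, a) + 1)**M.

    q_values: Q estimates for the current state, one entry per action.
    counts:   visit counts N(s, a) for the same state.
    Unvisited actions (N = 0) receive the full optimistic boost C,
    which is what lets pessimistically initialised Q-values explore.
    """
    optimistic_q = q_values + c / np.power(counts + 1.0, m)
    return int(np.argmax(optimistic_q))
```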
State Abstraction as Compression in Apprenticeship Learning
TLDR
This work offers the first formalism and analysis of the trade-off between compression and performance in the context of state abstraction for Apprenticeship Learning, building on rate-distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm.
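For reference, the classic Blahut-Arimoto iteration that the summarised work builds on alternates between updating the encoder and its induced marginal; the generic rate-distortion version (not the paper's state-abstraction variant) looks roughly like this:

```python
import numpy as np

def blahut_arimoto(p_x: np.ndarray, distortion: np.ndarray,
                   beta: float, iters: int = 200) -> np.ndarray:
    """Classic Blahut-Arimoto iteration for the rate-distortion problem.

    p_x:        source distribution over states, shape (n,).
    distortion: d(x, x_hat) matrix, shape (n, m).
    beta:       trade-off between rate (compression) and distortion.
    Returns the conditional q(x_hat | x), shape (n, m).
    """
    n, m = distortion.shape
    q_xhat = np.full(m, 1.0 / m)  # marginal over abstract codewords
    for _ in range(iters):
        # Encoder update: q(x_hat | x) proportional to q(x_hat) * exp(-beta * d).
        logits = np.log(q_xhat + 1e-12)[None, :] - beta * distortion
        q_cond = np.exp(logits - logits.max(axis=1, keepdims=True))
        q_cond /= q_cond.sum(axis=1, keepdims=True)
        # Marginal update: make q(x_hat) consistent with the encoder.
        q_xhat = p_x @ q_cond
    return q_cond
```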
POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
TLDR
POLITEX (POLicy ITeration with EXpert advice), a variant of policy iteration in which each policy is a Boltzmann distribution over the sum of action-value function estimates of the previous policies, is presented, and its viability beyond linear function approximation is confirmed.
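The Boltzmann policy described in the summary is simple to write down; a small sketch under that description (names are illustrative):

```python
import numpy as np

def politex_policy(q_sum: np.ndarray, eta: float) -> np.ndarray:
    """Boltzmann policy over the running sum of past Q estimates.

    q_sum: sum over previous phases of the estimated Q(s, a) for the
           current state, shape (num_actions,).
    eta:   inverse-temperature parameter.
    Returns action probabilities pi(a | s).
    """
    logits = eta * q_sum
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits)
    return probs / probs.sum()
```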
Concepts in Bounded Rationality: Perspectives from Reinforcement Learning
Abstract of “Concepts in Bounded Rationality: Perspectives from Reinforcement Learning”, by David Abel, A.M., Brown University, May 2019. In this thesis, I explore the relevance of computational reinforcement…
The value of abstraction
TLDR
Three ways in which abstractions can guide learning are discussed, including domain structure and representational simplicity, which facilitate efficient learning by guiding exploration and generalization in RL.
Abstract Value Iteration for Hierarchical Deep Reinforcement Learning
Kishor Jothimurugan, Osbert Bastani, and Rajeev Alur (University of Pennsylvania).
Provably Safe PAC-MDP Exploration Using Analogies
TLDR
This work proposes Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics, which exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.

References

Showing 1-10 of 44 references
Model-based reinforcement learning with nearly tight exploration complexity bounds
TLDR
Mormax, a modified version of the Rmax algorithm, is shown to make at most O(N log N) exploratory steps, which matches the lower bound up to logarithmic factors as well as the upper bound of the state-of-the-art model-free algorithm, while the new bound improves the dependence on other problem parameters.
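Mormax modifies the classic Rmax recipe: under-visited state-action pairs are treated as maximally rewarding, and the agent plans in the resulting optimistic model. A compact sketch of that optimistic planning step (plain Rmax-style, not Mormax's modification; the array layout is illustrative):

```python
import numpy as np

def rmax_q_values(counts, rewards, transitions, m, r_max, gamma=0.95, iters=500):
    """Optimistic value iteration in the spirit of Rmax.

    counts:      visit counts N(s, a), shape (S, A).
    rewards:     summed observed rewards per (s, a), shape (S, A).
    transitions: observed successor counts, shape (S, A, S).
    State-action pairs visited fewer than m times are treated as
    "unknown" and assigned the optimistic value r_max / (1 - gamma).
    """
    S, A = counts.shape
    known = counts >= m
    r_hat = np.where(known, rewards / np.maximum(counts, 1), 0.0)
    p_hat = transitions / np.maximum(counts, 1)[:, :, None]
    optimistic = r_max / (1.0 - gamma)
    q = np.zeros((S, A))
    for _ in range(iters):
        v = q.max(axis=1)
        q = np.where(known, r_hat + gamma * (p_hat @ v), optimistic)
    return q
```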
On the sample complexity of reinforcement learning.
TLDR
Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
Count-Based Exploration with Neural Density Models
Bellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning. This pseudo-count was used…
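The pseudo-count in question is computed from two queries to the density model: the probability ρ(s) assigned to a state before the model is updated on it, and the recoding probability ρ′(s) afterwards. A small sketch of that computation (the density model itself is abstracted away):

```python
def pseudo_count(rho: float, rho_prime: float) -> float:
    """Pseudo-count N(s) = rho(s) * (1 - rho'(s)) / (rho'(s) - rho(s)).

    rho:       density model probability of s before updating on s.
    rho_prime: probability of s after one update on s (recoding probability).
    A learning-positive model satisfies rho_prime > rho, so the denominator
    is positive; the max() guards against numerical ties.
    """
    return rho * (1.0 - rho_prime) / max(rho_prime - rho, 1e-8)
```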
Near-Bayesian exploration in polynomial time
TLDR
A simple algorithm is presented, and it is proved that, with high probability, it performs ε-close to the true (intractable) optimal Bayesian policy after a small number of time steps, polynomial in quantities describing the system.
Near Optimal Behavior via Approximate State Abstraction
TLDR
This work investigates approximate state abstractions, which treat nearly-identical situations as equivalent and demonstrates that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.
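One abstraction family analysed in that line of work merges states whose optimal action values agree up to a tolerance ε. A rough sketch of such an aggregation over a tabular Q* (greedy clustering for illustration; the paper's exact construction may differ):

```python
import numpy as np

def approx_q_abstraction(q_star: np.ndarray, eps: float) -> np.ndarray:
    """Map each state to an abstract state, merging states whose
    optimal Q-vectors differ by at most eps in every action.

    q_star: array of shape (num_states, num_actions) with Q*(s, a).
    Returns an integer abstract-state label per ground state.
    """
    num_states = q_star.shape[0]
    labels = -np.ones(num_states, dtype=int)
    representatives = []  # Q*-vectors of cluster representatives
    for s in range(num_states):
        for k, rep in enumerate(representatives):
            if np.all(np.abs(q_star[s] - rep) <= eps):
                labels[s] = k
                break
        else:
            representatives.append(q_star[s])
            labels[s] = len(representatives) - 1
    return labels
```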
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
TLDR
It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
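An option, in the sense of the summary above, bundles an initiation set, an intra-option policy, and a termination condition so that it can be invoked like a single temporally extended action. A minimal interface sketch with illustrative names (the env.step signature here is an assumption):

```python
from dataclasses import dataclass
from typing import Any, Callable, Set

@dataclass
class Option:
    """A temporally extended action in the sense of Sutton, Precup & Singh.

    initiation_set: states in which the option may be invoked.
    policy:         maps a state to a primitive action while the option runs.
    termination:    probability of terminating in a given state.
    """
    initiation_set: Set[Any]
    policy: Callable[[Any], Any]
    termination: Callable[[Any], float]

def run_option(env, state, option, rng, gamma=0.99):
    """Execute an option until its termination condition fires, returning
    the discounted reward accumulated along the way (the quantity used
    in SMDP-style backups).  Assumes env.step returns (state, reward, done).
    """
    g, k = 0.0, 0
    while True:
        state, reward, done = env.step(option.policy(state))
        g += (gamma ** k) * reward
        k += 1
        if done or rng.random() < option.termination(state):
            return state, g, k, done
```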
Extreme State Aggregation beyond MDPs
TLDR
This work considerably generalizes existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states.
Parameter Space Noise for Exploration
TLDR
This work demonstrates that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
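Parameter space noise, as summarised above, perturbs the policy network's weights once per episode rather than adding noise to individual actions. A hedged sketch of the perturbation step (the paper's adaptive scaling of the noise standard deviation is omitted):

```python
import numpy as np

def perturb_parameters(params: dict, sigma: float,
                       rng: np.random.Generator) -> dict:
    """Return a copy of the policy parameters with Gaussian noise added.

    params: mapping from parameter name to numpy array (weights/biases).
    The perturbed copy is used to act for a whole episode, giving
    temporally consistent exploration, while the clean parameters
    continue to be trained.
    """
    return {name: w + sigma * rng.standard_normal(w.shape)
            for name, w in params.items()}
```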
Exploration-Exploitation in MDPs with Options
TLDR
Upper and lower bounds on the regret of a variant of UCRL using options are derived, and simple scenarios are illustrated in which the regret of learning with options can be provably much smaller than the regret suffered when learning with primitive actions.
Exploration by Random Network Distillation
TLDR
An exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed is introduced, together with a method for flexibly combining intrinsic and extrinsic rewards that enables significant progress on several hard-exploration Atari games.
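The random network distillation bonus summarised above is the prediction error of a trained network against a fixed, randomly initialised target network, so states the predictor has not yet fit (novel ones) yield a large bonus. A compact sketch with linear networks (the paper uses convolutional networks and normalises observations and bonuses):

```python
import numpy as np

class RNDBonus:
    """Random network distillation with linear target and predictor.

    The target network is fixed at initialisation; the predictor is
    trained to match it.  The squared prediction error on a state
    serves as the exploration bonus.
    """

    def __init__(self, obs_dim: int, feat_dim: int = 32,
                 lr: float = 1e-2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.target = rng.standard_normal((obs_dim, feat_dim))  # fixed
        self.predictor = np.zeros((obs_dim, feat_dim))          # trained
        self.lr = lr

    def bonus(self, obs: np.ndarray) -> float:
        """Prediction error on obs; also takes one gradient step on it."""
        target_feat = obs @ self.target
        pred_feat = obs @ self.predictor
        error = pred_feat - target_feat
        # Gradient of 0.5 * ||error||^2 w.r.t. the predictor weights.
        self.predictor -= self.lr * np.outer(obs, error)
        return float(np.mean(error ** 2))
```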