Corpus ID: 236950777

Temporally Abstract Partial Models

@inproceedings{Khetarpal2021TemporallyAP,
  title={Temporally Abstract Partial Models},
  author={Khimya Khetarpal and Zafarali Ahmed and Gheorghe Comanici and Doina Precup},
  booktitle={NeurIPS},
  year={2021}
}
Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement learning, option models (Sutton, Precup & Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a… 
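
The core objects described here, option models restricted to affordable options, can be pictured with a small planning sketch. The code below is a minimal illustration rather than the authors' implementation; OptionModel, affordable_options, and the tabular representation are all assumptions made for the example.

from dataclasses import dataclass
import numpy as np

@dataclass
class OptionModel:
    reward: np.ndarray      # reward[s]: expected discounted reward of running the option from state s
    transition: np.ndarray  # transition[s, s']: discounted probability of terminating in state s'

def partial_backup(values, models, affordable_options):
    """One value-iteration backup that only considers options affordable in each state."""
    new_values = np.array(values, dtype=float)
    for s in range(len(values)):
        candidates = [
            models[o].reward[s] + models[o].transition[s] @ values
            for o in affordable_options(s)   # partial model: irrelevant options are never evaluated
        ]
        if candidates:
            new_values[s] = max(candidates)
    return new_values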

Citations of this paper

The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning …

Flexible Option Learning

This work revisits and extends intra-option learning in the context of deep reinforcement learning, in order to enable updating all options consistent with current primitive action choices, without introducing any additional estimates.
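
The snippet above describes intra-option learning, so a tabular sketch may help fix ideas: after a primitive action is taken, every option whose policy agrees with that action can be updated, not only the option that was executing. This is a generic intra-option Q-learning update written for deterministic option policies, not the algorithm proposed in the paper.

import numpy as np

def intra_option_update(Q, s, a, r, s_next, options, alpha=0.1, gamma=0.99):
    """Q[o, s]: option values; options: list of (policy, termination) pairs,
    where policy(s) -> primitive action and termination(s) -> prob in [0, 1]."""
    for o, (policy, termination) in enumerate(options):
        if policy(s) != a:                     # option o would not have chosen a here
            continue
        beta = termination(s_next)
        # Continue with o (prob 1 - beta) or terminate and re-select greedily (prob beta).
        target = r + gamma * ((1 - beta) * Q[o, s_next] + beta * Q[:, s_next].max())
        Q[o, s] += alpha * (target - Q[o, s])
    return Q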

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning. The agent uses a bottleneck mechanism over a …
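
As a rough illustration of a bottleneck over the state, one can imagine scoring the parts of a factored state and keeping only the top-k parts for planning. The scorer, the slot-based state, and k are all assumptions of this toy sketch; the agent in the paper is a learned end-to-end architecture.

import numpy as np

def bottleneck(state_parts, scorer, k=4):
    """state_parts: (n, d) array of object/feature slots; scorer: (d,) -> float relevance."""
    scores = np.array([scorer(p) for p in state_parts])
    keep = np.argsort(scores)[-k:]            # indices of the k most relevant slots
    return state_parts[keep]                  # planning then operates on this reduced state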

References

Showing 1-10 of 50 references

Theoretical Results on Reinforcement Learning with Temporally Abstract Options

This work defines new Bellman equations that are satisfied by sets of multi-time models, which can be used interchangeably with models of primitive actions in a variety of well-known planning methods, including value iteration, policy improvement, and policy iteration.
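
The interchangeability claim can be made concrete with a tiny value-iteration sketch in which primitive actions and options are both represented as (reward, discounted transition) multi-time models. This is an illustrative reconstruction, not code from the paper.

import numpy as np

def value_iteration(models, n_states, iters=100):
    """models: list of (R, P) pairs where R[s] is the expected discounted reward and
    P[s, s'] is the discounted terminal-state distribution; a primitive action is
    simply the special case of a one-step model."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = np.max([R + P @ V for R, P in models], axis=0)   # backup over actions and options alike
    return V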

What can I do here? A Theory of Affordances in Reinforcement Learning

This work develops a theory of affordances for agents that learn and plan in Markov Decision Processes, and proposes an approach to learning affordances in order to estimate transition models that are simpler and generalize better.
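
One way to picture the simpler, better-generalizing models this line of work aims for is to fit an empirical transition model only on the state-action pairs an affordance map deems relevant. The predicate afforded below is an assumed stand-in for a learned or given affordance map.

from collections import defaultdict

def fit_partial_model(transitions, afforded):
    """transitions: iterable of (s, a, s_next); afforded(s, a) -> bool."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        if afforded(s, a):                    # ignore state-action pairs outside the affordance set
            counts[(s, a)][s_next] += 1
    # Normalize counts into conditional probabilities P(s' | s, a).
    return {
        sa: {s_next: c / sum(nexts.values()) for s_next, c in nexts.items()}
        for sa, nexts in counts.items()
    }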

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

The Option-Critic Architecture

This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
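
For readers who want the shape of the two update directions, here is a compact tabular sketch of option-critic style updates: an intra-option policy gradient weighted by the action value, and a termination gradient driven by the option's advantage. Learning rates, variable names, and the greedy value estimate are illustrative choices; the paper gives the exact derivation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def option_critic_step(theta, nu, Q_U, Q_O, s, o, a, s_next, lr=0.05):
    """theta[o, s, :]: intra-option policy logits; nu[o, s]: termination logits;
    Q_U[s, o, a]: value of taking a in (s, o); Q_O[s, o]: option values."""
    # Intra-option policy gradient: increase log-probability of a, weighted by Q_U.
    probs = np.exp(theta[o, s]) / np.exp(theta[o, s]).sum()
    grad_log = -probs
    grad_log[a] += 1.0
    theta[o, s] += lr * Q_U[s, o, a] * grad_log
    # Termination gradient: terminate more where the option's advantage is negative.
    advantage = Q_O[s_next, o] - Q_O[s_next].max()
    beta = sigmoid(nu[o, s_next])
    nu[o, s_next] -= lr * advantage * beta * (1 - beta)
    return theta, nu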

Deep Affordance Foresight: Planning Through What Can Be Done in the Future

A new affordance representation is introduced that enables the robot to reason about the long-term effects of actions by modeling which actions are afforded in the future, and a learning-to-plan method, Deep Affordance Foresight (DAF), is developed that learns partial environment models of the affordances of parameterized motor skills through trial and error.
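
A minimal stand-in for learning affordances from trial and error is a binary classifier that maps features of a state and a parameterized skill to a success probability. The logistic-regression trainer below is only a sketch of that idea, not the DAF method itself.

import numpy as np

def train_affordance_classifier(X, y, lr=0.1, epochs=200):
    """X: (n, d) features of (state, parameterized skill); y: (n,) success labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability the skill is afforded
        grad = p - y                              # gradient of the logistic loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b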

Toward Affordance-Aware Planning

This work formalizes the notion of affordances as knowledge added to an MDP that prunes actions in a state- and reward-general way, which significantly reduces the number of state-action pairs the agent needs to evaluate in order to act optimally.
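
The claimed saving can be illustrated by counting how many state-action pairs survive pruning under an assumed affordance predicate; the predicate and the example numbers below are hypothetical.

def pruned_fraction(states, actions, afford):
    """Fraction of state-action pairs a planner still has to evaluate after pruning."""
    total = len(states) * len(actions)
    kept = sum(1 for s in states for a in actions if afford(s, a))
    return kept / total

# Example: 100 states, 10 actions, only actions with the same parity as the state survive.
print(pruned_fraction(range(100), range(10), lambda s, a: s % 2 == a % 2))  # 0.5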

Goal-Based Action Priors

This work develops a framework for goal- and state-dependent action priors that can be used to prune away irrelevant actions based on the robot's current goal, thereby greatly accelerating planning in a variety of complex stochastic environments.

Options of Interest: Temporal Abstraction with Interest Functions

This work provides a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option, and derives a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture.
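
An interest function can be pictured as a soft, state-dependent weight on each option, with a classical initiation set as the special case of weights that are exactly 0 or 1. The sketch below simply reweights a softmax over option preferences; it is illustrative and assumes at least one option has nonzero interest in the current state.

import numpy as np

def policy_over_options(option_values, interests):
    """option_values: (n_options,) preferences at the current state;
    interests: (n_options,) values in [0, 1] produced by the interest functions."""
    prefs = np.exp(option_values - option_values.max()) * interests
    return prefs / prefs.sum()   # probability of selecting each option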

Shaping Belief States with Generative Environment Models for RL

It is found that predicting multiple steps into the future (overshooting) is critical for stable representations to emerge, and a scheme to reduce the computational cost of doing so is proposed, allowing agents to be built that are competitive with model-free baselines.
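
Overshooting, as described here, amounts to unrolling a learned latent model several steps ahead and penalizing the prediction error at every horizon rather than only one step. The encode and model callables below are assumed placeholders for the paper's generative model.

import numpy as np

def overshoot_loss(model, encode, observations, actions, horizon=5):
    """observations: sequence of raw observations; actions: actions taken between them."""
    loss, count = 0.0, 0
    for t in range(len(observations) - 1):
        z = encode(observations[t])
        for k in range(1, min(horizon, len(observations) - t - 1) + 1):
            z = model(z, actions[t + k - 1])              # predict k steps ahead in latent space
            target = encode(observations[t + k])
            loss += float(np.mean((z - target) ** 2))     # prediction error at horizon k
            count += 1
    return loss / max(count, 1)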

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning. The agent uses a bottleneck mechanism over a …