Corpus ID: 24397111

Symmetry Learning for Function Approximation in Reinforcement Learning

Anuj Mahajan, Theja Tulabandhula
In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL). This problem deserves ever-increasing attention given recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for… 
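The reward-trail idea can be illustrated with a toy sketch: treat two states as candidate symmetric partners when the reward sequences observed after visiting them coincide. This is only a minimal illustration of the intuition, not the paper's actual detection algorithm or its completeness proof; the function names `reward_trails` and `symmetric_pairs` and the exact matching criterion are assumptions.

```python
from collections import defaultdict

def reward_trails(episodes):
    """Map each state to the set of reward sequences (trails) observed
    from the point it was visited until the end of the episode."""
    trails = defaultdict(set)
    for episode in episodes:  # each episode is a list of (state, action, reward)
        for i, (state, _action, _reward) in enumerate(episode):
            trail = tuple(r for _, _, r in episode[i:])
            trails[state].add(trail)
    return trails

def symmetric_pairs(episodes):
    """Return pairs of distinct states whose observed reward trails coincide."""
    trails = reward_trails(episodes)
    states = sorted(trails)
    return [(s, t)
            for i, s in enumerate(states)
            for t in states[i + 1:]
            if trails[s] == trails[t]]
```

With two episodes through a mirror-symmetric toy chain, states `'L'` and `'R'` produce identical trails and are flagged as a symmetric pair.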

Figures and Tables from this paper

Exploiting Abstract Symmetries in Reinforcement Learning for Complex Environments
This work presents EASE (Exploitation of Abstract Symmetry of Environments), a novel concept that exploits abstract spatial symmetry in complex environments to extend the skills of naïvely trained agents to local abstractions of the environment.
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning
This work demonstrates that invariant transformations on RL trajectories are a promising methodology for speeding up learning in deep RL, and presents a general framework called Invariant Transform Experience Replay.
Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
This work proposes two novel data augmentation techniques for DRL that exploit reflectional symmetries and lax goal definitions in order to reuse observed data more efficiently.
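As a hedged illustration of this style of augmentation (not the paper's actual operators): when an environment's dynamics and reward are mirror-symmetric about the origin of a 1-D state/action space, each stored transition yields a second, reflected transition for free. The names `reflect_transition` and `augment` are hypothetical.

```python
def reflect_transition(transition):
    """Reflect a (s, a, r, s') transition about the origin.
    Valid only when the environment's dynamics and reward are
    mirror-symmetric, so the reflected tuple is also a real transition."""
    s, a, r, s_next = transition
    return (-s, -a, r, -s_next)

def augment(buffer):
    """Double a replay buffer by appending reflected copies of every transition."""
    return buffer + [reflect_transition(t) for t in buffer]
```

The design point is that augmentation happens purely in the replay buffer: the agent never needs to revisit the mirrored region of the state space to learn about it.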
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
This work proposes a novel tensorised formulation of the Bellman equation, giving rise to the method Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of the different agents.
Invariant Transform Experience Replay
Invariant transformations on RL trajectories are a promising methodology for speeding up learning in deep RL; two techniques are presented, Kaleidoscope Experience Replay and Goal-augmented Experience Replay, which take advantage of lax goal definitions.
Reinforcement Learning in Factored Action Spaces using Tensor Decompositions
This work uses a cooperative multi-agent reinforcement learning scenario as the exemplary setting, where the action space is naturally factored across agents and learning becomes intractable without resorting to approximation on the underlying hypothesis space of candidate solutions.
MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
This paper introduces MDP homomorphic networks for deep reinforcement learning, together with an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done.
MAVEN: Multi-Agent Variational Exploration
A novel approach called MAVEN is proposed that hybridises value and policy-based methods by introducing a latent space for hierarchical control, which allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks.
Quarto as a Reinforcement Learning problem
Reinforcement Learning has proven itself through a recent history of superhuman-level performances in various tasks. Board games are of particular interest because they can be simulated completely… 
VIREL: A Variational Inference Framework for Reinforcement Learning
VIREL is proposed, a novel, theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP and it is shown that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step.

References

Universal Value Function Approximators
An efficient technique for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g is developed, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
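The central result of that paper is that potential-based shaping preserves optimal policies: adding F(s, s') = γΦ(s') − Φ(s) for any potential function Φ leaves the optimal policy unchanged. A minimal sketch of the shaped reward (the function name is illustrative):

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping (Ng, Harada & Russell, 1999):
    adding gamma * phi(s') - phi(s) to the environment reward is
    guaranteed not to change the optimal policy."""
    return r + gamma * phi(s_next) - phi(s)
```

For example, with Φ(s) = s as a crude "progress" potential, a transition from s = 0 to s' = 1 with raw reward 1.0 and γ = 0.9 yields a shaped reward of 1.9; along any full episode the added terms telescope, which is why optimality is preserved.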
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Improving reinforcement learning by using sequence trees
This paper proposes a novel approach to discover options in the form of stochastic conditionally terminating sequences; it shows how such sequences can be integrated into the reinforcement learning framework.
Learning Options in Reinforcement Learning
This paper empirically explores a simple approach to creating options based on the intuition that states frequently visited on system trajectories could prove to be useful subgoals, and proposes a greedy algorithm for identifying subgoals based on state visitation counts.
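A minimal sketch of the visitation-count intuition (the paper's actual greedy algorithm involves further filtering of trajectories; `discover_subgoals` is an illustrative name, not the authors' implementation):

```python
from collections import Counter

def discover_subgoals(trajectories, k=2):
    """Greedily pick the k most frequently visited states across
    the observed trajectories as candidate subgoals."""
    counts = Counter(state for traj in trajectories for state in traj)
    return [state for state, _count in counts.most_common(k)]
```

States that many successful trajectories pass through (e.g. a doorway between two rooms) accumulate high counts and surface as subgoal candidates.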
Representation Learning: A Review and New Perspectives
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Macro Actions in Reinforcement Learning
This work compares multiple algorithms that exercise the macro-action heuristic on the task of learning a defense agent in the Half Field Offense (HFO) problem, and proposes and compares a few other simple techniques that align with the heuristic and improve over DI-SARSA.
Learning Macro-Actions in Reinforcement Learning
A method for automatically constructing macro-actions from primitive actions during the reinforcement learning process, which reinforces the tendency to perform action b after action a if such a pattern of actions has been rewarded.
Guided Policy Search
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima, and shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized importance sampled policy optimization that incorporates these samples into the policy search.